datopian / datahub-qa

Bugs, issues and suggestions for datahub.io
https://datahub.io/

Processing data package fails with descriptor validation errors #249

Closed by PiDelport 6 months ago

PiDelport commented 5 years ago

Context: I'm in the process of adapting ~150 civic data sets for datahub.io; the following data package is a test upload.

How to reproduce

The data package below validates with data validate and uploads successfully with data push.

However, processing it fails:

ERROR: Data Package validation error: Descriptor validation error: {'name': '__placeholder__', 'path': '_', 'profile': 'data-resource'} is not valid under any of the given schemas at "resources/0" in descriptor and at "properties/resources/items/oneOf" in profile

ERROR: Data Package validation error: Descriptor validation error: 'data-resource' is not one of ['tabular-data-resource'] at "resources/0/profile" in descriptor and at "properties/resources/items/properties/profile/enum" in profile

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/add_resource.py", line 6, in <module>
    parameters, datapackage, res_iter = ingest()
  File "/usr/lib/python3.6/site-packages/datapackage_pipelines/wrapper/wrapper.py", line 40, in ingest
    datapackage, resource_iterator, dependency_dp = process_input(sys.stdin, validate, debug)
  File "/usr/lib/python3.6/site-packages/datapackage_pipelines/wrapper/input_processor.py", line 88, in process_input
    datapackage.validate(dp_to_validate)
  File "/usr/lib/python3.6/site-packages/datapackage/validate.py", line 16, in validate
    Package(descriptor, strict=True)
  File "/usr/lib/python3.6/site-packages/datapackage/package.py", line 105, in __init__
    self.__build()
  File "/usr/lib/python3.6/site-packages/datapackage/package.py", line 300, in __build
    raise exception
  File "/usr/lib/python3.6/site-packages/datapackage/package.py", line 295, in __build
    self.__profile.validate(self.__current_descriptor)
  File "/usr/lib/python3.6/site-packages/datapackage/profile.py", line 67, in validate
    raise exceptions.ValidationError(message, errors=errors)
tableschema.exceptions.ValidationError: There are 2 validation errors (see exception.errors)
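To make the second error concrete, here is a minimal stdlib-only sketch of the constraint it reports: a package whose profile is tabular-data-package may only contain resources whose profile is tabular-data-resource. It is a simplification written for illustration, not the JSON Schema check that datapackage.validate actually runs. The function name tabular_profile_errors is hypothetical, and how the __placeholder__ resource is combined with the uploaded resource is an assumption, since the log only shows that the placeholder sits at resources/0.

```python
# Simplified sketch (NOT the real datapackage / JSON Schema validation):
# in a "tabular-data-package", every resource must use the
# "tabular-data-resource" profile.


def tabular_profile_errors(descriptor):
    """Return one message per resource that breaks the tabular-package rule."""
    errors = []
    if descriptor.get("profile") != "tabular-data-package":
        return errors  # the rule only applies to tabular data packages
    for index, resource in enumerate(descriptor.get("resources", [])):
        # A resource without an explicit profile defaults to "data-resource".
        profile = resource.get("profile", "data-resource")
        if profile != "tabular-data-resource":
            errors.append(
                f"{profile!r} is not one of ['tabular-data-resource'] "
                f'at "resources/{index}/profile"'
            )
    return errors


# The uploaded descriptor, trimmed to the keys that matter for this rule.
uploaded = {
    "name": "22t8-gvzh",
    "profile": "tabular-data-package",
    "resources": [
        {
            "name": "22t8-gvzh",
            "path": "data/22t8-gvzh.csv",
            "profile": "tabular-data-resource",
        },
    ],
}

# The resource quoted in the first error message.
placeholder = {"name": "__placeholder__", "path": "_", "profile": "data-resource"}

# Assumption: the pipeline validates a descriptor with the placeholder at
# resources/0. Whether the uploaded resource is also present is not shown in
# the log; keeping it here does not change the result.
processed = dict(uploaded, resources=[placeholder] + uploaded["resources"])

print(tabular_profile_errors(uploaded))  # → []
for message in tabular_profile_errors(processed):
    print(message)
```

Under these assumptions the uploaded descriptor passes and the processed one fails only at resources/0, which suggests that the injected __placeholder__ resource (with the generic data-resource profile), rather than the uploaded data, is what trips validation.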

The datapackage.json was inferred from the CSV with Package.infer, and then enriched with additional metadata fields:

{
    "name": "22t8-gvzh",
    "profile": "tabular-data-package",
    "resources": [
        {
            "path": "data/22t8-gvzh.csv",
            "profile": "tabular-data-resource",
            "name": "22t8-gvzh",
            "format": "csv",
            "mediatype": "text/csv",
            "encoding": "utf-8",
            "schema": {
                "fields": [
                    {
                        "name": "geometry_identifier",
                        "type": "integer",
                        "format": "default"
                    },
                    {
                        "name": "geometry_area",
                        "type": "number",
                        "format": "default"
                    },
                    {
                        "name": "sg_number",
                        "type": "string",
                        "format": "default"
                    }
                ],
                "missingValues": [
                    ""
                ]
            }
        }
    ],
    "title": "Cadastral Data - Rustenburg Mineral Rights",
    "description": "A small extract of cadastral data related to mineral rights in the Rustenburg area.",
    "sources": [
        {
            "title": "Department of Rural Development and Land Reform",
            "path": "https://csg.esri-southafrica.com/portal/apps/webappviewer/index.html?id=34ec3dcf8d8642bb9ed7f795cbfe8faf"
        }
    ],
    "contributors": [
        {
            "title": "Open Data South Africa"
        }
    ],
    "keywords": [
        "minerals",
        "land",
        "property"
    ],
    "created": "2018-08-01T07:00:00+00:00",
    "x_license_name": "See Terms of Use",
    "x_category": "Government"
}
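The inference-then-enrichment step described above could look roughly like the following stdlib-only sketch. The helper enrich_descriptor is hypothetical, and the inferred dictionary is hard-coded here in place of the real output of Package.infer. Never overwriting keys that inference already produced is a design assumption, chosen so that the tabular profiles set by inference stay intact.

```python
import json


def enrich_descriptor(inferred, metadata):
    """Return a copy of an inferred descriptor with extra package-level metadata.

    Keys already set by inference (e.g. name, profile, resources) are kept,
    so metadata can never replace the inferred tabular profiles.
    """
    enriched = dict(inferred)
    for key, value in metadata.items():
        enriched.setdefault(key, value)  # only add keys that are missing
    return enriched


# Stand-in for the descriptor produced by Package.infer (trimmed).
inferred = {
    "name": "22t8-gvzh",
    "profile": "tabular-data-package",
    "resources": [
        {
            "path": "data/22t8-gvzh.csv",
            "profile": "tabular-data-resource",
            "name": "22t8-gvzh",
        }
    ],
}

# A subset of the extra metadata from the descriptor above.
metadata = {
    "title": "Cadastral Data - Rustenburg Mineral Rights",
    "keywords": ["minerals", "land", "property"],
    "x_license_name": "See Terms of Use",
    "x_category": "Government",
}

descriptor = enrich_descriptor(inferred, metadata)
print(json.dumps(descriptor, indent=4))
```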

Expected behavior

Processing should succeed, since the same package passes data validate and uploads with data push.