frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

Problem setting resource schema property to use local schema file path #163

Closed: g8tor closed this issue 7 years ago

g8tor commented 7 years ago

I am new to the Frictionless Data specifications, so I could be totally off base here; if so, I apologize in advance.

I have multiple files in a directory that share the same schema (but cannot be combined). I would like to use a local file path for the schema property, as described in the Data Resource spec (http://specs.frictionlessdata.io/data-resource/).
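For illustration, a data package descriptor in which several resources point at one shared local schema file might look like the one built below. This is a minimal sketch: the file names `data1.tsv`, `data2.tsv` and `schema.json` are hypothetical, and the descriptor is assembled as a plain Python dict and serialized with the standard `json` module rather than with datapackage-py.

```python
import json

# Hypothetical file names: every resource references the same local schema file.
SCHEMA_PATH = 'schema.json'
DATA_FILES = ['data1.tsv', 'data2.tsv']


def build_descriptor(data_files, schema_path):
    """Build a tabular data package descriptor whose resources share one schema path."""
    resources = []
    for path in data_files:
        resources.append({
            'name': path.rsplit('.', 1)[0],   # e.g. 'data1'
            'path': path,
            'profile': 'tabular-data-resource',
            'dialect': {'delimiter': '\t'},   # TSV files use a tab delimiter
            'schema': schema_path,            # a path string, not an inline object
        })
    return {'profile': 'tabular-data-package', 'resources': resources}


descriptor = build_descriptor(DATA_FILES, SCHEMA_PATH)
print(json.dumps(descriptor, indent=2))
```

Keeping the schema as a path string means the field definitions live in one file; a library consuming this descriptor has to load that file before it can work with the fields.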

Essentially, I want to:

I am able to do all of the above; however, when I try to commit the change on the resource by calling resource.commit(), I get the following:

Package Profile: tabular-data-package
Resource Name: tabular-data-resource

Traceback (most recent call last):
  File "demo.py", line 42, in <module>
    resource.commit()
  File "/var/lib/libvirt/images/pyenvs/lib/python3.5/site-packages/datapackage/resource.py", line 265, in commit
    self.__build()
  File "/var/lib/libvirt/images/pyenvs/lib/python3.5/site-packages/datapackage/resource.py", line 286, in __build
    self.__current_descriptor)
  File "/var/lib/libvirt/images/pyenvs/lib/python3.5/site-packages/datapackage/helpers.py", line 165, in expand_resource_descriptor
    for field in schema.get('fields', []):
AttributeError: 'str' object has no attribute 'get'

Looking at expand_resource_descriptor in helpers.py, it seems to expect a dict, which contradicts the spec. Am I correct, or am I going about this the wrong way? Any and all help is greatly appreciated.
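The traceback is consistent with that reading: `schema.get('fields', [])` only works on a dict, so a schema given as a path string has to be loaded (dereferenced) into a dict before it is expanded. Below is a minimal sketch of that step, assuming a local JSON schema file. `dereference_schema` is a hypothetical helper written for illustration. It is not part of datapackage-py's API, and it ignores remote URLs and path-safety checks.

```python
import json
import os
import tempfile


def dereference_schema(resource_descriptor, base_path='.'):
    """Hypothetical helper: return a copy of the descriptor with a string
    schema replaced by the JSON object loaded from that file."""
    descriptor = dict(resource_descriptor)  # shallow copy; the input is left untouched
    schema = descriptor.get('schema')
    if isinstance(schema, str):
        # Assumption: the string is a local path relative to base_path.
        with open(os.path.join(base_path, schema), encoding='utf-8') as f:
            descriptor['schema'] = json.load(f)
    return descriptor


# Demo: write a small schema file, then dereference a descriptor pointing at it.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'schema.json'), 'w', encoding='utf-8') as f:
        json.dump({'fields': [{'name': 'id', 'type': 'integer'}]}, f)

    raw = {'name': 'data1', 'path': 'data1.tsv', 'schema': 'schema.json'}
    expanded = dereference_schema(raw, base_path=tmp)

    # The string form fails the same way as in the traceback...
    try:
        raw['schema'].get('fields', [])
    except AttributeError as exc:
        print('string schema:', exc)

    # ...while the dereferenced dict can be iterated field by field.
    for field in expanded['schema'].get('fields', []):
        print(field['name'], field['type'])
```

The copy keeps the original descriptor (with its path reference) intact, so it can still be saved back to disk in its compact form.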

Please advise.

Thanks in advance for any and all help.

roll commented 7 years ago

@g8tor Thanks for reporting the issue. I'll investigate (probably a bug).

g8tor commented 7 years ago

@roll

No problem at all. Thanks for your work!

roll commented 7 years ago

@g8tor I would recommend an approach like this:

from datapackage import Package, Resource, infer

SCHEMA_PATH = 'schema.json'

# Infer a schema from one of the data files and save it locally
resource = Resource({'path': 'data1.tsv'})
resource.infer()
resource.schema.save(SCHEMA_PATH)

# Prepare a package descriptor covering all TSV files
descriptor = infer('*.tsv')
for resource_descriptor in descriptor['resources']:
    # Point every resource at the shared schema file
    resource_descriptor['schema'] = SCHEMA_PATH

# Create a data package from the prepared descriptor
package = Package(descriptor)

The infer/commit system is relatively new, so there are a few shortcomings for now. For example, in your code the resource wasn't able to dereference the schema (path -> dict). Also, for now, resource.commit() will not update package.descriptor. We're looking forward to fixing all these small issues.

g8tor commented 7 years ago

@roll I'll give it a try. Thanks for the suggestion.

roll commented 7 years ago

@g8tor You're welcome! Please re-open if needed.