frictionlessdata / datapackage-py

A Python library for working with Data Packages.
https://frictionlessdata.io
MIT License

xlsx recognised as zip #187

Closed mcarans closed 6 years ago

mcarans commented 7 years ago
from datapackage import Package

package = Package({'title': 'Somalia Consolidated Cash 3W', 'id': '55ec2570-4870-49d6-989b-56e11cf8da1a', 'description': 'Consolidated 3W', 'name': 'consolidated-cash-3w'})
package.add_resource({'path': 'http://data.humdata.org/dataset/55ec2570-4870-49d6-989b-56e11cf8da1a/resource/4a65b0fa-9c7f-4ee8-a35c-7435fb467f10/download/cash-march-final.xlsx',
                      'format': 'xlsx', 'title': 'Dataset with regional overview', 'encoding': 'utf-8', 'name': 'cash_march_final.xlsx'})
package.infer()

Not sure why the above fails with a zip error:

tabulator.exceptions.FormatError: Format has been detected as ZIP (not supported)

roll commented 7 years ago

Yes. It's interesting. xlsx is a zipped format, but tabulator is able to handle it. I'll fix it.
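
For context, a minimal sketch of why a content-based check can report ZIP here: an .xlsx file is an Office Open XML package stored in a ZIP container, so it begins with the ZIP local file header signature. The archive below is a stand-in built in memory with the standard library, not a real workbook, and the single member name is only illustrative.

```python
import io
import zipfile

# Build a tiny in-memory ZIP archive as a stand-in for an .xlsx file.
# (A real workbook has more members; this single entry is illustrative only.)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")

data = buffer.getvalue()

# ZIP archives start with the local file header signature "PK\x03\x04".
starts_with_zip_signature = data[:4] == b"PK\x03\x04"

# zipfile.is_zipfile accepts a file-like object as well as a path.
is_zip = zipfile.is_zipfile(io.BytesIO(data))

print(starts_with_zip_signature, is_zip)
```

Any reader that inspects the raw bytes without being told the format is xlsx would therefore see a ZIP archive.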

roll commented 6 years ago

It's fixed as of datapackage@1.1.3.

We were trying to read the xlsx as a csv (so detecting zip in this case is expected behavior for tabulator) - https://github.com/frictionlessdata/datapackage-py/commit/c11165ffdb20e6d3d15dcf6fa33c7e29e78e04d1
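
The linked commit is not reproduced here. As a rough illustration of the idea only, the sketch below picks a format from the resource descriptor's `format` key and falls back to the path extension, rather than assuming csv. `resolve_format` is a hypothetical helper written for this example; it is not part of datapackage or tabulator, and the actual fix may work differently.

```python
import os
from urllib.parse import urlparse


def resolve_format(resource, default="csv"):
    """Hypothetical helper: choose a format for a resource descriptor."""
    # Prefer the format the descriptor declares explicitly.
    declared = resource.get("format")
    if declared:
        return declared.lower()
    # Otherwise fall back to the extension of the path (URL or local file).
    path = urlparse(resource.get("path", "")).path
    extension = os.path.splitext(path)[1].lstrip(".")
    # Use the default only when neither source yields a format.
    return extension.lower() or default


# Resource descriptor taken from the report above (title and encoding omitted).
resource = {
    "path": "http://data.humdata.org/dataset/55ec2570-4870-49d6-989b-56e11cf8da1a/resource/4a65b0fa-9c7f-4ee8-a35c-7435fb467f10/download/cash-march-final.xlsx",
    "format": "xlsx",
    "name": "cash_march_final.xlsx",
}

print(resolve_format(resource))
```

With the format resolved up front, the reader can be told to treat the file as xlsx instead of sniffing it as csv.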


{'description': 'Consolidated 3W',
 'id': '55ec2570-4870-49d6-989b-56e11cf8da1a',
 'name': 'consolidated-cash-3w',
 'profile': 'tabular-data-package',
 'resources': [{'encoding': 'utf-8',
                'format': 'xlsx',
                'mediatype': 'text/xlsx',
                'name': 'cash_march_final.xlsx',
                'path': 'http://data.humdata.org/dataset/55ec2570-4870-49d6-989b-56e11cf8da1a/resource/4a65b0fa-9c7f-4ee8-a35c-7435fb467f10/download/cash-march-final.xlsx',
                'profile': 'tabular-data-resource',
                'schema': {'fields': [{'format': 'default',
                                       'name': None,
                                       'type': 'any'},
                                      {'format': 'default',
                                       'name': None,
                                       'type': 'string'},
                                      {'format': 'default',
                                       'name': 'Individual beneficiaries '
                                               'reached in March',
                                       'type': 'integer'},
                                      {'format': 'default',
                                       'name': None,
                                       'type': 'any'},
                                      {'format': 'default',
                                       'name': None,
                                       'type': 'any'},
                                      {'format': 'default',
                                       'name': None,
                                       'type': 'any'},
                                      {'format': 'default',
                                       'name': None,
                                       'type': 'any'},
                                      {'format': 'default',
                                       'name': None,
                                       'type': 'any'}],
                           'missingValues': ['']},
                'title': 'Dataset with regional overview'}],
 'title': 'Somalia Consolidated Cash 3W'}
mcarans commented 6 years ago

Yes, fixed.