catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

compression of data package resources causing validation errors #352

Closed cmgosnell closed 4 years ago

cmgosnell commented 4 years ago

Hey @roll! We were thinking about compressing one of our resources, and I got it all working, except that when running goodtables.validate I'm getting one error. I tried removing the compression and...

We used the patterns for compression - the path in the metadata looks like this: "path": [ "data/hourly_emissions_epacems_2016.csv.gz", "data/hourly_emissions_epacems_2017.csv.gz" ],

Here is the error I'm getting. Is this expected behavior? Or are compressed files supposed to be "valid" resources and I'm just doing something wrong? Let me know!


   'time': 0.15,
   'valid': False,
   'error-count': 1,
   'row-count': 0,
   'source': ['/Users/christinagosnell/code/pudl/results/datapackage/epacems_eia860/data/hourly_emissions_epacems_2016.csv.gz',  '/Users/christinagosnell/code/pudl/results/datapackage/epacems_eia860/data/hourly_emissions_epacems_2017.csv.gz'],
   'schema': 'table-schema',
   'errors': [{'code': 'source-error',
     'message': "'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte",
     'message-data': {}}]}],
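
For anyone reading the message above: byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which suggests the .csv.gz file was being read as plain UTF-8 text without being decompressed first. Here is a minimal sketch that reproduces the same message using only the Python standard library. The CSV content and column names are made up for illustration and are not the real EPA CEMS data.

```python
import gzip

# Hypothetical CSV content standing in for one compressed partition.
csv_bytes = b"plant_id,gross_load_mw\n3,120.5\n"
compressed = gzip.compress(csv_bytes)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
magic = compressed[:2]

# Decoding the raw gzip bytes as UTF-8 fails on the second magic byte,
# because 0x1f is valid ASCII but 0x8b cannot start a UTF-8 sequence.
try:
    compressed.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as err:
    decode_error = str(err)

# Decompressing first recovers the original text.
roundtrip = gzip.decompress(compressed).decode("utf-8")

print(magic.hex())
print(decode_error)
```

This only shows where the byte in the error comes from. It says nothing about where in the validation stack the decompression step is skipped.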
roll commented 4 years ago

Hi @cmgosnell,

I've reviewed and tested the whole FD stack and figured out that support for multi-parting and compression for the same resource is blocked by https://github.com/frictionlessdata/tabulator-py/issues/267, for now.

We can consider implementing it later, but since our primary plan for now is to use one resource per partition, I think this issue can be closed.
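
To illustrate what "a resource per partition" could look like, here is a rough sketch that turns the multipart path list from the original report into one resource entry per compressed file. The helper name `resources_per_partition` is hypothetical, the resource names are simply derived from the file names, and the entries carry only `name` and `path`. A real descriptor would also need a schema and whatever other properties the package uses, so treat this as an assumption about the layout and not as the actual PUDL metadata.

```python
from pathlib import PurePosixPath


def resources_per_partition(paths):
    """Build one resource entry per file (hypothetical helper).

    Each entry has a single-string path, so no resource combines
    multipart paths with compression.
    """
    resources = []
    for path in paths:
        filename = PurePosixPath(path).name
        # Drop everything after the first dot, e.g. ".csv.gz".
        name = filename.split(".", 1)[0]
        resources.append({"name": name, "path": path})
    return resources


# The two paths quoted in the original report.
paths = [
    "data/hourly_emissions_epacems_2016.csv.gz",
    "data/hourly_emissions_epacems_2017.csv.gz",
]
resources = resources_per_partition(paths)
for resource in resources:
    print(resource)
```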

cmgosnell commented 4 years ago

everything is working for this yay