datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

[load][xs]: allow setting encoding #40

Closed: zelima closed this 5 years ago

zelima commented 5 years ago

I have a source whose encoding datapackage-py cannot detect, and I get errors when trying to run flows on it. This PR allows setting the encoding for a resource when it is present in the descriptor.

from dataflows import Flow, load, printer

Flow(
    load(
        load_source='https://www2.census.gov/programs-surveys/cps/tables/time-series/historical-income-households/h01ar.xls',
        format='xls',
        sheet=1,
        skip_rows=list(range(1, 62)) + [-1],
        # Set the encoding explicitly instead of relying on detection
        encoding='utf-8',
        headers=['Year', 'Number (thousands)', 'Lowest', 'Second', 'Third', 'Fourth', 'Top 5 percent'],
    ),
    printer()
).process()

Error message

  File "/home/zelima/.virtualenvs/data-factory/lib/python3.6/site-packages/dataflows/processors/load.py", line 55, in process_datapackage
    self.res.infer(confidence=1, limit=1000)
  File "/home/zelima/.virtualenvs/data-factory/lib/python3.6/site-packages/datapackage/resource.py", line 256, in infer
    encoding = cchardet.detect(contents)['encoding'].lower()
AttributeError: 'NoneType' object has no attribute 'lower'
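
The traceback suggests that detection returned no encoding for this file, so calling `.lower()` on `None` failed. Below is a minimal sketch of the kind of guard that avoids this. It assumes the detection result is a dict whose `encoding` value may be `None`. `pick_encoding` and its parameters are hypothetical names for illustration, not the actual datapackage-py or dataflows code.

```python
def pick_encoding(detected, declared=None, default="utf-8"):
    """Choose an encoding without assuming that detection succeeded.

    detected: dict shaped like a detector result, e.g. {"encoding": None}
    declared: encoding given explicitly by the user or descriptor, if any
    default:  fallback when nothing was declared or detected (an assumption)
    """
    # An explicitly declared encoding wins over any detection result
    if declared:
        return declared.lower()
    # The detector may report None when it cannot decide
    guess = (detected or {}).get("encoding")
    if guess:
        return guess.lower()
    return default


print(pick_encoding({"encoding": None}))                    # detection failed
print(pick_encoding({"encoding": "WINDOWS-1252"}))          # detection succeeded
print(pick_encoding({"encoding": None}, declared="UTF-8"))  # user-supplied value
```

With a guard like this, an encoding set in the descriptor is used directly, and a failed detection falls back to a default instead of raising.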

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 191


Totals
Change from base Build 187: 0.02%
Covered Lines: 1118
Relevant Lines: 1438

akariv commented 5 years ago

@zelima This bug was fixed in datapackage-py and deployed this morning. Let me know if it remains once you update the datapackage and tableschema libs.

zelima commented 5 years ago

@akariv the issue described above is now fixed for me with datapackage-py>=1.5.1. I assume we can close this one.