datahq / dataflows

DataFlows is a simple, intuitive lightweight framework for building data processing flows in python.
https://dataflows.org
MIT License

KeyError when using join #120

Open sephib opened 4 years ago

sephib commented 4 years ago

Hi,
While working on the Rasham Hekdesh data, I encountered a KeyError: 'שם הקדש' error
when trying to join two resources: hekdeshGeneral.csv and hekdeshPropery.csv.

from dataflows import Flow, load, dump_to_path, printer, join

# hekdeshGeneral and hekdeshProperty are assumed to hold the paths to the two CSV files named above.
Flow(
    load(hekdeshGeneral, name='hekdesh_general'),
    load(hekdeshProperty, name='hekdesh_property'),
    dump_to_path('hekdesh_general_property'),
    join(
        'hekdesh_general', ['מספר תיק'],   # Source resource
        'hekdesh_property', ['מספר תיק'],  # Target resource
        mode='full-outer'
    ),
    printer(num_rows=1, tablefmt='html')
).process()[1]

Below is the full error stack:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in <module>
     48          mode='full-outer'   # Don't add new fields, remove unmatched rows
     49      ),
---> 50     printer(num_rows=1, tablefmt='html')
     51 ).process()[1]

~/anaconda3/envs/dataflows/lib/python3.7/site-packages/dataflows/base/flow.py in process(self)
     13
     14     def process(self):
---> 15         return self._chain().process()
     16
     17     def datastream(self, ds=None):

~/anaconda3/envs/dataflows/lib/python3.7/site-packages/dataflows/base/datastream_processor.py in process(self)
     84         try:
     85             for res in ds.res_iter:
---> 86                 collections.deque(res, maxlen=0)
     87         except CastError as e:
     88             for err in e.errors:

~/anaconda3/envs/dataflows/lib/python3.7/site-packages/dataflows/helpers/rows_processor.py in process_resource(self, resource)
      9
     10     def process_resource(self, resource):
---> 11         yield from self.func(resource)

~/anaconda3/envs/dataflows/lib/python3.7/site-packages/dataflows/processors/printer.py in func(rows)
     62
     63             index = i + 1
---> 64             prow = [index] + [truncate_cell(row[f], max_cell_size) for f in field_names]
     65             yield row
     66

~/anaconda3/envs/dataflows/lib/python3.7/site-packages/dataflows/processors/printer.py in <listcomp>(.0)
     62
     63             index = i + 1
---> 64             prow = [index] + [truncate_cell(row[f], max_cell_size) for f in field_names]
     65             yield row
     66

KeyError: 'שם הקדש'
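
The last two frames of the traceback show printer indexing each row with every name in field_names (row[f] for f in field_names). The sketch below is plain Python, not dataflows code. It only illustrates that failure mode: a row dict that lacks one of the declared field names raises the same kind of KeyError. The sample rows and the helper names (missing_fields, strict_cells, lenient_cells) are hypothetical, and the sketch does not establish why the joined row is missing the field.

```python
# Hypothetical stand-ins: field names as a schema might declare them,
# and one row that is missing the second field.
field_names = ['מספר תיק', 'שם הקדש']
rows = [
    {'מספר תיק': 1, 'שם הקדש': 'example'},
    {'מספר תיק': 2},  # no 'שם הקדש' key
]


def missing_fields(row, field_names):
    """Return the declared field names that the row dict does not contain."""
    return [f for f in field_names if f not in row]


def strict_cells(row, field_names):
    """Index every declared field, like the list comprehension in the traceback."""
    return [row[f] for f in field_names]


def lenient_cells(row, field_names, default=None):
    """Same lookup, but substitute a default value for absent fields."""
    return [row.get(f, default) for f in field_names]


for i, row in enumerate(rows):
    try:
        print(i + 1, strict_cells(row, field_names))
    except KeyError as exc:
        # The second row ends up here because it has no 'שם הקדש' key.
        print(i + 1, 'KeyError:', exc, 'missing:', missing_fields(row, field_names))
```

A check like missing_fields, run over the rows just before printer, might help narrow down which rows reach it without the expected field.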