datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Bug with full-outer join, undesired fields, and dumping #163

Closed by cschloer 2 years ago

cschloer commented 2 years ago

Hi,

I just found a bug in the join processor: fields from the "source" are added to the target rows even if they aren't specified. The fields don't make it into the schema, so everything can seem fine, but any kind of dump step then errors because the rows contain unknown fields.

I believe the issue is just that at line 261 of join, the "extra" dict should be set to the filtered dictionary, not updated by it.
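
To illustrate the difference between updating and setting, here is a minimal sketch in plain Python. It is not the dataflows source: the field names, the `unknown_fields` helper, and the assumption that "extra" already holds the whole source row before the update are all hypothetical and only stand in for the real join code.

```python
# Hypothetical names throughout; this is not the actual dataflows join code.
schema_fields = ["id", "name"]  # fields declared in the target schema
source_row = {"id": 1, "name": "a", "internal_note": "x"}
requested = {"name"}  # fields the join was asked to copy from the source

# Keep only the fields that were explicitly requested.
filtered = {k: v for k, v in source_row.items() if k in requested}

# Updating a dict that already holds the whole source row keeps every old key.
extra_updated = dict(source_row)
extra_updated.update(filtered)

# Setting "extra" to the filtered dict keeps only the requested fields.
extra_assigned = dict(filtered)


def unknown_fields(row, fields):
    """Return the row keys that the schema does not declare."""
    return sorted(set(row) - set(fields))


print(unknown_fields(extra_updated, schema_fields))   # ['internal_note']
print(unknown_fields(extra_assigned, schema_fields))  # []
```

In this sketch the updated dict still carries `internal_note`, which is the kind of undeclared field a dump step would reject, while the assigned dict does not.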

Before the fix, the test throws an exception. Afterwards it runs as intended.

@akariv

akariv commented 2 years ago

fixed via https://github.com/datahq/dataflows/pull/165

akariv commented 2 years ago

Thanks a lot @cschloer, good catch! Your fix broke a different test, but the above PR takes care of both cases (hopefully). Please take a look at the modified test and see if it makes sense to you.

cschloer commented 2 years ago

Sorry for not replying, but looks great!