Add a new parameter to duplicate to aid join

datahq / dataflows

DataFlows is a simple, intuitive lightweight framework for building data processing flows in python.

https://dataflows.org

MIT License

Add a new parameter to duplicate to aid join #156

Closed. cschloer closed this 3 years ago

cschloer commented 3 years ago

Hey, currently the duplicate processor always adds the new resource directly after the source resource. It would be very useful to be able to "duplicate to end", so that the duplicate processor can be used to join "out of order" resources.

Long story short: we have a use case where we want to join resource A with resource B, but resource B was derived from resource A and so is ALWAYS located after it. We could load resource A again, but that would repeat quite a few processing steps. Using duplicate on resource A would work if the duplicated resource ended up after resource B, which this PR makes possible.
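The ordering problem can be sketched in plain Python. This is only a toy model of a list of resources, not the dataflows API: the function `duplicate_resource`, its `to_end` flag, and the resource names are hypothetical and chosen for illustration.

```python
from typing import Dict, List

Resource = Dict[str, object]


def duplicate_resource(
    resources: List[Resource],
    source: str,
    target_name: str,
    to_end: bool = False,
) -> List[Resource]:
    """Return a new resource list containing a copy of `source`.

    Hypothetical helper: by default the copy is placed directly after the
    source resource; with to_end=True it is appended as the last resource.
    """
    # Locate the source resource by name.
    index = next(i for i, r in enumerate(resources) if r["name"] == source)
    # Copy the rows so the duplicate can be modified independently.
    copy: Resource = {
        "name": target_name,
        "rows": [dict(row) for row in resources[index]["rows"]],
    }
    if to_end:
        return resources + [copy]
    return resources[: index + 1] + [copy] + resources[index + 1:]


# Resource B was derived from resource A, so it always comes after A.
resources = [
    {"name": "A", "rows": [{"id": 1}, {"id": 2}]},
    {"name": "B", "rows": [{"id": 1}]},
]

default_order = [r["name"] for r in duplicate_resource(resources, "A", "A_copy")]
end_order = [r["name"] for r in duplicate_resource(resources, "A", "A_copy", to_end=True)]

print(default_order)  # ['A', 'A_copy', 'B']
print(end_order)      # ['A', 'B', 'A_copy']
```

With the default placement the copy of A still precedes B, so it is no more use for the join than A itself. Placing it at the end puts a copy of A after B, which is the order the join described above needs.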

Once this has been accepted, I can make a PR in datapackage_pipelines so that the parameter works properly there as well.

@akariv @roll

cschloer commented 3 years ago

Also FYI, it seems that the last commit imports ExcelXMLParser but the file itself was never added, so none of the checks will pass (though after removing the imports the tests run properly).

akariv commented 3 years ago

Thanks @cschloer !

This looks good - only thing missing is adding documentation for this new parameter in PROCESSORS.md.

I've also fixed the tests on master (thanks again) so after rebase the PR should also pass.

cschloer commented 3 years ago

OK, updated the documentation and merged in the updates from upstream.

akariv commented 3 years ago

Hey @cschloer - thanks! There are still a few lint errors in the code (probably related to a new rule I've added recently to enforce quote style throughout the code). Either way, it should take 2 minutes to fix (see here: https://travis-ci.org/github/datahq/dataflows/jobs/766186356).

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 531

13 of 13 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.1%) to 85.678%

Totals
Change from base Build 529:	0.1%
Covered Lines:	1998
Relevant Lines:	2332

cschloer commented 3 years ago

OK fixed that! @akariv

roll commented 3 years ago

Thanks @akariv @cschloer!

akariv commented 3 years ago

https://pypi.org/project/dataflows/0.2.11/