BritishGeologicalSurvey / etlhelper

ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
https://britishgeologicalsurvey.github.io/etlhelper/
GNU Lesser General Public License v3.0
100 stars, 25 forks

139 add transform option to load and executemany #166

Closed leorudczenko closed 1 year ago

leorudczenko commented 1 year ago

Summary

A new argument has been added to both the load and executemany functions: transform.
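
To illustrate the idea, here is a minimal sketch of a per-chunk transform hook. It does not use etlhelper itself: `executemany_sketch` and `chunked` are hypothetical stand-ins built on `sqlite3`, and the assumption that the transform receives one chunk of rows and returns an iterable of rows is taken from the discussion in this PR, not from the library source.

```python
import sqlite3
from itertools import islice


def chunked(rows, chunk_size):
    """Yield successive lists of up to chunk_size rows."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def executemany_sketch(query, conn, rows, transform=None, chunk_size=2):
    """Hypothetical stand-in, NOT etlhelper's implementation.

    Applies the optional transform to each chunk before inserting it.
    """
    processed = 0
    for chunk in chunked(rows, chunk_size):
        if transform is not None:
            # list() accepts both list-returning and generator transforms
            chunk = list(transform(chunk))
        conn.executemany(query, chunk)
        processed += len(chunk)
    conn.commit()
    return processed


def upper_names(chunk):
    """Example transform: upper-case the name column of every row."""
    return [(row_id, name.upper()) for row_id, name in chunk]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, name TEXT)")
rows = [(1, "basalt"), (2, "granite"), (3, "shale")]

processed = executemany_sketch(
    "INSERT INTO sample VALUES (?, ?)", conn, rows, transform=upper_names
)
result = conn.execute("SELECT id, name FROM sample ORDER BY id").fetchall()
```

Applying the transform chunk by chunk keeps memory use bounded, since only one chunk of rows is held at a time.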

Description

This new argument has been implemented in accordance with the requirements in issue #139.

One minor change deviates from the acceptance criteria: the new tests for the transform argument in the load function are in test/integration/etl/test_etl_transform.py rather than test/integration/etl/test_etl_abort.py. They test the transform functionality and have no interaction with the abort process.

Closes #139

volcan01010 commented 1 year ago

I've made some changes here. In general, I wanted to emphasise the ability to use yield in a transform function. This allows the transform to be used in efficient generator pipelines and, if you aren't scared of using yield, it gives simpler transform functions.

When I did this, I noticed that load couldn't handle transform functions that returned a generator instead of a list. I have made changes to allow this. I also removed transform functions that rely on in-place updating of items within a chunk list as this can also cause problems with generator-based workflows. The README now defines the two recommended ways to write a transform function.
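
As a rough illustration of the two styles, presumably "return a new list" and "yield rows one at a time", the sketch below writes the same transform both ways. The function names and the dictionary row shape are assumptions made for this example; neither function modifies the incoming chunk in place.

```python
def transform_return_list(chunk):
    """Build and return a new list of rows; the input chunk is untouched."""
    new_chunk = []
    for row in chunk:
        new_row = {"id": row["id"], "name": row["name"].title()}
        new_chunk.append(new_row)
    return new_chunk


def transform_yield(chunk):
    """Generator version: yield each new row as soon as it is ready."""
    for row in chunk:
        yield {"id": row["id"], "name": row["name"].title()}


chunk = [{"id": 1, "name": "ben nevis"}, {"id": 2, "name": "snowdon"}]

from_list = transform_return_list(chunk)
# A generator must be consumed (here with list()) to produce its rows
from_generator = list(transform_yield(chunk))
# The generator version also accepts a lazy iterator as input
from_lazy_input = list(transform_yield(iter(chunk)))
```

Both versions give the same rows. The generator version is shorter and works when the input is itself a lazy iterator, which is why a consumer such as load has to accept any iterable and not only a list.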

Have a look and check that it all makes sense. If you are happy, then please merge. Or we can discuss first.

volcan01010 commented 1 year ago

Reviewed together with @leorudczenko.