IDEMSInternational / parenttext-pipeline

Public repository for all tools related to ParentText
GNU General Public License v3.0

Refactor config #118

Closed geoo89 closed 3 months ago

geoo89 commented 5 months ago

Refactor the config so that the sequence of pipeline steps can be customized.

Data sources are factored out of the pipeline steps.

In parallel, here's an example of a refactored config that this code can be tested on (if both config.json and config.py are present, the former is used): https://github.com/IDEMSInternational/parenttext-crisis/tree/pipeline-refactor
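The precedence rule mentioned above (config.json wins over config.py when both exist) can be illustrated with a minimal sketch. The function name `pick_config_file` is hypothetical and is not taken from the pipeline code; only the two file names and their order come from this description.

```python
from pathlib import Path

# Order encodes the precedence stated in this PR: config.json before config.py.
CONFIG_CANDIDATES = ("config.json", "config.py")


def pick_config_file(directory: str = ".") -> Path:
    """Return the first config file found in `directory`.

    Hypothetical helper: the real pipeline may locate its config differently.
    """
    base = Path(directory)
    for name in CONFIG_CANDIDATES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"None of {CONFIG_CANDIDATES} found in {base.resolve()}"
    )
```

Keeping the candidates in a single ordered tuple makes the precedence explicit and easy to change.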

This PR uses setuptools_scm for the pipeline to determine its own version from git tags. The main tool checks whether the config version (reasonably) matches the pipeline version. The example config has version 1.0.0; in order to have the pipeline version match, after pulling this branch, open a terminal in the root folder and proceed as follows:

git tag -a 1.0.0 -m "test version"
python3 -m pip install -e .

You can then run the pipeline:

python3 -m parenttext_pipeline pull_data compile_flows
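
For illustration, the check that the config version "reasonably" matches the pipeline version might look like the sketch below. The exact rule is not spelled out in this PR; the sketch assumes that two versions are compatible when their major components are equal, and the names `parse_version` and `versions_compatible` are hypothetical.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Parse the leading 'major.minor.patch' of a version string.

    Trailing parts such as '.dev3+g1234abc', which setuptools_scm appends
    between tags, are ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def versions_compatible(config_version: str, pipeline_version: str) -> bool:
    """Assumed rule (not confirmed by the PR): major versions must be equal."""
    return parse_version(config_version)[0] == parse_version(pipeline_version)[0]
```

Under this assumed rule, a config at 1.0.0 is accepted by a pipeline tagged 1.0.0 or by a later development build in the 1.x series, and rejected by 0.2.1.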

geoo89 commented 4 months ago

This has been tested by running the pipeline for parenttext-crisis with version 0.2.1 (HEAD with a dependency bump, because the pipeline would otherwise crash: the sheets use features that were only added in later versions of rpft) and with version 1.0.0 (using the same old config format). The outputs were identical, modulo UUIDs.
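
A comparison "modulo UUIDs" can be done along the following lines. This is a sketch of one possible approach, not the procedure actually used for this test: each distinct UUID is replaced by a placeholder numbered by order of first appearance, so two outputs compare equal when they differ only in the UUID values themselves. The function names are hypothetical.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def normalise_uuids(text: str) -> str:
    """Replace each distinct UUID with '<uuid-N>', N being its first-seen index."""
    seen: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        key = match.group(0).lower()
        if key not in seen:
            seen[key] = f"<uuid-{len(seen)}>"
        return seen[key]

    return UUID_RE.sub(replace, text)


def equal_modulo_uuids(first: str, second: str) -> bool:
    """True when the two texts are identical after UUID normalisation."""
    return normalise_uuids(first) == normalise_uuids(second)
```

Numbering the placeholders keeps cross-references intact: if a flow refers to the same UUID twice in one output, it must do so in the other as well. The approach assumes both outputs list their content in the same order.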

geoo89 commented 3 months ago

This PR should close automatically once #126 is merged.

Please make any further amendments to #126.