Annex A: Pipeline development

MichaelHanksSF commented 2 weeks ago

This is the issue to create the generic Annex A pipeline, using the schema built in #61. The pipeline will need to:

[ ] validate an xlsx file with a tab per Annex A list
[ ] incorporate the new "year_month" element from the filename in exactly the same way "year" is currently handled, so that different returns can be told apart (e.g. Annex A will allow 2024_Jan, 2024_Feb, ... where 903 only has 2024), adding the column "year_month" instead of "year" to all files
[ ] produce a cleanfile output of 1 csv per list (so each input file will produce n csvs where n = number of lists in Annex A)
[ ] concatenate cleanfiles together at the LA level
[ ] create a reports output for each list for the region
[ ] make the usual logs and outputs (clean, concat, reports) available in the standard places in the infrastructure
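
A minimal sketch of the "year_month" handling described in the list above, for discussion only. The filename pattern (a four-digit year followed by a three-letter month, e.g. `annex_a_2024_Jan.xlsx`) and the helper names `year_month_from_filename` and `add_year_month` are assumptions made for illustration; they are not taken from the existing "year" handling in the codebase.

```python
import re
from typing import Dict, List, Optional

# Assumed convention: the filename contains something like "2024_Jan".
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
_YEAR_MONTH_RE = re.compile(
    rf"(?<!\d)(20\d{{2}})[_\- ]?({_MONTHS})(?![a-z])", re.IGNORECASE
)


def year_month_from_filename(filename: str) -> Optional[str]:
    """Return e.g. "2024_Jan" if the filename carries a year and month, else None."""
    match = _YEAR_MONTH_RE.search(filename)
    if match is None:
        return None
    year, month = match.groups()
    # Normalise to the "2024_Jan" form regardless of case or separator.
    return f"{year}_{month.capitalize()}"


def add_year_month(rows: List[Dict[str, str]], year_month: str) -> List[Dict[str, str]]:
    """Return copies of the rows with a "year_month" column added."""
    return [{**row, "year_month": year_month} for row in rows]
```

Under these assumptions, each list (tab) read from one input file would be passed through `add_year_month` with the value parsed from that file's name before being written out as its own csv.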

MichaelHanksSF commented 2 weeks ago

See Patrick's existing list:

[ ] create annex_a assets which load configuration files and point to input/output locations
[ ] create annex_a ops; the functions needed are:
- [ ] create session folder
- [ ] create archive
- [ ] process_file (formerly clean_file)
- [ ] create_current_view (la_agg)
- [ ] create_reports (pan_agg)
[ ] create annex_a job running the ops in order
[ ] add the annex_a job to the repository.py file
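
As a rough illustration of "running the ops in order", here is a dependency-free sketch in plain Python. The step names come from the list above; `STEP_ORDER`, `run_annex_a_job` and the idea of passing a context dict from step to step are hypothetical and only stand in for whatever the orchestration framework's ops and job definitions provide. In practice process_file would presumably run once per input file.

```python
from typing import Callable, Dict, Optional

# Order taken from the list of ops above.
STEP_ORDER = [
    "create_session_folder",
    "create_archive",
    "process_file",
    "create_current_view",
    "create_reports",
]

Step = Callable[[dict], dict]


def run_annex_a_job(steps: Dict[str, Step], context: Optional[dict] = None) -> dict:
    """Run each step in STEP_ORDER, feeding the output of one into the next."""
    missing = [name for name in STEP_ORDER if name not in steps]
    if missing:
        raise ValueError(f"missing steps: {missing}")
    current = dict(context or {})
    for name in STEP_ORDER:
        # Each step receives the accumulated context and returns an updated one.
        current = steps[name](current)
    return current
```

A caller would supply one function per step name; the sketch only fixes the ordering and fails early if a step has not been defined.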

MichaelHanksSF commented 1 week ago

3 days

SocialFinanceDigitalLabs / liia-tools-pipeline