FOLIO-FSE / folio_migration_tools

A Python module and CLI tool that transforms legacy ILS data into the native FOLIO formats and loads it into FOLIO
MIT License

Add parameter to include task name in all output files (including maps and extradata) #774

Open banerjek opened 2 months ago

banerjek commented 2 months ago

Processing files in parallel is currently awkward: it must be done either by creating map and extradata files from object files or by creating parallel iterations.

Suggest a true/false taskNameBasedFiles parameter that defaults to the current behavior. Setting it to true would include the task name in map and extradata file names, as is done with object files. The expectation would be that the IC manages these files manually/separately. This is most useful for bibs and holdings (MARC and CSV based) tasks.
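
To make the proposal concrete, here is a minimal sketch of how such a flag might change a map file name. The function name `map_file_path` and the `<type>_id_map.json` naming pattern are assumptions for illustration only; they are not taken from the tool's actual code.

```python
from pathlib import Path


def map_file_path(results_dir, migration_task_type, task_name,
                  task_name_based_files=False):
    """Build the path of a map file (hypothetical naming scheme)."""
    # Assumed default: the name depends only on the task type, so every
    # task of that type shares one map file.
    stem = f"{migration_task_type}_id_map"
    if task_name_based_files:
        # Proposed behavior: append the task name, as object files do.
        stem = f"{stem}_{task_name}"
    return Path(results_dir) / f"{stem}.json"


default_path = map_file_path("results", "BibsTransformer", "bibs_system_a")
per_task_path = map_file_path("results", "BibsTransformer", "bibs_system_a",
                              task_name_based_files=True)
```

With the flag left at its default nothing changes, so existing configurations would keep working.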

Aside from simplifying parallel processes, the feature would simplify combining task output from different sources requiring different maps.

bltravis commented 2 months ago

@banerjek Could you describe the workflow you envision being supported by this enhancement a bit more?

banerjek commented 1 month ago

Goal is to simplify parallel processing.

Currently, object and extradata files (I mistitled this) all contain the task name, but maps do not; map names are based on migrationTaskType.

This makes it awkward to run overlapping processes for the same task type (e.g., different source systems, or vertically sharding out processes for Instance/Holdings/Items on especially large systems). You basically have to run parallel iterationIdentifiers or directories, which is doable but clunky.
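
As a rough illustration of the overlap, the sketch below groups tasks by the map file they would write to and reports any file shared by more than one task. The task names, the `<type>_id_map.json` pattern, and the helper `find_map_collisions` are hypothetical; only the idea that the map name depends on migrationTaskType comes from the description above.

```python
from collections import defaultdict


def find_map_collisions(tasks, task_name_based_files=False):
    """Return the map files that more than one task would write to."""
    writers = defaultdict(list)
    for task in tasks:
        # Assumed naming: the map file is derived from the task type only.
        stem = f"{task['migrationTaskType']}_id_map"
        if task_name_based_files:
            # Proposed option: make the name unique per task.
            stem += f"_{task['name']}"
        writers[f"{stem}.json"].append(task["name"])
    return {path: names for path, names in writers.items() if len(names) > 1}


# Two tasks of the same type, e.g. bibs from two source systems.
tasks = [
    {"name": "bibs_system_a", "migrationTaskType": "BibsTransformer"},
    {"name": "bibs_system_b", "migrationTaskType": "BibsTransformer"},
]

shared_now = find_map_collisions(tasks)
shared_with_flag = find_map_collisions(tasks, task_name_based_files=True)
```

Under these assumptions both tasks target the same map file today, and the overlap goes away once the task name is part of the file name.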