ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

JJ/DIS-566 #69

Closed RedWalters closed 4 months ago

RedWalters commented 5 months ago

What

Includes most of the changes requested in issue-15 related to the main sdmx transform, but the main thing to be reviewed here is the metadata generation function.

How to review

There's gonna be 2 steps to this one. The first is a review of the code itself: it's mostly just pulling in and manipulating a dictionary with as much information as possible, but there may be better ways to do what needs to be done. The other is a review of the actual output itself: I'm not 100% sure on the structure/contents, so both of these things will need reviewing and possibly correcting/informing.

Who can review

For the first part, anyone really can review this, though people with experience working with JSON files would be best. The latter will need someone familiar with the application profile, so they can have some back and forth on the structure/contents of the output metadata.

RedWalters commented 5 months ago

So to link this into the pipeline, it'll need a function with a signature of:

from pathlib import Path
from typing import Tuple

def jjs_code(path_to_input_1: Path) -> Tuple[Path, Path]:
    # all of the stuff happens here, producing the two output paths

    return path_to_csv_file, path_to_metadata_json_file

There's a simple hard-coded one here that Jim did a while ago to help us hook things together on the SE side:

https://github.com/ONSdigital/dp-data-pipelines/blob/3c3653ae96794616fae5686c5ab74bb3ca1517c9/dpypelines/pipeline/shared/transforms/sdmx/v1.py#L40

Not gonna get involved in transform code, but we need that pattern (or to wrap your code in a function with that pattern) to make it all work together.
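As a rough illustration of wrapping existing code in that pattern, here's a minimal sketch. The names `build_outputs` and `sdmx_transform`, the placeholder rows and metadata, and the output file naming are all assumptions made up for the example, not what the repo actually does; the only thing taken from above is the `Path -> Tuple[Path, Path]` shape.

```python
import csv
import json
from pathlib import Path
from typing import List, Tuple


def build_outputs(path_to_input: Path) -> Tuple[List[List[str]], dict]:
    """Hypothetical stand-in for the real transform and metadata logic."""
    rows = [["id", "value"], ["a", "1"]]  # placeholder data, not real output
    metadata = {"source": path_to_input.name}  # placeholder metadata
    return rows, metadata


def sdmx_transform(path_to_input: Path) -> Tuple[Path, Path]:
    """Wrapper exposing the signature the pipeline expects."""
    rows, metadata = build_outputs(path_to_input)

    # Assumed naming convention: write both outputs next to the input file.
    path_to_csv_file = path_to_input.with_suffix(".csv")
    path_to_metadata_json_file = path_to_input.with_name(
        path_to_input.stem + "_metadata.json"
    )

    with open(path_to_csv_file, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path_to_metadata_json_file, "w") as f:
        json.dump(metadata, f, indent=2)

    return path_to_csv_file, path_to_metadata_json_file
```

The point is just that the pipeline only ever sees the wrapper, so whatever happens inside `build_outputs` can change freely.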

So my plan for this was to have the functions for the transforms kept in a separate file (so this one, for instance) and then have the pipeline reference these functions from elsewhere, which has the correct signatures, switches for various sdmx versioning, validation and what-have-you, without having all the code in one place.

So we'll have a file that looks very similar to this with all the lengthy transform code held somewhere else that can be edited/added to as needed without risking messing with the pipeline stuff itself.
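A sketch of how that split might look, squashed into one block for readability. Everything here is hypothetical: the function names, the `SDMX_TRANSFORMS` registry, the version keys and the stubbed bodies are assumptions for illustration, and in practice the transform functions would live in their own module and be imported by the pipeline-facing file.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# Shared shape every pipeline-facing transform is assumed to follow.
TransformFn = Callable[[Path], Tuple[Path, Path]]


# --- would live in a separate transforms file (hypothetical stubs) ---
def sdmx_2_0_transform(path_to_input: Path) -> Tuple[Path, Path]:
    # Real transform code would go here; this stub only derives output paths.
    return path_to_input.with_suffix(".csv"), path_to_input.with_suffix(".json")


def sdmx_2_1_transform(path_to_input: Path) -> Tuple[Path, Path]:
    # Real transform code would go here; this stub only derives output paths.
    return path_to_input.with_suffix(".csv"), path_to_input.with_suffix(".json")


# --- would live in the pipeline-facing file ---
SDMX_TRANSFORMS: Dict[str, TransformFn] = {
    "2.0": sdmx_2_0_transform,
    "2.1": sdmx_2_1_transform,
}


def get_transform(sdmx_version: str) -> TransformFn:
    """Pick the transform for a given sdmx version, failing loudly if unknown."""
    try:
        return SDMX_TRANSFORMS[sdmx_version]
    except KeyError:
        raise ValueError(f"No transform registered for sdmx version {sdmx_version!r}")
```

With something like this, adding or editing a transform only touches the transforms file plus one registry entry, and the pipeline side stays as it is.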

That was my idea anyway

RedWalters commented 4 months ago

This issue is gonna be covered by issue-87