ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

SDMX Transform - Refactor Supply & Use Tables Proto for Production (Performance / Design Patterns etc.) DIS-472 #77

Closed osamede20 closed 4 months ago

osamede20 commented 5 months ago

What

As part of this task, I reviewed the SDMX transform code for performance and scalability. I also added a line to xmlToCsvSDMX2_0() so that it returns the tidy dataframe, which gives the unit test something to check the output tidy.csv against.

I wrote a unit test that checks the output tidy.csv against the columns, headers and some other details in the input XML file.
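
The shape of that check can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the test in this PR: `toy_transform` is a hypothetical stand-in for xmlToCsvSDMX2_0(), the sample XML is a simplified, namespace-free fragment rather than a real SDMX 2.0 message, and plain lists of dicts are used in place of a dataframe so the sketch needs only the standard library.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample (an assumption; real SDMX 2.0 files differ).
SAMPLE_XML = """<DataSet>
  <Series FREQ="A" PRODUCT="P1">
    <Obs TIME_PERIOD="2020" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2021" OBS_VALUE="2.5"/>
  </Series>
  <Series FREQ="A" PRODUCT="P2">
    <Obs TIME_PERIOD="2020" OBS_VALUE="3.0"/>
  </Series>
</DataSet>"""


def toy_transform(xml_text, csv_path):
    """Hypothetical stand-in for xmlToCsvSDMX2_0(): write a tidy CSV and return its rows."""
    rows = []
    for series in ET.fromstring(xml_text).iter("Series"):
        for obs in series.iter("Obs"):
            # One tidy row per observation: series attributes plus observation attributes.
            rows.append({**series.attrib, **obs.attrib})
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows


def check_csv_matches_returned_rows(xml_text):
    """Compare the CSV on disk with the returned rows and with the input XML."""
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "tidy.csv")
        returned = toy_transform(xml_text, csv_path)
        with open(csv_path, newline="") as f:
            reader = csv.DictReader(f)
            written = list(reader)
            header = reader.fieldnames
    expected_obs = len(list(ET.fromstring(xml_text).iter("Obs")))
    assert header == list(returned[0].keys())  # same columns, same order
    assert written == returned                 # same cell values
    assert len(written) == expected_obs        # one row per <Obs> in the input
    return header, len(written)
```

Returning the rows from the transform is what makes the second and third assertions possible without re-implementing the transform inside the test.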

How to review

For the transform function xmlToCsvSDMX2_0(), there is a required performance improvement: the for-loop that goes through each series block should check the headers only once per series block, rather than looping past them on every iteration. This is documented in inline comments in the transform code.
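
As a rough illustration of that idea (not the actual transform code), the series-level values can be read once in the outer loop and reused for every observation in the inner loop. The function name `series_to_rows` and the simplified, namespace-free XML below are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample (an assumption; real SDMX 2.0 files differ).
SAMPLE_XML = """<DataSet>
  <Series FREQ="A" PRODUCT="P1">
    <Obs TIME_PERIOD="2020" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2021" OBS_VALUE="2.5"/>
  </Series>
  <Series FREQ="A" PRODUCT="P2">
    <Obs TIME_PERIOD="2020" OBS_VALUE="3.0"/>
  </Series>
</DataSet>"""


def series_to_rows(xml_text):
    """Hypothetical helper: build tidy rows, reading each series header only once."""
    rows = []
    for series in ET.fromstring(xml_text).iter("Series"):
        # Header values are the same for every observation in this series,
        # so they are read here, outside the observation loop.
        header = dict(series.attrib)
        for obs in series.iter("Obs"):
            # Only the per-observation attributes are read inside the inner loop.
            rows.append({**header, **obs.attrib})
    return rows
```

With this layout the header work scales with the number of series blocks rather than with the number of observations.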

For the unit tests, please review performance, design, logic, etc.

Who can review

For the first part (the transform script), anyone can review. For the test, someone more experienced in unit testing would be great.