ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

Optimize sdmx transform script #98

Closed. osamede20 closed this 5 months ago

osamede20 commented 5 months ago

What

  1. The code was refactored. The way the Header dictionary is built remains the same: the Python ElementTree package could not be used for it because most of the header elements do not have attributes.
  2. Two new functions, flatten_dict and convert, were added to utils.py. flatten_dict flattens the nested Header dictionary, and convert turns the tuple dictionary into a flat 'concatenated' dictionary.
  3. The Python ElementTree package is now used to retrieve the dictionary for each observation block and the dictionary for its parent series block, whose headers are common to every observation block it contains.
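For reviewers, the three points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the bodies of `flatten_dict` and `convert`, the `sep` parameter, the `observation_rows` helper and the namespace-free sample XML are all assumptions, and the real signatures in utils.py may differ.

```python
import xml.etree.ElementTree as ET


def flatten_dict(nested, parent=()):
    """Flatten a nested dict into one keyed by the tuple path to each leaf (assumed behaviour)."""
    flat = {}
    for key, value in nested.items():
        path = parent + (key,)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, path))
        else:
            flat[path] = value
    return flat


def convert(tuple_dict, sep="_"):
    """Join each tuple key into a single concatenated string key (assumed behaviour)."""
    return {sep.join(path): value for path, value in tuple_dict.items()}


# Hypothetical, simplified SDMX-like sample; real files carry namespaces and more attributes.
SAMPLE = """
<DataSet>
  <Series FREQ="A" REF_AREA="UK">
    <Obs TIME_PERIOD="2020" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2021" OBS_VALUE="1.7"/>
  </Series>
  <Series FREQ="Q" REF_AREA="UK">
    <Obs TIME_PERIOD="2021-Q1" OBS_VALUE="0.4"/>
  </Series>
</DataSet>
"""


def observation_rows(xml_text):
    """Merge each observation's attributes with those of its parent series."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for series in root.iter("Series"):
        for obs in series.findall("Obs"):
            # Series attributes are shared by every observation in the block.
            rows.append({**series.attrib, **obs.attrib})
    return rows


header = {"Header": {"ID": "example", "Sender": {"id": "ONS"}}}
flat_header = convert(flatten_dict(header))
rows = observation_rows(SAMPLE)
```

Keeping the tuple-keyed intermediate separate from the string-joining step means the separator can be changed without touching the recursion.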

How to review

The code can be checked for performance to see whether the processing time, currently about 10s, can be reduced further. It can also be checked for scalability.
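One way to approach the scalability check is to time the parsing step on inputs of increasing size. The harness below is only a sketch: `build_sample`, `parse_observations` and `best_time` are hypothetical helpers, the synthetic XML is far simpler than a real SDMX file, and `parse_observations` is a stand-in that should be swapped for the actual transform function.

```python
import time
import xml.etree.ElementTree as ET


def build_sample(n_series, n_obs):
    """Build a synthetic, namespace-free SDMX-like document (hypothetical layout)."""
    parts = ["<DataSet>"]
    for s in range(n_series):
        parts.append(f'<Series SERIES_ID="S{s}">')
        for o in range(n_obs):
            parts.append(f'<Obs TIME_PERIOD="{2000 + o}" OBS_VALUE="{o}"/>')
        parts.append("</Series>")
    parts.append("</DataSet>")
    return "".join(parts)


def parse_observations(xml_text):
    """Stand-in for the transform: one merged dict per observation."""
    root = ET.fromstring(xml_text)
    return [
        {**series.attrib, **obs.attrib}
        for series in root.iter("Series")
        for obs in series.findall("Obs")
    ]


def best_time(func, *args, repeats=3):
    """Return the fastest of `repeats` wall-clock runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Grow the input tenfold each step to see how the run time scales.
    for n_series in (100, 1_000, 10_000):
        sample = build_sample(n_series, n_obs=10)
        seconds = best_time(parse_observations, sample)
        print(f"{n_series * 10} observations: {seconds:.4f}s")
```

If the time grows roughly in proportion to the number of observations, the approach scales linearly; a much steeper increase would point to a step worth profiling.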

Who can review

Any DE can review.

https://jira.ons.gov.uk/browse/DIS-472