ACED-IDP / aced_etl_pod

etl worker pod
MIT License

Abstract out schema to not be hardcoded inside the ETL pod #16

Open matthewpeterkort opened 6 months ago

matthewpeterkort commented 6 months ago

Currently, a new image has to be built every time the schema changes, because the schema link is hard-coded into the etl_pod image. If this link were moved to a config file, or fed to the ETL pod some other way, the schema would be much easier to iterate on. See, for example:

https://github.com/ACED-IDP/aced_etl_pod/blob/main/etl-job/fhir_import_export.py#L185
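One way to do this, as a minimal sketch only: resolve the schema location at startup from an environment variable, falling back to a mounted config file. The names `ETL_SCHEMA_URL`, `ETL_CONFIG_PATH`, `schema_url`, the default path, and the function `resolve_schema_location` are all hypothetical and do not exist in the repository.

```python
import json
import os
from pathlib import Path

# Hypothetical mount point for a config file (e.g. from a Kubernetes ConfigMap).
DEFAULT_CONFIG_PATH = "/etc/etl/config.json"


def resolve_schema_location(env=None, config_path=None):
    """Return the schema location, or raise if none is configured.

    Lookup order (all names are assumptions for illustration):
      1. the ETL_SCHEMA_URL environment variable
      2. the "schema_url" key of a JSON config file
    """
    env = os.environ if env is None else env

    # 1. A value set on the pod spec takes precedence.
    value = env.get("ETL_SCHEMA_URL")
    if value:
        return value

    # 2. Otherwise fall back to a mounted JSON config file.
    path = Path(config_path or env.get("ETL_CONFIG_PATH", DEFAULT_CONFIG_PATH))
    if path.is_file():
        value = json.loads(path.read_text()).get("schema_url")
        if value:
            return value

    # Fail loudly rather than silently using a baked-in default.
    raise RuntimeError(
        "schema location not configured: set ETL_SCHEMA_URL "
        "or provide schema_url in the config file"
    )
```

With something like this, a schema change would only require updating the pod's environment or config file, not rebuilding the image.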

Also, if an unknown node type is present in the data, the whole ETL pod hangs indefinitely. It should fail with a hard error instead.
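A fail-fast check along these lines could run before any loading starts. This is a sketch under assumptions: `UnknownNodeTypeError`, `schema_name_for`, and `check_node_types` are hypothetical names, and the mapping from `MedicationRequest.ndjson` to `medication_request.yaml` is inferred from the log messages in this issue, not taken from the iceberg_tools source.

```python
import re
from pathlib import Path


class UnknownNodeTypeError(RuntimeError):
    """Raised when an input file has no matching schema (hypothetical)."""


def schema_name_for(ndjson_path):
    """Map e.g. 'MedicationRequest.ndjson' to 'medication_request.yaml'.

    The snake_case convention is an assumption based on the log output.
    """
    stem = Path(ndjson_path).stem
    # Insert '_' before every capital letter except the first, then lowercase.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", stem).lower()
    return f"{snake}.yaml"


def check_node_types(ndjson_paths, known_schemas):
    """Raise immediately if any input file lacks a known schema."""
    needed = {schema_name_for(p) for p in ndjson_paths}
    missing = sorted(needed - set(known_schemas))
    if missing:
        raise UnknownNodeTypeError(
            "no schema for node type(s): " + ", ".join(missing)
        )
```

If the entry point lets this exception propagate (or catches it and calls `sys.exit(1)`), the pod exits with a non-zero status instead of hanging.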

Current output behavior:


[2023-12-12 19:05:23,443][   INFO] File _aced-OMOPTEST-20231212-110245_meta.zip downloaded successfully
Archive:  /tmp/087cfb16-dc9f-57dd-bdcd-26a17b2d9be0/_aced-OMOPTEST-20231212-110245_meta.zip
 extracting: /root/studies/OMOPTEST/DocumentReference.ndjson  
 extracting: /root/studies/OMOPTEST/Observation.ndjson  
 extracting: /root/studies/OMOPTEST/MedicationRequest.ndjson  
 extracting: /root/studies/OMOPTEST/MedicationStatement.ndjson  
 extracting: /root/studies/OMOPTEST/Patient.ndjson  
studies/OMOPTEST/MedicationRequest.ndjson
INFO:iceberg_tools.util:studies/OMOPTEST/MedicationRequest.ndjson
medication_request.yaml not in schemas
INFO:iceberg_tools.data.simplifier:medication_request.yaml not in schemas
INFO:iceberg_tools.util:studies/OMOPTEST/DocumentReference.ndjson
studies/OMOPTEST/DocumentReference.ndjson
studies/OMOPTEST/Patient.ndjson
INFO:iceberg_tools.util:studies/OMOPTEST/Patient.ndjson
studies/OMOPTEST/Observation.ndjson
INFO:iceberg_tools.util:studies/OMOPTEST/Observation.ndjson
studies/OMOPTEST/MedicationStatement.ndjson
INFO:iceberg_tools.util:studies/OMOPTEST/MedicationStatement.ndjson
INFO:iceberg_tools.data.simplifier:medication_statement.yaml not in schemas
medication_statement.yaml not in schemas

bwalsh commented 6 months ago

@matthewpeterkort Was this resolved? If all OK, please close issue.

matthewpeterkort commented 6 months ago

> @matthewpeterkort Was this resolved? If all OK, please close issue.

It was addressed in this commit: https://github.com/ACED-IDP/aced_etl_pod/commit/13b732958d6e8f55dadf20684abb5a01a7b5ecd2 An image has been built for it, but this code is not part of the main branch.

matthewpeterkort commented 4 months ago

Resolved in https://github.com/ACED-IDP/aced_etl_pod/pull/25 . Specifically: https://github.com/ACED-IDP/aced_etl_pod/pull/25/files#diff-da40c316a5c288300698362a3cb8f089c153caeb53d1313f5e61298eacf62a3aR453