ACED-IDP / aced_etl_pod

etl worker pod
MIT License

Adjust server job to handle .git upload and Bundle.ndjson #29

Open bwalsh opened 1 month ago

bwalsh commented 1 month ago

The new g3t client will upload a *.git.zip file.

Should the server job do the following, and if so, how:

matthewpeterkort commented 1 month ago

deprecate reading .meta.zip and producing SNAPSHOT.zip in favor of reading from *.git.zip

Parts of this branch should get folded into this. Specifically, separating the download part of _download_and_unzip into its own function - https://github.com/ACED-IDP/aced_etl_pod/blob/feature/reset-commit/etl-job/fhir_import_export.py#L173C5-L193
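To make the proposed split concrete, here is a minimal sketch of download and unzip as two separate functions with a thin wrapper on top. The names `download`, `unzip`, `download_and_unzip`, and the `upload.zip` / `unzipped` paths are hypothetical and are not taken from the linked branch; the real job presumably fetches the upload through its own client rather than a plain URL.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download(url: str, dest: Path) -> Path:
    """Fetch `url` to `dest` and return the local path. No unzipping here."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest


def unzip(zip_path: Path, out_dir: Path) -> list[Path]:
    """Extract `zip_path` into `out_dir` and return the extracted file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        # Skip directory entries, which end with "/".
        return [out_dir / name for name in zf.namelist() if not name.endswith("/")]


def download_and_unzip(url: str, work_dir: Path) -> list[Path]:
    """Wrapper (hypothetical) for callers that still want both steps at once."""
    return unzip(download(url, work_dir / "upload.zip"), work_dir / "unzipped")
```

With the steps separated, the download can be reused or mocked on its own, and the unzip step can be pointed at a *.git.zip that is already on disk.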

process Bundle.ndjson to handle deletes of file metadata that no longer exists

So given a bundle like:

{
  "resourceType": "Bundle",
  "id": "a3bd9781-444b-51e3-8c39-dd075207b758",
  "identifier": {
    "use": "official",
    "system": "https://aced-idp.org/cbds-5e3cfd2db836439db952804ac7b44194",
    "value": "2024-05-16T20:40:08.109104Z"
  },
  "type": "transaction",
  "timestamp": "2024-05-16T20:40:08.109104+00:00",
  "entry": [
    {
      "request": {
        "method": "DELETE",
        "url": "DocumentReference/ca5a5fb3-1c5d-5e05-b026-2ed62720f625"
      }
    }
  ],
  "issues": {
    "resourceType": "OperationOutcome",
    "issue": [
      {
        "severity": "warning",
        "code": "processing",
        "diagnostics": "Meta data items no longer in study."
      }
    ]
  }
}

Unzip, then pass the bundle to an aced submission function that iterates through every dict in "entry" and builds a list of method:url pairs, where the url is reduced to just the UUID. Then do one bulk action per "method"; in this case the only method is DELETE, so a bulk delete.
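A rough sketch of the parsing half of that step, assuming Bundle.ndjson holds one Bundle per line shaped like the example above. `collect_requests` is a hypothetical name, not an existing submission function, and the bulk delete itself (elastic / peregrine) is left out.

```python
import json
from collections import defaultdict
from typing import Iterable


def collect_requests(ndjson_lines: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Group entry requests by method: {"DELETE": [(resource_type, id), ...]}."""
    by_method = defaultdict(list)
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the ndjson file
        bundle = json.loads(line)
        if bundle.get("resourceType") != "Bundle":
            continue
        for entry in bundle.get("entry", []):
            request = entry.get("request", {})
            method, url = request.get("method"), request.get("url", "")
            if not method or "/" not in url:
                continue  # skip entries without a usable request
            # "DocumentReference/<uuid>" -> ("DocumentReference", "<uuid>")
            resource_type, _, resource_id = url.rpartition("/")
            by_method[method].append((resource_type, resource_id))
    return dict(by_method)


# Usage with a trimmed-down version of the bundle above.
line = json.dumps({
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [{"request": {
        "method": "DELETE",
        "url": "DocumentReference/ca5a5fb3-1c5d-5e05-b026-2ed62720f625",
    }}],
})
requests = collect_requests([line])
ids_to_delete = [rid for _, rid in requests.get("DELETE", [])]
```

Keeping the resource type next to the UUID is a choice made here so that the delete step can target the right index or node type; if only the UUID is needed, `ids_to_delete` is the list to hand to the bulk delete.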

elastic: https://github.com/ACED-IDP/submission/blob/cf04d2cabab37d743a48d6b7711ec6f0ace2bee1/aced_submission/meta_flat_load.py#L726-L748

and in peregrine something similar to: https://github.com/ACED-IDP/submission/blob/cf04d2cabab37d743a48d6b7711ec6f0ace2bee1/aced_submission/meta_graph_load.py#L390-L395

produce a better [out] json file

Yeah, especially ensuring that logs in the submission library are preserved and returned to the user in the etl_pod library, and not just when exceptions occur.
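One possible way to do that, sketched with the standard logging module: attach a handler that collects records while a step runs, and put them in the output whether or not the step raised. This assumes the submission library logs through loggers named under "aced_submission" (for example via logging.getLogger(__name__)); `ListHandler`, `run_with_logs`, and the output keys are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(f"{record.levelname}: {record.getMessage()}")


def run_with_logs(step, logger_name="aced_submission"):
    """Run `step()` and return a dict with status and all captured logs."""
    logger = logging.getLogger(logger_name)  # assumed logger name
    handler = ListHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    output = {"status": "ok", "logs": handler.records}
    try:
        step()
    except Exception as exc:
        # Logs emitted before the failure are still in output["logs"].
        output["status"] = "error"
        output["error"] = str(exc)
    finally:
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
    return output
```

The returned dict contains only strings and lists, so it can be written straight into the [out] json file with json.dump.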