cladteam / CCDA_OMOP_by_Python


build the pipeline between vocab steps and structural mapping then DQ #119

Open chrisroederucdenver opened 3 weeks ago

chrisroederucdenver commented 3 weeks ago

I have it started here https://foundry.cladplatform.org/workspace/data-integration/monocle/graph/ri.monocle.main.graph.834c55b5-df54-4dd0-bbe3-bff5f8362da7?coloring=monocle.color.resource-type

It's in the top-level CCDA folder, called HIE-HIN CCDA Pipeline. At the moment it lacks a link between the concept_xwalk and the domain table creation. When I try to build in the lineage app, it doesn't know how to build datasets that come out of a workspace, like vocab_discovered_codes or the domain tables: visit, observation, measurement, and person_second_try (yeah, need to rename/fix that one).

FYI vocab_discovered_codes is built by the workspace called CCDA-tools Workspace, and the domain tables are created by the workspace called CCDA_OMOP_by_Python.

FYI @tannerzhang @stephanieshong @AdamLeeIT

chrisroederucdenver commented 3 weeks ago

I just spoke with Anne Bailey about completing the link in the pipeline to activate the workspaces. Basically, the code has to be a notebook (an ipynb), not raw Python, for the execution to link down the pipeline. We also have the option of putting the Python XML parsing code into a code repo in the Spark world, where the execution links in a more familiar way. This would involve some wrapper code that uses the Foundry transform class. Quite feasible.
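A rough sketch of what that wrapper might look like, not tested in our Foundry instance. Assumptions: `parse_observations` is a hypothetical stand-in for the real XML parsing code, both dataset paths are made up, and the input dataset is assumed to hold raw `.xml` files. The `transforms.api` import is guarded so the parsing function can still be run and tested outside Foundry.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # default namespace of C-CDA documents


def parse_observations(xml_data):
    """Hypothetical stand-in for the parser: (codeSystem, code) per observation."""
    root = ET.fromstring(xml_data)
    rows = []
    for obs in root.iter("{urn:hl7-org:v3}observation"):
        code = obs.find("hl7:code", NS)
        if code is not None:
            rows.append((code.get("codeSystem"), code.get("code")))
    return rows


try:
    # Only importable inside a Foundry code repository.
    from transforms.api import transform, Input, Output
except ImportError:
    transform = None

if transform is not None:

    @transform(
        out=Output("/CCDA/hypothetical/parsed_observations"),  # made-up path
        source=Input("/CCDA/hypothetical/raw_ccda_files"),     # made-up path
    )
    def compute(ctx, out, source):
        fs = source.filesystem()
        rows = []
        # Read each raw XML file in the input dataset and parse it.
        for status in fs.ls(glob="*.xml"):
            with fs.open(status.path, "rb") as f:
                rows.extend(parse_observations(f.read()))
        df = ctx.spark_session.createDataFrame(
            rows, schema="code_system string, code string"
        )
        out.write_dataframe(df)
```

Keeping the parser as a plain function means the same code could be called from a workspace notebook or from the transform, so the wrapper stays thin.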

However, we should keep the scope and development effort in mind and balance the ease of dev in workspaces against the work of running the pipeline manually. Humans will be in the process to evaluate each run anyway, so not having automation shouldn't have much impact.

And then, considering how simple the snooper is that feeds the crosswalk file creation, we should be able to put that into a code repo and run it as part of the early vocab pipeline.
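For a sense of how small that could be, here is a minimal sketch of a snooper, assuming it only needs to collect the distinct (codeSystem, code) pairs found in a set of CCDA documents and count them. The function names and the CSV column names are hypothetical, not what the CCDA-tools workspace actually uses.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def snoop_codes(xml_documents):
    """Count every (codeSystem, code) pair seen on any element."""
    counts = Counter()
    for doc in xml_documents:
        for elem in ET.fromstring(doc).iter():
            code = elem.get("code")
            system = elem.get("codeSystem")
            if code and system:
                counts[(system, code)] += 1
    return counts


def codes_to_csv(counts):
    """Render the counts as CSV text, sorted for stable output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["codeSystem", "code", "n"])  # hypothetical column names
    for (system, code), n in sorted(counts.items()):
        writer.writerow([system, code, n])
    return buf.getvalue()
```

Something like `codes_to_csv(snoop_codes(docs))` would then produce the discovered-codes listing that the crosswalk step reads, whatever format that step ends up needing.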

chrisroederucdenver commented 1 day ago

Tagging @AdamLeeIT here because he has an ipynb in the DQ part of the project.