NCATS-Tangerine / ncats-ingest

Management of ingestion of sources for NCATS-translator

Document ETL Strategy #9

Open jmcmurry opened 7 years ago

jmcmurry commented 7 years ago

ETL (Extract, Transform, Load): We will devise and document our strategy for ingesting knowledge sources, including ontologies, curated databases, text sources, unstructured data, etc. We will catalog the data sources and annotate each one with metadata such as datatype, format, ingestion method, and the ontologies, ontology subsets, or other metadata/data dependencies required for ingestion. We will ensure that the strategy covers what may come from OHDSI, standardized value sets, LOINC, or other terminologies used in clinical data models. We will also include strategies for environmental data sources, in collaboration with the Green team. We note that including clinical data and environmental data (both of which are complex in their own right) will make this milestone more unifying but perhaps less granular than originally proposed.
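One way the proposed catalog could be represented is sketched below. This is a minimal sketch under stated assumptions, not the team's actual design. The `SourceEntry` record and its field names are hypothetical, chosen to mirror the metadata listed above (datatype, format, ingestion method, dependencies). Because each source may require certain ontologies or other data to be loaded first, the hypothetical `ingestion_order` helper uses a topological sort (Python's standard `graphlib.TopologicalSorter`, 3.9+) to derive an order in which every dependency is ingested before the sources that need it. All catalog entries and dependencies in the usage example are invented for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class SourceEntry:
    """One row of a hypothetical ETL source catalog (field names are assumptions)."""
    name: str
    datatype: str       # e.g. "ontology", "curated database", "text"
    format: str         # e.g. "OWL", "CSV", "JSON"
    ingest_method: str  # e.g. "bulk download", "API", "NLP extraction"
    # Names of ontologies or other sources that must be ingested first.
    depends_on: list[str] = field(default_factory=list)


def ingestion_order(catalog: list[SourceEntry]) -> list[str]:
    """Return source names ordered so each dependency precedes its dependents.

    Raises graphlib.CycleError if the declared dependencies form a cycle.
    """
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {entry.name: set(entry.depends_on) for entry in catalog}
    return list(TopologicalSorter(graph).static_order())


# Illustrative usage: these entries and their dependencies are invented.
catalog = [
    SourceEntry("clinical_value_sets", "terminology", "CSV", "bulk download", ["LOINC"]),
    SourceEntry("LOINC", "terminology", "CSV", "bulk download"),
    SourceEntry("environmental_exposures", "curated database", "TSV", "API",
                ["clinical_value_sets"]),
]
order = ingestion_order(catalog)
print(order)
```

A topological sort is a natural fit here because the dependency metadata already defines a directed graph. A cycle in the declared dependencies would surface as an error instead of as a silently wrong load order.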