callahantiff / PheKnowLator

PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
https://github.com/callahantiff/PheKnowLator/wiki
Apache License 2.0

Simplify input files -- input yaml #127

Open callahantiff opened 2 years ago

callahantiff commented 2 years ago

Use a single YAML file to organize the input ontology and edge sources and to specify the information needed to parse them. This would replace the following files: ontology_source_list.txt, edge_source_list.txt, and resource_info.txt. It would also remove the data ingest class entirely, along with the need for the user input automation script.

Consider a similar approach for metadata or, even better, add arguments to the same file that give instructions for metadata processing.
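To make the single-file idea concrete, here is a minimal sketch of how one config could stand in for the three current files. The dict mirrors what a YAML loader such as `yaml.safe_load` might return. Every key, URL, and helper name below (`ontologies`, `edges`, `subject_prefix`, `source_urls`, `parsing_info`) is an assumption made for illustration, not the project's actual schema.

```python
# Hypothetical layout: this dict mirrors what a YAML loader might return
# for a single input file. All keys and values are illustrative assumptions.
CONFIG = {
    "ontologies": {
        "example_ontology": {"url": "https://example.org/example.owl"},
    },
    "edges": {
        "gene-disease": {
            "url": "https://example.org/gene_disease.tsv",
            "delimiter": "\t",
            "subject_column": 0,
            "object_column": 1,
            "subject_prefix": "http://example.org/gene/",
            "object_prefix": "http://example.org/disease/",
        },
    },
}


def source_urls(config, section):
    """Return {name: url} for one section (the role of a *_source_list.txt file)."""
    return {name: entry["url"] for name, entry in config.get(section, {}).items()}


def parsing_info(config, edge_type):
    """Return the non-URL fields for one edge type (the role of resource_info.txt)."""
    entry = config["edges"][edge_type]
    return {key: value for key, value in entry.items() if key != "url"}
```

With this shape, `source_urls(CONFIG, "edges")` would cover what edge_source_list.txt provides today, and `parsing_info(CONFIG, "gene-disease")` would cover the matching resource_info.txt entry.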

callahantiff commented 2 years ago

Also add a header key that provides definitions for all required input parameters.
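One possible reading of the header key is a block that both documents the required parameters and drives validation. The sketch below assumes a hypothetical `required_parameters` mapping of parameter name to definition. The parameter names and the `missing_parameters` helper are illustrative, not part of the existing code.

```python
# Hypothetical header block: parameter names and definitions are assumptions.
HEADER = {
    "required_parameters": {
        "url": "Location the source file is downloaded from.",
        "delimiter": "Character separating columns in the source file.",
        "subject_column": "Zero-based index of the subject identifier column.",
        "object_column": "Zero-based index of the object identifier column.",
    }
}


def missing_parameters(header, entries):
    """Map each entry name to the sorted list of required keys it lacks."""
    required = set(header["required_parameters"])
    problems = {}
    for name, entry in entries.items():
        missing = sorted(required - set(entry))
        if missing:
            problems[name] = missing
    return problems
```

An empty result would mean every entry supplies all documented parameters. A non-empty result can be reported together with the definitions from the header so the user knows what each missing field means.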

callahantiff commented 1 year ago

Related: use the information in this file, such as the prefix used for nodes, to update the edge_source_metadata.txt files. The same information could also be used to create named graphs or be incorporated into the edge list metadata.

callahantiff commented 1 year ago

Make sure that we are getting the most up-to-date data from each source, including the data returned by our queries.

callahantiff commented 1 year ago

Zip all data files to reduce storage space.
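A minimal sketch of per-file compression using only the standard library. It uses gzip rather than a .zip archive, which is a swapped-in choice, since gzip-compressed text files can be read back directly with `gzip.open`. The helper name `gzip_file` and the `remove_original` flag are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path, remove_original=False):
    """Compress one file to <name>.gz next to it and return the new path."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    # Stream the bytes so large data files are never loaded fully into memory.
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_original:
        source.unlink()
    return target
```

Calling this on every downloaded data file, with `remove_original=True` once the compressed copy has been checked, would keep only the compressed versions on disk.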

callahantiff commented 1 year ago

Verify the identifier mappings for genes, diseases, and chemicals. Some weirdness is happening.