callahantiff / PheKnowLator

PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
https://github.com/callahantiff/PheKnowLator/wiki
Apache License 2.0

Add integer and identifiers to node metadata #58

Closed callahantiff closed 3 years ago

callahantiff commented 3 years ago

Problem: The node metadata output is currently keyed by identifier. If you use the integer edge lists but want node labels, you first have to look each node up in the provided dictionary that maps node integers to identifiers.

Solution: In the next iteration, I will add a new column so that each row of the node metadata includes both the node integer and the identifier. Examples of each output are shown below.

An example of the current output:

| node_id | label | description/definition | synonym |
| --- | --- | --- | --- |
| 388324 | INCA1 | INCA1 has locus group 'protein-coding' and is located on chromosome 17 (map_location: 17p13.2). | HSD45protein INCA1 |
| 92106 | OXNAD1 | OXNAD1 has locus group 'protein-coding' and is located on chromosome 3 (map_location: 3p25.1-p24.3). | oxidoreductase NAD-binding domain-containing protein 1 |
| 56140 | PCDHA8 | PCDHA8 has locus group 'protein-coding' and is located on chromosome 5 (map_location: 5q31.3). | PCDH-ALPHA8protocadherin alpha-8 KIAA0345-like 6 PCDH-alpha-8 |


An example of the improved output:

| node_integer | node_id | label | description/definition | synonym |
| --- | --- | --- | --- | --- |
| 0 | 388324 | INCA1 | INCA1 has locus group 'protein-coding' and is located on chromosome 17 (map_location: 17p13.2). | HSD45protein INCA1 |
| 1 | 92106 | OXNAD1 | OXNAD1 has locus group 'protein-coding' and is located on chromosome 3 (map_location: 3p25.1-p24.3). | oxidoreductase NAD-binding domain-containing protein 1 |
| 2 | 56140 | PCDHA8 | PCDHA8 has locus group 'protein-coding' and is located on chromosome 5 (map_location: 5q31.3). | PCDH-ALPHA8protocadherin alpha-8 KIAA0345-like 6 PCDH-alpha-8 |
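
For anyone who needs this before the next build, the change from the current output to the improved output can be approximated with a rough sketch like the one below. It assumes the metadata is tab-delimited and that the integer map is a plain dictionary from node integer to identifier. Both are assumptions, not the actual file formats, and `add_node_integers` is a hypothetical helper, not part of the codebase. Only the `node_id` and `label` columns are included to keep it short.

```python
import csv
import io

# Hypothetical inputs: rows mirror the "current output" example above, and the
# dictionary stands in for the provided node-integer-to-identifier map.
# The tab delimiter is an assumption about the file format.
CURRENT_METADATA = (
    "node_id\tlabel\n"
    "388324\tINCA1\n"
    "92106\tOXNAD1\n"
    "56140\tPCDHA8\n"
)
INTEGER_TO_IDENTIFIER = {0: "388324", 1: "92106", 2: "56140"}


def add_node_integers(metadata_text, integer_to_identifier):
    """Return the metadata rows with a leading node_integer column."""
    # Invert the map so each identifier can be looked up directly.
    identifier_to_integer = {
        identifier: integer
        for integer, identifier in integer_to_identifier.items()
    }
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    rows = []
    for row in reader:
        # Identifiers missing from the map get None rather than a guess.
        node_integer = identifier_to_integer.get(row["node_id"])
        rows.append({"node_integer": node_integer, **row})
    return rows


improved = add_node_integers(CURRENT_METADATA, INTEGER_TO_IDENTIFIER)
for row in improved:
    print(row["node_integer"], row["node_id"], row["label"])
```

With the integer in the same row as the label, the integer edge lists can be labeled with a single lookup instead of going through the identifier first.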

callahantiff commented 3 years ago

Verify the node identifiers in the output metadata; the formatting for Ensembl transcripts looks a bit off.

callahantiff commented 3 years ago

A brief example of what this would look like for node identifier rs201492213 is shown below:

node_type = 'variant'
edge_type = ['variant-phenotype', 'variant-gene']
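
One way such a record could be derived is sketched below, assuming edges are available as (subject, object, edge type) tuples and node types as a dictionary keyed by identifier. The edge tuples, the placeholder object identifiers, and the `summarize_node` helper are all hypothetical and only illustrate the shape of the output, not actual graph content or the project's implementation.

```python
# Hypothetical edge records: (subject_id, object_id, edge_type). The object
# identifiers are placeholders, not real graph content.
EDGES = [
    ("rs201492213", "PHENOTYPE_PLACEHOLDER", "variant-phenotype"),
    ("rs201492213", "GENE_PLACEHOLDER", "variant-gene"),
    ("rs201492213", "OTHER_GENE_PLACEHOLDER", "variant-gene"),
]
# Hypothetical lookup of node type by identifier.
NODE_TYPES = {"rs201492213": "variant"}


def summarize_node(node_id, edges, node_types):
    """Collect the node type and the distinct edge types a node takes part in."""
    edge_types = []
    for subject_id, object_id, edge_type in edges:
        # Count the edge whether the node is its subject or its object,
        # and keep each edge type once, in first-seen order.
        if node_id in (subject_id, object_id) and edge_type not in edge_types:
            edge_types.append(edge_type)
    return {"node_type": node_types.get(node_id), "edge_type": edge_types}


record = summarize_node("rs201492213", EDGES, NODE_TYPES)
print(record)
```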

callahantiff commented 3 years ago

TASK

Task Type: CODEBASE

Improve the output metadata for nodes and edges in the knowledge graph

TODO

The following items have been condensed from the issues above.

callahantiff commented 3 years ago

Done as part of #84