frostyfan109 / tranql

A Translator Query Language
https://researchsoftwareinstitute.github.io/data-translator/apps/tranql
MIT License

Duplicate node ids (reason that the ICEES queries tend to have so many nodes with no links) #54

Closed frostyfan109 closed 5 years ago

frostyfan109 commented 5 years ago

For reference: image

This happens because ICEES returns multiple nodes with the same id. For example, one node may be named "Fluticasone" and another "Salmeterol". They appear as different nodes on the graph, but both have the same id (rxcui:896190) and are therefore the same node, so they should be merged. Since links are resolved by id, only one of the two nodes ends up with links connected to it.
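To illustrate the kind of merge meant here, a minimal sketch, assuming nodes are plain dicts with `id` and `name` keys. The helper `merge_nodes_by_id` and the `names` field are hypothetical and not part of TranQL:

```python
def merge_nodes_by_id(nodes):
    """Collapse nodes that share an id into one node, keeping every name seen."""
    merged = {}  # id -> merged node; dicts keep insertion order in Python 3.7+
    for node in nodes:
        node_id = node["id"]
        if node_id not in merged:
            # First node with this id: copy it and start a list of its names.
            merged[node_id] = dict(node, names=[node["name"]])
        elif node["name"] not in merged[node_id]["names"]:
            # Same id under a different name: record the name, drop the duplicate.
            merged[node_id]["names"].append(node["name"])
    return list(merged.values())


# The two nodes from the example above share one id.
nodes = [
    {"id": "rxcui:896190", "name": "Fluticasone"},
    {"id": "rxcui:896190", "name": "Salmeterol"},
]
result = merge_nodes_by_id(nodes)
```

With one node per id, every link that references `rxcui:896190` lands on the same node instead of leaving a duplicate with no links.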

frostyfan109 commented 5 years ago

If we're lucky, these duplicate nodes all share equivalent identifiers with each other, given that they come from the same reasoner, so this should be fairly easy to fix in the SelectStatement::merge_results method.

frostyfan109 commented 5 years ago

We should be merging nodes ~and links~ with the same names as well.

Alternatively, a possibly better approach would be to build or extend the equivalent_identifiers property: go through every node/link and append the equivalent identifiers of every other node/link that has the same name. They would then get merged together because they share equivalent identifiers.
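A rough sketch of that idea, assuming nodes are dicts with `id`, `name`, and an optional `equivalent_identifiers` list. The function name, the exact-match name comparison, the choice to include each node's own id, and the `EX:` ids are all assumptions for illustration:

```python
from collections import defaultdict


def share_equivalent_identifiers_by_name(nodes):
    """Give every node the identifiers of all nodes that have the same name."""
    ids_by_name = defaultdict(set)
    # First pass: pool the ids and equivalent identifiers seen under each name.
    for node in nodes:
        pool = ids_by_name[node["name"]]
        pool.add(node["id"])  # assumption: a node's own id counts as equivalent
        pool.update(node.get("equivalent_identifiers", []))
    # Second pass: write the pooled identifiers back onto every node.
    for node in nodes:
        node["equivalent_identifiers"] = sorted(ids_by_name[node["name"]])
    return nodes


# Hypothetical ids; only the shared name ties the first two nodes together.
example = share_equivalent_identifiers_by_name([
    {"id": "EX:1", "name": "asthma", "equivalent_identifiers": ["EX:10"]},
    {"id": "EX:2", "name": "asthma"},
    {"id": "EX:3", "name": "albuterol"},
])
```

After this pass the two "asthma" nodes carry identical identifier lists, so a later merge step keyed on shared equivalent identifiers would treat them as one node.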

frostyfan109 commented 5 years ago

Nodes are now given the equivalent identifiers of all other nodes with the same name, so they end up merged later in the answer merger.
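For context, the later merge step could look roughly like the following. This is a sketch under assumptions, not the actual answer merger: it groups nodes that share at least one equivalent identifier (transitively) using a small union-find, and the function name and `EX:` ids are hypothetical:

```python
def group_by_equivalent_identifiers(nodes):
    """Group nodes that share any equivalent identifier, directly or transitively."""
    parent = list(range(len(nodes)))  # union-find forest over node indices

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # identifier -> index of the first node seen carrying it
    for index, node in enumerate(nodes):
        for identifier in node.get("equivalent_identifiers", []):
            if identifier in owner:
                # Identifier already seen: join this node with its earlier owner.
                parent[find(index)] = find(owner[identifier])
            else:
                owner[identifier] = index

    groups = {}
    for index, node in enumerate(nodes):
        groups.setdefault(find(index), []).append(node)
    return list(groups.values())


groups = group_by_equivalent_identifiers([
    {"id": "EX:1", "equivalent_identifiers": ["EX:1", "EX:2"]},
    {"id": "EX:2", "equivalent_identifiers": ["EX:2", "EX:3"]},
    {"id": "EX:9", "equivalent_identifiers": ["EX:9"]},
])
```

Each resulting group would then be collapsed into a single node, with links pointing at any member id redirected to the merged node.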

What the query looks like now:

image