Closed gaurav closed 3 months ago
Ah, wait, I was looking at an older conflation file -- the latest conflation file does implement CURIE suffix sorting, so PUBCHEM.COMPOUND:962 should be the correct preferred ID for water:
["PUBCHEM.COMPOUND:962", "PUBCHEM.COMPOUND:105142", "PUBCHEM.COMPOUND:10129877", "RXCUI:150985", "RXCUI:204918", "RXCUI:340584", "RXCUI:379002", "RXCUI:1043588", "RXCUI:1045437", "RXCUI:1045439", "RXCUI:1053147", "RXCUI:1053148", "RXCUI:1053172", "RXCUI:1053173", "RXCUI:1053428", "RXCUI:1053429", "RXCUI:1053489", "RXCUI:1053490", "RXCUI:1151100", "RXCUI:1151101", "RXCUI:1161792", "RXCUI:1161794", "RXCUI:1161795", "RXCUI:1180556", "RXCUI:1235498", "RXCUI:1235499", "RXCUI:1235500", "RXCUI:1235501", "RXCUI:1235502", "RXCUI:1235503", "RXCUI:1235504", "RXCUI:1310241", "RXCUI:1314884", "RXCUI:1423320", "RXCUI:1423321", "RXCUI:1424601", "RXCUI:1424602", "RXCUI:1424603", "RXCUI:1424604", "RXCUI:1424605", "RXCUI:1425974", "RXCUI:1425975", "RXCUI:1425976", "RXCUI:1425977", "RXCUI:1425978", "RXCUI:1489375", "RXCUI:1489376", "RXCUI:1489377", "RXCUI:1489378", "RXCUI:1539535", "RXCUI:1549855", "RXCUI:2108561", "RXCUI:2360606", "RXCUI:2360607", "RXCUI:2360608", "RXCUI:2360609", "RXCUI:2360610", "RXCUI:2601721", "RXCUI:2601722", "UMLS:C0359299", "UMLS:C1883551", "UMLS:C3857954"]
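For reference, here is a rough sketch of what "CURIE suffix sorting" could look like, assuming it means ordering by prefix first and then by numeric suffix. This is not the actual conflation code: the function names and the `prefix_order` default are hypothetical, chosen only to match the order seen in the line above.

```python
def curie_sort_key(curie, prefix_order):
    """Build a sort key: prefix rank first, then suffix (numeric before non-numeric)."""
    prefix, _, suffix = curie.partition(":")
    # Prefixes not listed in prefix_order sort after all listed ones.
    rank = prefix_order.index(prefix) if prefix in prefix_order else len(prefix_order)
    if suffix.isdigit():
        # Compare numeric suffixes by value, so 962 sorts before 105142.
        return (rank, 0, int(suffix), "")
    # Non-numeric suffixes (e.g. UMLS:C0359299) fall back to string comparison.
    return (rank, 1, 0, suffix)


def sort_curies(curies, prefix_order=("PUBCHEM.COMPOUND", "RXCUI", "UMLS")):
    # prefix_order is an assumption for illustration, not the real configuration.
    return sorted(curies, key=lambda c: curie_sort_key(c, prefix_order))


if __name__ == "__main__":
    sample = [
        "RXCUI:204918",
        "PUBCHEM.COMPOUND:10129877",
        "UMLS:C0359299",
        "PUBCHEM.COMPOUND:962",
        "RXCUI:150985",
        "PUBCHEM.COMPOUND:105142",
    ]
    print(sort_curies(sample))
```

Under that assumption, a plain string sort would put PUBCHEM.COMPOUND:10129877 ahead of PUBCHEM.COMPOUND:962, which is why comparing the suffix as a number matters here.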
I still don't know why it's returning RXCUI:1161795 as the preferred ID, though.
I was able to fix this by reloading the database, so yes, the copying process appears to be at fault. Chris tells me that the conflation code relies on the order of the results in Redis, so the copying code in https://github.com/helxplatform/translator-devops/pull/768 may not be preserving that order, perhaps because the Redis protocol commands come back in an unusual order (see https://github.com/sripathikrishnan/redis-rdb-tools#emitting-redis-protocol for more information).
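One way to test the ordering theory is to compare the identifier list from the conflation file against what the restored database returns, position by position. The sketch below works on plain Python lists only; how the lists are read out of Redis is not covered, and `first_order_mismatch` is a hypothetical helper, not something in the copying code.

```python
from itertools import zip_longest


def first_order_mismatch(expected, actual):
    """Return (index, expected_item, actual_item) at the first differing
    position, or None if both lists are identical in content and order."""
    for index, (exp, act) in enumerate(zip_longest(expected, actual)):
        if exp != act:
            return index, exp, act
    return None


if __name__ == "__main__":
    # Illustrative values only: the same members in a different order.
    from_file = ["PUBCHEM.COMPOUND:962", "RXCUI:150985", "RXCUI:1161795"]
    from_db = ["RXCUI:1161795", "PUBCHEM.COMPOUND:962", "RXCUI:150985"]
    print(first_order_mismatch(from_file, from_db))
    # Same members overall, so any difference is purely one of order.
    print(sorted(from_file) == sorted(from_db))
```

If the two lists hold the same members but the helper reports a mismatch at index 0, that would be consistent with the copy losing the original order rather than losing data.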
Question to myself: is it true that GeneProtein (which is structured identically to ChemicalDrug) was restored without any problem, or is the order also broken for that file?
After several restores, we haven't seen this problem return, so it does appear to have been caused by using an incorrect input file. I'm going to go ahead and close this, but will reopen it if the problem recurs.
Compare https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=MESH%3AD014867&conflate=true&drug_chemical_conflate=true&description=false with https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes?curie=MESH%3AD014867&conflate=true&drug_chemical_conflate=true&description=false -- to further confuse matters, the actual line from the conflation file is:
So really PUBCHEM.COMPOUND:10129877 "Water-O-15" should be the preferred ID!
This might be because: