RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Inaccurate same_as edges from DrugCentral in KG2.6.2? #2

Closed amykglen closed 3 years ago

amykglen commented 3 years ago

branching off a separate issue for this based on this comment of @edeutsch's in RTXteam/RTX#1426

indeed it seems maybe some DrugCentral edges could be a culprit in causing strange synonymization for acetaminophen in the KG2.6.x synonymizer (#1423)...

for instance, the synonymizer appropriately considers DrugCentral:52 (paracetamol) to be a synonym of acetaminophen, but if you look at the same_as edges for DrugCentral:52, some of them seem off:

match (n {id:'DrugCentral:52'})-[:`biolink:same_as`]-(m) return collect(distinct m.name)

returns this on kg2.6.2:

["dihydrocodeine and paracetamol", "tramadol and paracetamol", "oxycodone and paracetamol", "paracetamol", "paracetamol, combinations excl. psycholeptics", "codeine and paracetamol", "paracetamol, combinations with psycholeptics", "Acetaminophen", "ACETAMINOPHEN", "Acetaminophen-containing product"]

seems like some of these should maybe be part_of relationships, like @kvarforl did for the RXNORM edges in RTXteam/RTX#1423? would be awesome if the KG2 team could investigate when they have a chance.

amykglen commented 3 years ago

and to better illustrate how this issue could've contributed to making acetaminophen's list of equivalent curies in the 2-6-2 synonymizer so large - this query:

match (n {id:"CHEMBL.COMPOUND:CHEMBL112"})-[:`biolink:same_as` *1..4]-(m) return collect(distinct m.name)

returns this on kg2-6-2:

["PARA [cytosol]", "Acetaminophen (TN TYLENOL) [endoplasmic reticulum lumen]", "Acetaminophen", "ACETAMINOPHEN", "paracetamol", "dihydrocodeine and paracetamol", "tramadol and paracetamol", "oxycodone and paracetamol", "paracetamol, combinations excl. psycholeptics", "codeine and paracetamol", "paracetamol, combinations with psycholeptics", "Acetaminophen-containing product", "dihydrocodeine", "tramadol", "oxycodone", "codeine", "dihydrocodeine, combinations", "dihydrocodeine and other non-opioid analgesics", "dihydrocodeine and acetylsalicylic acid", "DIHYDROCODEINE BITARTRATE", "Dihydrocodeine", "DIHYDROCODEINE", "Dihydrocodeine-containing product", "tramadol and dexketoprofen", "tramadol and other non-opioid analgesics", "Tramadol", "TRAMADOL HYDROCHLORIDE", "TRAMADOL", "Tramadol-containing product", "oxycodone and acetylsalicylic acid", "oxycodone and ibuprofen", "oxycodone and naltrexone", "oxycodone and naloxone", "OXYCODONE TEREPHTHALATE", "Oxycodone", "oxycodone terephthalate", "OXYCODONE", "OXYCODONE HYDROCHLORIDE", "Oxycodone-containing product", "oxycodone hydrochloride", "codeine, combinations with psycholeptics", "codeine, combinations excl. psycholeptics", "codeine and ibuprofen", "codeine and other non-opioid analgesics", "codeine and acetylsalicylic acid", "CODEINE SULFATE", "CODEINE POLISTIREX", "CODEINE PHOSPHATE", "Codeine", "CODEINE", "Codeine-containing product", "CODEINE MONOHYDRATE"]

but this on kg2-5-2:

["Acetaminophen", "ACETAMINOPHEN", "paracetamol"]
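To make the blow-up concrete, here is a rough Python sketch of what the `*1..4` traversal is doing: a bounded breadth-first walk over undirected same_as edges. The edge lists are toy data built from names in the output above; the exact topology (which node links to which) is an assumption for illustration, not something read out of KG2.

```python
from collections import deque


def same_as_neighborhood(edges, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` undirected hops."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}


# Toy edges (hypothetical topology): only true synonyms are linked.
clean_edges = [
    ("ACETAMINOPHEN", "paracetamol"),
    ("paracetamol", "Acetaminophen"),
]

# Same graph plus one combination-product edge and what hangs off it.
bad_edges = clean_edges + [
    ("paracetamol", "oxycodone and paracetamol"),
    ("oxycodone and paracetamol", "oxycodone"),
    ("oxycodone", "OXYCODONE HYDROCHLORIDE"),
]
```

With `clean_edges`, a 4-hop walk from "ACETAMINOPHEN" stays inside the synonym cluster; with `bad_edges`, the single combination edge is enough to pull oxycodone and its salts into the same neighborhood.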

edeutsch commented 3 years ago

super, thanks for figuring this out!

ecwood commented 3 years ago

I believe these lines (particularly 167): https://github.com/RTXteam/RTX/blob/2ac0de2d361990fccdeb500b299e13e2f1bb609f/code/kg2/drugcentral_json_to_kg_json.py#L154-L170 are the problem. I will work on a fix.

ecwood commented 3 years ago

It appears to be DrugCentral->biolink:same_as->ATC edges:

match (n {id:'DrugCentral:52'})-[:`biolink:same_as`]-(m) return distinct m.id, m.name

m.id | m.name
-- | --
"ATC:N02AJ01" | "dihydrocodeine and paracetamol"
"ATC:N02AJ13" | "tramadol and paracetamol"
"ATC:N02AJ17" | "oxycodone and paracetamol"
"ATC:N02BE01" | "paracetamol"
"ATC:N02BE51" | "paracetamol, combinations excl. psycholeptics"
"ATC:N02AJ06" | "codeine and paracetamol"
"ATC:N02BE71" | "paracetamol, combinations with psycholeptics"
"DRUGBANK:DB00316" | "Acetaminophen"
"CHEMBL.COMPOUND:CHEMBL112" | "ACETAMINOPHEN"
"VANDF:4017513" | "Acetaminophen"
"SNOMED:387517004" | "Acetaminophen"
"SNOMED:90332006" | "Acetaminophen-containing product"
"UMLS:C0000970" | "Acetaminophen"
"MESH:D000082" | "Acetaminophen"

ecwood commented 3 years ago

I'm wondering if we should use biolink:subclass_of as the predicate for these edges. On the webpage, unlike in the PostgreSQL database, the ATC codes for a particular drug are listed under "Pharmacologic Action" (example: https://drugcentral.org/drugcard/52). For the other "Pharmacologic Action" edges, we mapped it to biolink:subclass_of: https://github.com/RTXteam/RTX/blob/9986409a54f788855c335b4b337472afcefaf254/code/kg2/predicate-remap.yaml#L597-L600. Do you have any thoughts on this? (In the database itself, there isn't really a predicate given. The table is called struct2atc if that is helpful, though. Should we use DrugCentral:struct2atc as the relation?)

kvarforl commented 3 years ago

My thoughts are to keep it consistent and use biolink:subclass_of as the predicate, and then DrugCentral:pharmacologic_action as the relation with perhaps a comment and a link so that we can remember why it ended up that way.

that's just my two cents though! I'd appreciate it if @saramsey et al could weigh in :)

ecwood commented 3 years ago

Steve responded by email:

Based on what I am seeing, biolink:subclass_of seems reasonable. But I gathered from the discussion yesterday that these subclass_of relationships were causing a problem for someone, somewhere? If that is the case, we can just change them to biolink:related_to, and defer discussion of what more specific predicate to use, until a later date.

kvarforl commented 3 years ago

Ah excellent! perhaps @amykglen can add her thoughts then :)

ecwood commented 3 years ago

Steve wrote back:

OK, let's use "biolink:subclass_of".

edeutsch commented 3 years ago

sorry I'm late to the party here, meetings all morning, but didn't we solve this same problem with "part_of" for RXNORM? Shouldn't we be consistent? Which is more sensible?

"acetaminophen" is a subclass of "oxycodone and paracetamol"
"acetaminophen" is a part of "oxycodone and paracetamol"

I would argue that our RXNORM decision makes more sense: this is a part of relationship not a subclass. I think using subclass_of here is dangerous because we intend to do subclass reasoning using subclass_of

I think subclass_of means that everything that is true for the parent is true for the descendant. Which would mean that everything that is true of "oxycodone and paracetamol" must also be true for "acetaminophen". And I don't think that's right?

Am I misunderstanding? or not thinking about this right?
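A small Python sketch of the concern, assuming a reasoner that lets a node inherit every assertion made about its subclass_of ancestors. The function name, the dictionaries, and the fact label are all hypothetical and only there to illustrate the inference; this is not how ARAX is implemented.

```python
def inherited_facts(node, subclass_of, facts):
    """Collect facts asserted on `node` and on all its ancestors via subclass_of."""
    collected = set(facts.get(node, ()))
    stack = list(subclass_of.get(node, ()))
    visited = set()
    while stack:
        parent = stack.pop()
        if parent in visited:
            continue  # guard against cycles in the hierarchy
        visited.add(parent)
        collected |= set(facts.get(parent, ()))
        stack.extend(subclass_of.get(parent, ()))
    return collected


# Hypothetical edge: the DrugCentral->ATC link modeled as subclass_of.
subclass_of = {"paracetamol": ["oxycodone and paracetamol"]}

# Hypothetical assertion that is true of the combination product only.
facts = {"oxycodone and paracetamol": {"contains an opioid"}}

# Under subclass reasoning, plain paracetamol inherits the combination's fact.
wrong_inference = inherited_facts("paracetamol", subclass_of, facts)
```

Here `wrong_inference` ends up containing "contains an opioid", which is exactly the kind of conclusion a part_of edge would not license.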

ecwood commented 3 years ago

> I think subclass_of means that everything that is true for the parent is true for the descendant. Which would mean that everything that is true of "oxycodone and paracetamol" must also be true for "acetaminophen". And I don't think that's right?

I think that, within DrugCentral, DrugCentral:52 is "paracetamol" rather than "acetaminophen" if that makes a difference.

match (n {id:'DrugCentral:52'})-[e:`biolink:same_as`]-(m {id: 'ATC:N02AJ01'}) return n.name, e.id, m.name

n.name e.id m.name
"paracetamol" "DrugCentral:52---biolink:same_as---ATC:N02AJ01---DrugCentral:" "dihydrocodeine and paracetamol"

Here are some more examples:

match (n)-[e:`biolink:same_as`]-(m) where split(n.id, ':')[0]='DrugCentral' and split(m.id, ':')[0]='ATC' return n.name, e.id, m.name limit 100

n.name e.id m.name
"mebendazole" "DrugCentral:1641---biolink:same_as---ATC:P02CA51---DrugCentral:" "mebendazole, combinations"
"ulipristal" "DrugCentral:4166---biolink:same_as---ATC:G03AD02---DrugCentral:" "ulipristal"
"levonorgestrel" "DrugCentral:1572---biolink:same_as---ATC:G03AD01---DrugCentral:" "levonorgestrel"
"cefcapene" "DrugCentral:4159---biolink:same_as---ATC:J01DD17---DrugCentral:" "cefcapene"
"cefditoren pivoxil" "DrugCentral:534---biolink:same_as---ATC:J01DD16---DrugCentral:" "cefditoren"
"cefteram" "DrugCentral:3076---biolink:same_as---ATC:J01DD18---DrugCentral:" "cefteram"
"ampicillin" "DrugCentral:198---biolink:same_as---ATC:J01CR01---DrugCentral:" "ampicillin and beta-lactamase inhibitor"
"ticarcillin" "DrugCentral:2656---biolink:same_as---ATC:J01CR03---DrugCentral:" "ticarcillin and beta-lactamase inhibitor"
"amoxicillin" "DrugCentral:192---biolink:same_as---ATC:J01CR02---DrugCentral:" "amoxicillin and beta-lactamase inhibitor"
"piperacillin" "DrugCentral:2187---biolink:same_as---ATC:J01CR05---DrugCentral:" "piperacillin and beta-lactamase inhibitor"
"sultamicillin" "DrugCentral:2539---biolink:same_as---ATC:J01CR04---DrugCentral:" "sultamicillin"
"cefpiramide" "DrugCentral:552---biolink:same_as---ATC:J01DD11---DrugCentral:" "cefpiramide"
"cefetamet" "DrugCentral:3074---biolink:same_as---ATC:J01DD10---DrugCentral:" "cefetamet"
"cefpodoxime proxetil" "DrugCentral:555---biolink:same_as---ATC:J01DD13---DrugCentral:" "cefpodoxime"
"cefoperazone" "DrugCentral:543---biolink:same_as---ATC:J01DD12---DrugCentral:" "cefoperazone"
"cefdinir" "DrugCentral:533---biolink:same_as---ATC:J01DD15---DrugCentral:" "cefdinir"
"ceftibuten" "DrugCentral:562---biolink:same_as---ATC:J01DD14---DrugCentral:" "ceftibuten"
"prasterone" "DrugCentral:795---biolink:same_as---ATC:G03XX01---DrugCentral:" "prasterone"
"nicotinyl methylamide" "DrugCentral:4862---biolink:same_as---ATC:A05AB01---DrugCentral:" "nicotinyl methylamide"
"pyrimethamine" "DrugCentral:2332---biolink:same_as---ATC:P01BD51---DrugCentral:" "pyrimethamine, combinations"
"creatinolfosfate" "DrugCentral:739---biolink:same_as---ATC:C01EB05---DrugCentral:" "creatinolfosfate"
"phosphocreatine" "DrugCentral:3464---biolink:same_as---ATC:C01EB06---DrugCentral:" "fosfocreatine"
"indomethacin" "DrugCentral:1440---biolink:same_as---ATC:C01EB03---DrugCentral:" "indometacin"
"camphor" "DrugCentral:470---biolink:same_as---ATC:C01EB02---DrugCentral:" "camphora"
"cromoglicic acid" "DrugCentral:741---biolink:same_as---ATC:A07EB01---DrugCentral:" "cromoglicic acid"
"ubidecarenone" "DrugCentral:4607---biolink:same_as---ATC:C01EB09---DrugCentral:" "ubidecarenone"
"ciclobendazole" "DrugCentral:3120---biolink:same_as---ATC:P02CA04---DrugCentral:" "ciclobendazole"
"flubendazole" "DrugCentral:1186---biolink:same_as---ATC:P02CA05---DrugCentral:" "flubendazole"
"fenbendazole" "DrugCentral:4536---biolink:same_as---ATC:P02CA06---DrugCentral:" "fenbendazole"
"mebendazole" "DrugCentral:1641---biolink:same_as---ATC:P02CA01---DrugCentral:" "mebendazole"
"thiabendazole" "DrugCentral:2621---biolink:same_as---ATC:P02CA02---DrugCentral:" "tiabendazole"
"albendazole" "DrugCentral:103---biolink:same_as---ATC:P02CA03---DrugCentral:" "albendazole"
"clioquinol" "DrugCentral:681---biolink:same_as---ATC:P01AA52---DrugCentral:" "clioquinol, combinations"
"latamoxef" "DrugCentral:1851---biolink:same_as---ATC:J01DD06---DrugCentral:" "latamoxef"
"ibuprofen" "DrugCentral:1407---biolink:same_as---ATC:C01EB16---DrugCentral:" "ibuprofen"
"cefmenoxime" "DrugCentral:538---biolink:same_as---ATC:J01DD05---DrugCentral:" "cefmenoxime"
"ivabradine" "DrugCentral:3312---biolink:same_as---ATC:C01EB17---DrugCentral:" "ivabradine"
"cefixime" "DrugCentral:537---biolink:same_as---ATC:J01DD08---DrugCentral:" "cefixime"
"ceftizoxime" "DrugCentral:563---biolink:same_as---ATC:J01DD07---DrugCentral:" "ceftizoxime"
"trimetazidine" "DrugCentral:2750---biolink:same_as---ATC:C01EB15---DrugCentral:" "trimetazidine"
"cefodizime" "DrugCentral:541---biolink:same_as---ATC:J01DD09---DrugCentral:" "cefodizime"
"acadesine" "DrugCentral:37---biolink:same_as---ATC:C01EB13---DrugCentral:" "acadesine"
"adenosine" "DrugCentral:90---biolink:same_as---ATC:C01EB10---DrugCentral:" "adenosine"
"dipyrocetyl" "DrugCentral:3160---biolink:same_as---ATC:N02BA79---DrugCentral:" "dipyrocetyl, combinations with psycholeptics"
"tiracizine" "DrugCentral:2679---biolink:same_as---ATC:C01EB11---DrugCentral:" "tiracizine"
"ranolazine" "DrugCentral:2359---biolink:same_as---ATC:C01EB18---DrugCentral:" "ranolazine"
"prednisolone" "DrugCentral:2245---biolink:same_as---ATC:D07AA03---DrugCentral:" "prednisolone"
"hydrocortisone" "DrugCentral:1388---biolink:same_as---ATC:D07AA02---DrugCentral:" "hydrocortisone"
"methylprednisolone" "DrugCentral:1768---biolink:same_as---ATC:D07AA01---DrugCentral:" "methylprednisolone"
"glycobiarsol" "DrugCentral:4849---biolink:same_as---ATC:P01AR53---DrugCentral:" "glycobiarsol, combinations"
"ethenzamide" "DrugCentral:1080---biolink:same_as---ATC:N02BA77---DrugCentral:" "ethenzamide, combinations with psycholeptics"
"salicylamide" "DrugCentral:2415---biolink:same_as---ATC:N02BA75---DrugCentral:" "salicylamide, combinations with psycholeptics"
"ceftazidime" "DrugCentral:559---biolink:same_as---ATC:J01DD02---DrugCentral:" "ceftazidime"
"cefotaxime" "DrugCentral:546---biolink:same_as---ATC:J01DD01---DrugCentral:" "cefotaxime"
"ceftriaxone" "DrugCentral:564---biolink:same_as---ATC:J01DD04---DrugCentral:" "ceftriaxone"
"acetylsalicylic acid" "DrugCentral:74---biolink:same_as---ATC:N02BA71---DrugCentral:" "acetylsalicylic acid, combinations with psycholeptics"
"cefsulodin" "DrugCentral:558---biolink:same_as---ATC:J01DD03---DrugCentral:" "cefsulodin"
"acetoxolone" "DrugCentral:62---biolink:same_as---ATC:A02BX09---DrugCentral:" "acetoxolone"
"gefarnate" "DrugCentral:1281---biolink:same_as---ATC:A02BX07---DrugCentral:" "gefarnate"
"regadenoson" "DrugCentral:2362---biolink:same_as---ATC:C01EB21---DrugCentral:" "regadenoson"
"sulglicotide" "DrugCentral:4725---biolink:same_as---ATC:A02BX08---DrugCentral:" "sulglicotide"
"meldonium" "DrugCentral:3995---biolink:same_as---ATC:C01EB22---DrugCentral:" "meldonium"
"glycobiarsol" "DrugCentral:4849---biolink:same_as---ATC:P01AR03---DrugCentral:" "glycobiarsol"
"difetarsone" "DrugCentral:4419---biolink:same_as---ATC:P01AR02---DrugCentral:" "difetarsone"
"neomycin" "DrugCentral:4247---biolink:same_as---ATC:S03AA01---DrugCentral:" "neomycin"
"arsthinol" "DrugCentral:3007---biolink:same_as---ATC:P01AR01---DrugCentral:" "arsthinol"
"cefotaxime" "DrugCentral:546---biolink:same_as---ATC:J01DD51---DrugCentral:" "cefotaxime and beta-lactamase inhibitor"
"gentamicin" "DrugCentral:4265---biolink:same_as---ATC:S03AA06---DrugCentral:" "gentamicin"
"ceftazidime" "DrugCentral:559---biolink:same_as---ATC:J01DD52---DrugCentral:" "ceftazidime and beta-lactamase inhibitor"
"ciprofloxacin" "DrugCentral:659---biolink:same_as---ATC:S03AA07---DrugCentral:" "ciprofloxacin"
"chloramphenicol" "DrugCentral:589---biolink:same_as---ATC:S03AA08---DrugCentral:" "chloramphenicol"
"rebamipide" "DrugCentral:2360---biolink:same_as---ATC:A02BX14---DrugCentral:" "rebamipide"
"ceftriaxone" "DrugCentral:564---biolink:same_as---ATC:J01DD54---DrugCentral:" "ceftriaxone, combinations"
"bismuth subnitrate" "DrugCentral:4841---biolink:same_as---ATC:A02BX12---DrugCentral:" "bismuth subnitrate"
"tetracycline" "DrugCentral:2611---biolink:same_as---ATC:S03AA02---DrugCentral:" "tetracycline"
"alginic acid" "DrugCentral:4321---biolink:same_as---ATC:A02BX13---DrugCentral:" "alginic acid"
"Polymyxin B" "DrugCentral:4246---biolink:same_as---ATC:S03AA03---DrugCentral:" "polymyxin B"
"zolimidine" "DrugCentral:3661---biolink:same_as---ATC:A02BX10---DrugCentral:" "zolimidine"
"chlorhexidine" "DrugCentral:597---biolink:same_as---ATC:S03AA04---DrugCentral:" "chlorhexidine"
"hexamidine" "DrugCentral:3275---biolink:same_as---ATC:S03AA05---DrugCentral:" "hexamidine"
"troxipide" "DrugCentral:2779---biolink:same_as---ATC:A02BX11---DrugCentral:" "troxipide"
"dihydroergocristine" "DrugCentral:887---biolink:same_as---ATC:C04AE04---DrugCentral:" "dihydroergocristine"
"ergoloid mesylates" "DrugCentral:5035---biolink:same_as---ATC:C04AE01---DrugCentral:" "ergoloid mesylates"
"nicergoline" "DrugCentral:1910---biolink:same_as---ATC:C04AE02---DrugCentral:" "nicergoline"
"dipyrocetyl" "DrugCentral:3160---biolink:same_as---ATC:N02BA59---DrugCentral:" "dipyrocetyl, combinations excl. psycholeptics"
"ethenzamide" "DrugCentral:1080---biolink:same_as---ATC:N02BA57---DrugCentral:" "ethenzamide, combinations excl. psycholeptics"
"cefoperazone" "DrugCentral:543---biolink:same_as---ATC:J01DD62---DrugCentral:" "cefoperazone and beta-lactamase inhibitor"
"cefpodoxime proxetil" "DrugCentral:555---biolink:same_as---ATC:J01DD64---DrugCentral:" "cefpodoxime and beta-lactamase inhibitor"
"salicylamide" "DrugCentral:2415---biolink:same_as---ATC:N02BA55---DrugCentral:" "salicylamide, combinations excl. psycholeptics"
"ceftriaxone" "DrugCentral:564---biolink:same_as---ATC:J01DD63---DrugCentral:" "ceftriaxone and beta-lactamase inhibitor"
"acetylsalicylic acid" "DrugCentral:74---biolink:same_as---ATC:N02BA51---DrugCentral:" "acetylsalicylic acid, combinations excl. psycholeptics"
"epicillin" "DrugCentral:1025---biolink:same_as---ATC:J01CA07---DrugCentral:" "epicillin"
"pivmecillinam" "DrugCentral:2219---biolink:same_as---ATC:J01CA08---DrugCentral:" "pivmecillinam"
"azlocillin" "DrugCentral:277---biolink:same_as---ATC:J01CA09---DrugCentral:" "azlocillin"
"anethole trithione" "DrugCentral:218---biolink:same_as---ATC:A16AX02---DrugCentral:" "anethole trithione"
"thioctic acid" "DrugCentral:4732---biolink:same_as---ATC:A16AX01---DrugCentral:" "thioctic acid"
"ampicillin" "DrugCentral:198---biolink:same_as---ATC:J01CA01---DrugCentral:" "ampicillin"
"pivampicillin" "DrugCentral:2218---biolink:same_as---ATC:J01CA02---DrugCentral:" "pivampicillin"
"carbenicillin" "DrugCentral:492---biolink:same_as---ATC:J01CA03---DrugCentral:" "carbenicillin"
"amoxicillin" "DrugCentral:192---biolink:same_as---ATC:J01CA04---DrugCentral:" "amoxicillin"

ecwood commented 3 years ago

Should we do a general biolink:related_to for the purposes of the rebuild?

kvarforl commented 3 years ago

Here is the biolink entry for subclass of: https://github.com/biolink/biolink-model/blob/04a4fbb9d1d64784c29df4fc2a951df1a30bfe2e/biolink-model.yaml#L1193-L1198

Looking at the example mappings in the rest of the biolink model entries, Eric's interpretation might be correct (suggesting that part of could indeed be a better fit)

Here is the biolink entry for part of, which I think even further reinforces that it would be a good fit: https://github.com/biolink/biolink-model/blob/04a4fbb9d1d64784c29df4fc2a951df1a30bfe2e/biolink-model.yaml#L3656-L3662

kvarforl commented 3 years ago

> Should we do a general biolink:related_to for the purposes of the rebuild?

This is definitely the safest option! I think part of makes sense, but as demonstrated by my willy nilly comments above, my thoughts on correct predicates aren't always the most accurate :)

kvarforl commented 3 years ago

> I'm wondering if we should use biolink:subclass_of as the predicate for these edges. On the webpage, unlike in the PostgreSQL database, the ATC codes for a particular drug are listed under "Pharmacologic Action" (example: https://drugcentral.org/drugcard/52). For the other "Pharmacologic Action" edges, we mapped it to biolink:subclass_of:
>
> https://github.com/RTXteam/RTX/blob/9986409a54f788855c335b4b337472afcefaf254/code/kg2/predicate-remap.yaml#L597-L600

Perhaps we should confirm that subclass of makes the most sense here too?

ecwood commented 3 years ago

For reference, later on, in 3443 of the 4817 total DrugCentral->ATC edges the DrugCentral name is the same as the ATC name.

edeutsch commented 3 years ago

In the list above from @ericawood, many indeed are same_as when they have the same name, and acetaminophen is the same as paracetamol. BUT many in the list above are not the same as: "ceftazidime" is not the same as "ceftazidime and beta-lactamase inhibitor". One possible rule is to look for "and" or "combinations" in the name. For any of those, I think the part_of relationship is appropriate.
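A minimal sketch of that rule in Python, assuming the ATC name is available as a plain string. The function name and the regular expression are hypothetical; nothing like this is confirmed to exist in the KG2 code.

```python
import re

# Whole-word match on "and", "combination", or "combinations", case-insensitive.
_COMBINATION_PATTERN = re.compile(r"\b(?:and|combinations?)\b", re.IGNORECASE)


def looks_like_combination(atc_name):
    """Return True if the ATC name appears to describe a combination product."""
    return _COMBINATION_PATTERN.search(atc_name) is not None
```

The word boundaries keep names that merely contain the letters "and" from matching. The rule is still only a heuristic over naming conventions, so it says nothing about pairs whose names differ for other reasons.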

ecwood commented 3 years ago

> Perhaps we should confirm that subclass of makes the most sense here too?

You might run: match (n)-[e {relation: 'DrugCentral:PA'}]-(m) return n.name, e.id, m.name

ecwood commented 3 years ago

> One possible rule is to look for "and" or "combinations" in the name. For any of those, I think the part_of relationship is appropriate.

So maybe:

if subject_name == object_name:
    predicate = "biolink:same_as"
else:
    predicate = "biolink:part_of"

?

edeutsch commented 3 years ago

That is certainly safer/safest. Maybe could do case-insensitive match.
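For illustration, the case-insensitive variant could look roughly like the sketch below. The function name `choose_predicate` is hypothetical, and stripping whitespace is an extra assumption that was not discussed above.

```python
def choose_predicate(subject_name, object_name):
    """Pick same_as only when the two names match, ignoring case and outer whitespace."""
    if subject_name.strip().casefold() == object_name.strip().casefold():
        return "biolink:same_as"
    return "biolink:part_of"
```

With this, "Polymyxin B" vs. "polymyxin B" from the list above would come out as same_as. Spelling variants such as "indomethacin" vs. "indometacin" or "thiabendazole" vs. "tiabendazole" would still fall through to part_of, so the rule errs on the side of not merging.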

saramsey commented 3 years ago

> One possible rule is to look for "and" or "combinations" in the name. For any of those, I think the part_of relationship is appropriate.
>
> So maybe:
>
> if subject_name == object_name:
>     predicate = "biolink:same_as"
> else:
>     predicate = "biolink:part_of"
>
> ?

Seems reasonable

saramsey commented 3 years ago

> sorry I'm late to the party here, meetings all morning, but didn't we solve this same problem with "part_of" for RXNORM? Shouldn't we be consistent? Which is more sensible?
>
> "acetaminophen" is a subclass of "oxycodone and paracetamol"
> "acetaminophen" is a part of "oxycodone and paracetamol"
>
> I would argue that our RXNORM decision makes more sense: this is a part of relationship not a subclass. I think using subclass_of here is dangerous because we intend to do subclass reasoning using subclass_of
>
> I think subclass_of means that everything that is true for the parent is true for the descendant. Which would mean that everything that is true of "oxycodone and paracetamol" must also be true for "acetaminophen". And I don't think that's right?
>
> Am I misunderstanding? or not thinking about this right?

Hmm, I may wish to revise my recommendation based on Eric's point. I'm going to discuss with @ericawood

ecwood commented 3 years ago

Steve and I met on Zoom to discuss options. Here are some notes:

I'm going to start the "rebuild" (the TSV file fixer) unless anyone has any pressing concerns.

edeutsch commented 3 years ago

not ideal, but given that the process does have access to ATC names, this seems like the most sensible reasonably easy fix. thanks.

ecwood commented 3 years ago

This appears patched in KG2.6.3:

match (n)-[e]-(m) where split(n.id, ':')[0]='DrugCentral' and split(m.id, ':')[0]='ATC' return distinct e.predicate, e.relation, count(e)

e.predicate e.relation count(e)
"biolink:close_match" "DrugCentral:struct2atc" 4817

I'm not going to close this out yet, though, until I make the change in the code as well.
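For anyone following along, a rough sketch of what the equivalent per-edge patch might look like, mirroring the prefix test in the query above. The function name and the `subject`/`object` keys are assumptions about the edge layout (only `predicate` and `relation` appear in this thread), and the sketch does not touch the edge `id`; it is not the actual fixer or the eventual code change.

```python
def patch_drugcentral_atc_edge(edge):
    """Return a copy of a DrugCentral->ATC edge relabeled as close_match; other edges pass through."""
    subject_prefix = edge["subject"].split(":")[0]
    object_prefix = edge["object"].split(":")[0]
    if subject_prefix == "DrugCentral" and object_prefix == "ATC":
        patched = dict(edge)  # copy so the caller's edge is left unmodified
        patched["predicate"] = "biolink:close_match"
        patched["relation"] = "DrugCentral:struct2atc"
        return patched
    return edge


# Hypothetical edge shaped like the DrugCentral:52 example earlier in the thread.
example_edge = {
    "subject": "DrugCentral:52",
    "object": "ATC:N02AJ01",
    "predicate": "biolink:same_as",
    "relation": "DrugCentral:",
}
patched_edge = patch_drugcentral_atc_edge(example_edge)
```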