RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Consider Restructuring UMLS/Ontology Build Process #316

Open ecwood opened 1 year ago

ecwood commented 1 year ago

Of the issues that pop up with KG2, a significant percentage seem to be related to multi_ont_to_json_kg.py. While it is great that we can perform these imports with relatively minimal source-specific tailoring, that generality is often the root of the problems. They include #303, #277, #283, #129, #212, #57, and many closed issues. I think that we should consider ETLing the ontologies individually. There have also been many hacky workarounds, like making umls-mth last in ont-load-inventory.yaml for #19 and the entirety of the ontobio-507 workaround. For UMLS, we could likely largely keep the same system OR create a new ETL that queries MySQL directly. It is too hard to maintain multi_ont_to_json_kg.py to the caliber we require while keeping it generalized (i.e., without source-specific exceptions and rules).
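
To make the "query MySQL directly" option concrete, here is a minimal sketch of what a per-source UMLS extraction could look like. It is an illustration, not the KG2 implementation: `extract_source` is a hypothetical helper, an in-memory SQLite table stands in for the UMLS MySQL database, the column names follow the standard MRCONSO table, and the French row is made-up demo data. The `Y|`/`N|` name prefix is assumed here to be the ISPREF flag.

```python
import sqlite3
from collections import defaultdict

# Column names follow the standard UMLS MRCONSO table. SQLite uses "?" as the
# placeholder; most MySQL drivers use "%s" instead.
QUERY = """
    SELECT CODE, SAB, CUI, ISPREF, STR
    FROM MRCONSO
    WHERE SAB = ? AND LAT = 'ENG'
    ORDER BY CODE, CUI, STR
"""


def extract_source(conn, sab):
    """Group one source vocabulary's English atoms by (CODE, SAB)."""
    nodes = defaultdict(lambda: {"cuis": [], "names": []})
    for code, source, cui, ispref, name in conn.execute(QUERY, (sab,)):
        node = nodes[(code, source)]
        if cui not in node["cuis"]:
            node["cuis"].append(cui)
        label = f"{ispref}|{name}"  # assumption: prefix is the ISPREF flag
        if label not in node["names"]:
            node["names"].append(label)
    return dict(nodes)


def demo():
    # In-memory SQLite stands in for the UMLS MySQL database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MRCONSO (CUI, LAT, ISPREF, SAB, CODE, STR)")
    conn.executemany("INSERT INTO MRCONSO VALUES (?, ?, ?, ?, ?, ?)", [
        ("C0012674", "ENG", "Y", "MSH", "U000006", "Diseases (MeSH Category)"),
        ("C0012674", "FRE", "Y", "MSHFRE", "U000006", "Maladies"),  # made-up row
        ("C0022408", "ENG", "Y", "SNM", "U000006", "Diseases of Joints"),
    ])
    return extract_source(conn, "MSH")
```

Because each source is pulled by its SAB, any source-specific rule would live in that source's own ETL rather than in one generalized module.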

This would be a large task, but it would be a worthwhile investment for future maintainers.

I would love to get @saramsey and @acevedol's opinions on this.

saramsey commented 1 year ago

Good summary of the issue. I agree that the module's complexity has made maintenance more time-consuming and difficult than it would be if the module had a simpler design. We could start by refactoring some of the huge functions into groups of smaller functions, adding some unit tests and comments, etc.

ecwood commented 1 year ago

Some of the UMLS nodes do not have English names. I ran a query on those:

["UMLS:C0005475", "UMLS:C0005521", "UMLS:C0012245", "UMLS:C0013320", "UMLS:C0013458", "UMLS:C0023099", "UMLS:C0026862", "UMLS:C0036894", "UMLS:C0085309", "UMLS:C0085310", "UMLS:C0085312", "UMLS:C0085352", "UMLS:C0085357", "UMLS:C0282057", "UMLS:C0338735", "UMLS:C0338736", "UMLS:C0338737", "UMLS:C0340739", "UMLS:C0341971", "UMLS:C0341977", "UMLS:C0349779", "UMLS:C0364853", "UMLS:C0393137", "UMLS:C0396668", "UMLS:C0396686", "UMLS:C0396711", "UMLS:C0396727", "UMLS:C0398105", "UMLS:C0398107", "UMLS:C0400589", "UMLS:C0400639", "UMLS:C0404708", "UMLS:C0404760", "UMLS:C0404761", "UMLS:C0404762", "UMLS:C0405248", "UMLS:C0405284", "UMLS:C0405863", "UMLS:C0406155", "UMLS:C0407201", "UMLS:C0408212", "UMLS:C0410062", "UMLS:C0410063", "UMLS:C0410064", "UMLS:C0410065", "UMLS:C0410066", "UMLS:C0410069", "UMLS:C0412506", "UMLS:C0416853", "UMLS:C0417691", "UMLS:C0417698", "UMLS:C0417708", "UMLS:C0418286", "UMLS:C0418289", "UMLS:C0425056", "UMLS:C0425057", "UMLS:C0425059", "UMLS:C0427436", "UMLS:C0427439", "UMLS:C0427442", "UMLS:C0427445", "UMLS:C0427456", "UMLS:C0427459", "UMLS:C0427463", "UMLS:C0427466", "UMLS:C0427687", "UMLS:C0427690", "UMLS:C0428453", "UMLS:C0428454", "UMLS:C0428455", "UMLS:C0428456", "UMLS:C0428464", "UMLS:C0428470", "UMLS:C0429993", "UMLS:C0430008", "UMLS:C0430009", "UMLS:C0430023", "UMLS:C0431685", "UMLS:C0434443", "UMLS:C0434444", "UMLS:C0434446", "UMLS:C0434447", "UMLS:C0434448", "UMLS:C0450146", "UMLS:C0452190", "UMLS:C0455501", "UMLS:C0474527", "UMLS:C0474528", "UMLS:C0474530", "UMLS:C0474532", "UMLS:C0474547", "UMLS:C0474551", "UMLS:C0474557", "UMLS:C0474560", "UMLS:C0546829", "UMLS:C0553893", "UMLS:C0553963", "UMLS:C0553964", "UMLS:C0554073", "UMLS:C0554079", "UMLS:C0554750", "UMLS:C0564722", "UMLS:C0565103", "UMLS:C0567398", "UMLS:C0579259", "UMLS:C0580452", "UMLS:C0580464", "UMLS:C0580605", "UMLS:C0580606", "UMLS:C0581862", "UMLS:C0588216", "UMLS:C0589505", "UMLS:C0628635", "UMLS:C0729483", "UMLS:C0729484", "UMLS:C0729745", "UMLS:C0729746", 
"UMLS:C0729748", "UMLS:C0949777", "UMLS:C0949778", "UMLS:C0949781", "UMLS:C0949821", "UMLS:C0949835", "UMLS:C0949906", "UMLS:C0976558", "UMLS:C1100246", "UMLS:C1136351", "UMLS:C1257908", "UMLS:C1263907", "UMLS:C1266869", "UMLS:C1266923", "UMLS:C1272182", "UMLS:C1273800", "UMLS:C1274566", "UMLS:C1274567", "UMLS:C1275051", "UMLS:C1275739", "UMLS:C1276048", "UMLS:C1277105", "UMLS:C1282387", "UMLS:C1282389", "UMLS:C1282390", "UMLS:C1282391", "UMLS:C1282392", "UMLS:C1282393", "UMLS:C1282394", "UMLS:C1282395", "UMLS:C1282396", "UMLS:C1282397", "UMLS:C1282398", "UMLS:C1282399", "UMLS:C1282400", "UMLS:C1282401", "UMLS:C1282404", "UMLS:C1282498", "UMLS:C1282499", "UMLS:C1282501", "UMLS:C1282503", "UMLS:C1282504", "UMLS:C1282505", "UMLS:C1282506", "UMLS:C1283242", "UMLS:C1284347", "UMLS:C1285169", "UMLS:C1285179", "UMLS:C1285244", "UMLS:C1285246", "UMLS:C1285361", "UMLS:C1286624", "UMLS:C1287497", "UMLS:C1287570", "UMLS:C1287757", "UMLS:C1287758", "UMLS:C1287851", "UMLS:C1288168", "UMLS:C1288351", "UMLS:C1292810", "UMLS:C1293689", "UMLS:C1293711", "UMLS:C1294839", "UMLS:C1294840", "UMLS:C1294841", "UMLS:C1294842", "UMLS:C1295592", "UMLS:C1295593", "UMLS:C1298728", "UMLS:C1299364", "UMLS:C1299541", "UMLS:C1299595", "UMLS:C1299778", "UMLS:C1301768", "UMLS:C1304057", "UMLS:C1304430", "UMLS:C1319328", "UMLS:C1320708", "UMLS:C1445696", "UMLS:C1445729", "UMLS:C1445732", "UMLS:C1532375", "UMLS:C1532376", "UMLS:C1536318", "UMLS:C1562846", "UMLS:C1827801", "UMLS:C1828328", "UMLS:C1959694", "UMLS:C1959808", "UMLS:C1960009", "UMLS:C1960235", "UMLS:C1960242", "UMLS:C1960246", "UMLS:C1960394", "UMLS:C1997044", "UMLS:C1997803", "UMLS:C1998048", "UMLS:C2314935", "UMLS:C2315501", "UMLS:C2317052", "UMLS:C2348859", "UMLS:C2584357", "UMLS:C2584358", "UMLS:C2584359", "UMLS:C2584360", "UMLS:C2584361", "UMLS:C2584362", "UMLS:C2584363", "UMLS:C2584364", "UMLS:C2584365", "UMLS:C2584366", "UMLS:C2584367", "UMLS:C2584368", "UMLS:C2584374", "UMLS:C2584375", "UMLS:C2584381", "UMLS:C2584382", 
"UMLS:C2584383", "UMLS:C2584384", "UMLS:C2584385", "UMLS:C2585119", "UMLS:C2585374", "UMLS:C2585448", "UMLS:C2585767", "UMLS:C2711003", "UMLS:C2711023", "UMLS:C2711073", "UMLS:C2711102", "UMLS:C2711128", "UMLS:C2711221", "UMLS:C2711275", "UMLS:C2711276", "UMLS:C2711332", "UMLS:C2711333", "UMLS:C2711362", "UMLS:C2711364", "UMLS:C2711367", "UMLS:C2711372", "UMLS:C2711409", "UMLS:C2711439", "UMLS:C2711498", "UMLS:C2711522", "UMLS:C2711532", "UMLS:C2711533", "UMLS:C2711535", "UMLS:C2711559", "UMLS:C2711565", "UMLS:C2711679", "UMLS:C2711680", "UMLS:C2711697", "UMLS:C2711702", "UMLS:C2711703", "UMLS:C2711711", "UMLS:C2711717", "UMLS:C2711762", "UMLS:C2711763", "UMLS:C2711816", "UMLS:C2711833", "UMLS:C2711923", "UMLS:C2711931", "UMLS:C2713415", "UMLS:C2725359", "UMLS:C2732256", "UMLS:C2732271", "UMLS:C2732307", "UMLS:C2732389", "UMLS:C2732390", "UMLS:C2732431", "UMLS:C2732516", "UMLS:C2732517", "UMLS:C2732565", "UMLS:C2732646", "UMLS:C2732698", "UMLS:C2732711", "UMLS:C2732713", "UMLS:C2732833", "UMLS:C2732834", "UMLS:C2732869", "UMLS:C2733032", "UMLS:C2733123", "UMLS:C2733124", "UMLS:C2733153", "UMLS:C2733190", "UMLS:C2733255", "UMLS:C2733256", "UMLS:C2733257", "UMLS:C2733351", "UMLS:C2733380", "UMLS:C2733544", "UMLS:C2733545", "UMLS:C2733583", "UMLS:C2733584", "UMLS:C2733606", "UMLS:C2733610", "UMLS:C2919602", "UMLS:C2960543", "UMLS:C3163703", "UMLS:C3266097", "UMLS:C3472375", "UMLS:C3472688", "UMLS:C3494254", "UMLS:C3645374", "UMLS:C3662031", "UMLS:C3692150", "UMLS:C3697463", "UMLS:C3838816", "UMLS:C3839023", "UMLS:C3839083", "UMLS:C3839094", "UMLS:C3839114", "UMLS:C3839134", "UMLS:C3840019", "UMLS:C3853925", "UMLS:C3853926", "UMLS:C3877704", "UMLS:C3877750", "UMLS:C3878151", "UMLS:C3880449", "UMLS:C3882032", "UMLS:C4039040", "UMLS:C4040292", "UMLS:C4040674", "UMLS:C4041119", "UMLS:C4075985", "UMLS:C4076248", "UMLS:C4076285", "UMLS:C4082162", "UMLS:C4082319", "UMLS:C4082410", "UMLS:C4082599", "UMLS:C4082610", "UMLS:C4300507", "UMLS:C4300508", "UMLS:C4300509", 
"UMLS:C4300516", "UMLS:C4300517", "UMLS:C4300520", "UMLS:C4300523", "UMLS:C4300529", "UMLS:C4300536", "UMLS:C4300538", "UMLS:C4300541", "UMLS:C4300542", "UMLS:C4300544", "UMLS:C4300545", "UMLS:C4300553", "UMLS:C4300556", "UMLS:C4300557", "UMLS:C4300565", "UMLS:C4300578", "UMLS:C4300579", "UMLS:C4300580", "UMLS:C4300581", "UMLS:C4300582", "UMLS:C4300583", "UMLS:C4300584", "UMLS:C4300586", "UMLS:C4300587", "UMLS:C4300589", "UMLS:C4300590", "UMLS:C4300591", "UMLS:C4300592", "UMLS:C4300593", "UMLS:C4300598", "UMLS:C4300601", "UMLS:C4300604", "UMLS:C4300605", "UMLS:C4300607", "UMLS:C4300613", "UMLS:C4300614", "UMLS:C4300622", "UMLS:C4300624", "UMLS:C4300626", "UMLS:C4300628", "UMLS:C4300629", "UMLS:C4300630", "UMLS:C4300631", "UMLS:C4300632", "UMLS:C4300634", "UMLS:C4300637", "UMLS:C4300638", "UMLS:C4300641", "UMLS:C4300642", "UMLS:C4300644", "UMLS:C4300646", "UMLS:C4300648", "UMLS:C4300649", "UMLS:C4300650", "UMLS:C4300652", "UMLS:C4300653", "UMLS:C4300654", "UMLS:C4300655", "UMLS:C4300656", "UMLS:C4300657", "UMLS:C4300658", "UMLS:C4300660", "UMLS:C4300661", "UMLS:C4300662", "UMLS:C4300664", "UMLS:C4300665", "UMLS:C4300669", "UMLS:C4300671", "UMLS:C4300672", "UMLS:C4300677", "UMLS:C4300678", "UMLS:C4300679", "UMLS:C4300680", "UMLS:C4300681", "UMLS:C4300684", "UMLS:C4300687", "UMLS:C4300688", "UMLS:C4300689", "UMLS:C4300690", "UMLS:C4300691", "UMLS:C4300692", "UMLS:C4300694", "UMLS:C4300695", "UMLS:C4300697", "UMLS:C4300698", "UMLS:C4300700", "UMLS:C4300701", "UMLS:C4300702", "UMLS:C4300704", "UMLS:C4300707", "UMLS:C4300710", "UMLS:C4300716", "UMLS:C4300719", "UMLS:C4300721", "UMLS:C4300723", "UMLS:C4300725", "UMLS:C4300726", "UMLS:C4300727", "UMLS:C4300728", "UMLS:C4300729", "UMLS:C4300731", "UMLS:C4300736", "UMLS:C4300737", "UMLS:C4300738", "UMLS:C4300742", "UMLS:C4300743", "UMLS:C4300744", "UMLS:C4300749", "UMLS:C4300766", "UMLS:C4300781", "UMLS:C4302181", "UMLS:C4302299", "UMLS:C4304144", "UMLS:C4304633", "UMLS:C4510687", "UMLS:C4511000", "UMLS:C4512588", 
"UMLS:C4513440", "UMLS:C4513441", "UMLS:C4513442", "UMLS:C4513443", "UMLS:C4513444", "UMLS:C4513458", "UMLS:C4516980", "UMLS:C4517995", "UMLS:C4518365", "UMLS:C4518366", "UMLS:C4518390", "UMLS:C4538230", "UMLS:C4538231", "UMLS:C4538232", "UMLS:C4538235", "UMLS:C4538236", "UMLS:C4538237", "UMLS:C4538246", "UMLS:C4538253", "UMLS:C4538254", "UMLS:C4538255", "UMLS:C4538257", "UMLS:C4538259", "UMLS:C4538261", "UMLS:C4538262", "UMLS:C4538265", "UMLS:C4538267", "UMLS:C4538268", "UMLS:C4538270", "UMLS:C4538271", "UMLS:C4538272", "UMLS:C4538274", "UMLS:C4538276", "UMLS:C4538277", "UMLS:C4538280", "UMLS:C4538281", "UMLS:C4538283", "UMLS:C4538293", "UMLS:C4538297", "UMLS:C4538299", "UMLS:C4538308", "UMLS:C4538333", "UMLS:C4538336", "UMLS:C4538337", "UMLS:C4543626", "UMLS:C4545500", "UMLS:C4551801", "UMLS:C4708017", "UMLS:C4708591", "UMLS:C4741901", "UMLS:C4741902", "UMLS:C4741909", "UMLS:C4741911", "UMLS:C4741912", "UMLS:C4741913", "UMLS:C4741918", "UMLS:C4741920", "UMLS:C4741927", "UMLS:C4741933", "UMLS:C4741937", "UMLS:C4741938", "UMLS:C4741941", "UMLS:C4741944", "UMLS:C4741946", "UMLS:C4741947", "UMLS:C4741948", "UMLS:C4741949", "UMLS:C4741951", "UMLS:C4741952", "UMLS:C4741963", "UMLS:C4741969", "UMLS:C4741970", "UMLS:C4741974", "UMLS:C4741975", "UMLS:C4741976", "UMLS:C4741977", "UMLS:C4741979", "UMLS:C4741980", "UMLS:C4741981", "UMLS:C4741982", "UMLS:C4742001", "UMLS:C4742003", "UMLS:C4742011", "UMLS:C4742019", "UMLS:C4742024", "UMLS:C4742044", "UMLS:C4742045", "UMLS:C4742046", "UMLS:C4750592", "UMLS:C4750885", "UMLS:C4751268", "UMLS:C4751466", "UMLS:C4751743", "UMLS:C4751768", "UMLS:C4752314", "UMLS:C5191373", "UMLS:C5191404", "UMLS:C5191468", "UMLS:C5191469", "UMLS:C5191515", "UMLS:C5191516", "UMLS:C5191946", "UMLS:C5191947", "UMLS:C5192179", "UMLS:C5192190", "UMLS:C5192236", "UMLS:C5192277", "UMLS:C5192280", "UMLS:C5192292", "UMLS:C5192405", "UMLS:C5192407", "UMLS:C5192859", "UMLS:C5192860", "UMLS:C5192932", "UMLS:C5197871", "UMLS:C5231257", "UMLS:C5231258", 
"UMLS:C5231259", "UMLS:C5231310", "UMLS:C5234714", "UMLS:C5234715", "UMLS:C5234716", "UMLS:C5234718", "UMLS:C5234719", "UMLS:C5234720", "UMLS:C5234723", "UMLS:C5234724", "UMLS:C5234725", "UMLS:C5234727", "UMLS:C5234728", "UMLS:C5234731", "UMLS:C5234732", "UMLS:C5234735", "UMLS:C5234737", "UMLS:C5234738", "UMLS:C5234739", "UMLS:C5234740", "UMLS:C5234741", "UMLS:C5234742", "UMLS:C5234743", "UMLS:C5234745", "UMLS:C5234746", "UMLS:C5234748", "UMLS:C5234751", "UMLS:C5234753", "UMLS:C5234755", "UMLS:C5234756", "UMLS:C5234763", "UMLS:C5234766", "UMLS:C5234768", "UMLS:C5234769", "UMLS:C5234770", "UMLS:C5234771", "UMLS:C5234772", "UMLS:C5234774", "UMLS:C5234776", "UMLS:C5234777", "UMLS:C5234778", "UMLS:C5234780", "UMLS:C5234782", "UMLS:C5234783", "UMLS:C5234784", "UMLS:C5234785", "UMLS:C5234788", "UMLS:C5234789", "UMLS:C5234790", "UMLS:C5234791", "UMLS:C5234792", "UMLS:C5234793", "UMLS:C5234794", "UMLS:C5234795", "UMLS:C5234796", "UMLS:C5234797", "UMLS:C5234800", "UMLS:C5234801", "UMLS:C5234802", "UMLS:C5234803", "UMLS:C5234805", "UMLS:C5234806", "UMLS:C5234807", "UMLS:C5234808", "UMLS:C5234809", "UMLS:C5234810", "UMLS:C5234811", "UMLS:C5234812", "UMLS:C5234813", "UMLS:C5234814", "UMLS:C5234815", "UMLS:C5234816", "UMLS:C5234817", "UMLS:C5234819", "UMLS:C5234820", "UMLS:C5234821", "UMLS:C5234822", "UMLS:C5234823", "UMLS:C5234825", "UMLS:C5234826", "UMLS:C5234827", "UMLS:C5234828", "UMLS:C5234829", "UMLS:C5234830", "UMLS:C5234831", "UMLS:C5234832", "UMLS:C5234833", "UMLS:C5234834", "UMLS:C5234835", "UMLS:C5234836", "UMLS:C5234838", "UMLS:C5234839", "UMLS:C5234840", "UMLS:C5234841", "UMLS:C5234842", "UMLS:C5234843", "UMLS:C5234844", "UMLS:C5243531", "UMLS:C5395021", "UMLS:C5395486", "UMLS:C5437674", "UMLS:C5437676", "UMLS:C5437678", "UMLS:C5442159", "UMLS:C5543750", "UMLS:C5543753", "UMLS:C5543756", "UMLS:C5543757", "UMLS:C5543758", "UMLS:C5543759", "UMLS:C5543760", "UMLS:C5543761", "UMLS:C5543762", "UMLS:C5543763", "UMLS:C5547044", "UMLS:C5567857", "UMLS:C5568102", 
"UMLS:C5568984", "UMLS:C5574411", "UMLS:C5666705", "UMLS:C5681856", "UMLS:C5681858", "UMLS:C5681859", "UMLS:C5681860", "UMLS:C5681861", "UMLS:C5686317", "UMLS:C5686821", "UMLS:C5686822", "UMLS:C5687956", "UMLS:C5761736", "UMLS:C5779425", "UMLS:C5779426", "UMLS:C5779427", "UMLS:C5779428", "UMLS:C5779429", "UMLS:C5779430", "UMLS:C5779431", "UMLS:C5779432", "UMLS:C5779433", "UMLS:C5779434", "UMLS:C5779435", "UMLS:C5779436", "UMLS:C5779437", "UMLS:C5779438", "UMLS:C5779439", "UMLS:C5779440", "UMLS:C5779441", "UMLS:C5779442", "UMLS:C5779443", "UMLS:C5779444", "UMLS:C5779445", "UMLS:C5779446", "UMLS:C5779447", "UMLS:C5779448", "UMLS:C5779449", "UMLS:C5779450", "UMLS:C5779451", "UMLS:C5779452", "UMLS:C5779453", "UMLS:C5779454", "UMLS:C5779455", "UMLS:C5779456", "UMLS:C5779457", "UMLS:C5779458", "UMLS:C5779459", "UMLS:C5779460", "UMLS:C5779461", "UMLS:C5779462", "UMLS:C5779463", "UMLS:C5779464", "UMLS:C5779465", "UMLS:C5779466", "UMLS:C5779467", "UMLS:C5779468", "UMLS:C5779469", "UMLS:C5779470", "UMLS:C5779471", "UMLS:C5779472", "UMLS:C5779473", "UMLS:C5779474", "UMLS:C5779475", "UMLS:C5779476", "UMLS:C5779477", "UMLS:C5779478", "UMLS:C5779479", "UMLS:C5779480"]

on KG2.8.3pre and found that 34 of the more than 700 CUIs in that list are in KG2, many of them nameless. For this reason, I opted to have these nodes ignored in the new structure. However, this can be changed at any time.
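
As a rough illustration of how such a list could be derived and used as an ignore list, the sketch below assumes the input is (CUI, LAT) pairs, for example from `SELECT CUI, LAT FROM MRCONSO`. The function names and the language codes in the sample rows are assumptions for the example, not the query that produced the list above.

```python
def cuis_without_english_name(rows):
    """Return sorted CUIs that have no atom with LAT == 'ENG'.

    `rows` is an iterable of (cui, lat) pairs.
    """
    seen = set()
    has_english = set()
    for cui, lat in rows:
        seen.add(cui)
        if lat == "ENG":
            has_english.add(cui)
    return sorted(seen - has_english)


def to_curies(cuis, prefix="UMLS"):
    """Format bare CUIs as CURIEs such as 'UMLS:C0005475'."""
    return [f"{prefix}:{cui}" for cui in cuis]


# Illustrative rows only; the language codes here are made up for the demo.
SAMPLE_ROWS = [
    ("C0005475", "SPA"),
    ("C0005475", "FRE"),
    ("C0012674", "ENG"),
    ("C0012674", "FRE"),
]

IGNORED = to_curies(cuis_without_english_name(SAMPLE_ROWS))
```

Keeping the ignore list as computed data rather than a hard-coded constant would make it easy to reverse the decision later.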

ecwood commented 1 year ago

Sometimes, the current model conflates nodes that shouldn't be conflated. While they aren't necessarily assigned biolink:close_match, they do absorb each other's definitions. For example:

<http://purl.bioontology.org/ontology/MSH/D010742> a owl:Class ;
        skos:prefLabel """Phospholipid Ethers"""@en ;
        skos:notation """D010742"""^^xsd:string ;
        skos:altLabel """1 Alkyl 2 Acylphosphatidates"""@en , """1-Alkyl-2-Acylphosphatidates"""@en , """Ether Phosphatidates"""@en , """Ether Phospholipids"""@en , """Ethers, Glycerol Phosphate"""@en , """Ethers, Glycerophosphate"""@en , """Ethers, Phospholipid"""@en , """Glycerol Phosphate Ethers"""@en , """Glycerophosphate Ethers"""@en , """Phosphate Ethers, Glycerol"""@en , """Phosphatidates, Ether"""@en , """Phospholipids, Ether"""@en ;
        skos:definition """Phospholipids which have an alcohol moiety in ethereal linkage with a saturated or unsaturated aliphatic alcohol. They are usually derivatives of phosphoglycerols or phosphatidates. The other two alcohol groups of the glycerol backbone are usually in ester linkage. These compounds are widely distributed in animal tissues."""@en ;
        rdfs:subClassOf <http://purl.bioontology.org/ontology/MSH/D005995> ;
        rdfs:subClassOf <http://purl.bioontology.org/ontology/MSH/D020404> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000008> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000009> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000032> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000037> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000097> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000134> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000138> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000145> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000191> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000266> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000276> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000302> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000378> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000493> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000494> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000506> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000528> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000592> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000600> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000627> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000633> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000652> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000737> ;
        <http://purl.bioontology.org/ontology/MSH/QB> <http://purl.bioontology.org/ontology/MSH/Q000819> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C000592577> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C000599353> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C008877> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C026659> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C026792> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C028144> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C029405> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C029429> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C029430> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C032521> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C037913> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C043051> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C043080> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C043297> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C046258> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C050186> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C050202> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C052176> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C052177> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C052298> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C053479> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C055044> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C055096> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C055135> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C055271> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C055462> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C056282> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C057178> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C060919> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C061648> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063292> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063385> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063386> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063387> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063388> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063783> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C063784> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C068367> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C068368> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C069064> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C069963> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C072627> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C074029> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C076164> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C076779> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C076781> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C076783> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C077297> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C077298> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C077315> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C077316> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C082824> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C085204> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C087112> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C091046> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C091569> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C091570> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C091903> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C092456> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C092457> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C096487> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C109624> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C110076> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C119381> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C400184> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C400185> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C419805> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C454821> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C480343> ;
        <http://purl.bioontology.org/ontology/MSH/mapped_from> <http://purl.bioontology.org/ontology/MSH/C484862> ;
        <http://purl.bioontology.org/ontology/MSH/TH> """NLM (1988)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031735"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """UNK (19XX)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031736"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """UNK (19XX)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031733"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """UNK (19XX)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """UNK (19XX)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031734"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031738"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TH> """UNK (19XX)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/AQL> """AD AE AG AI AN BL CF CH CL CS EC HI IM IP ME PD PK PO RE SD ST TO TU UR"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MMR> """20170410"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MN> """D02.033.800.875.875.750"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/DC> """1"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/DX> """19880101"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/HN> """88"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MDA> """19870325"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MN> """D02.355.460.750"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/MN> """D10.570.755.375.760.400.985"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/PM> """88"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031733"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031735"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031734"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031737"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031737"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031738"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031738"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/TERMUI> """T031736"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/RN> """0"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MSH/RN> """0"""^^xsd:string ;
        UMLS:has_cui """C0000074"""^^xsd:string ;
        UMLS:has_cui """C0031673"""^^xsd:string ;
        UMLS:has_tui """T109"""^^xsd:string ;
        UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T109> ;

In UMLS, the relationship is actually a UMLS:RB relationship:

select * from MRREL where CUI1="C0000074";
CUI1      AUI1       STYPE1  REL  CUI2      AUI2       STYPE2  RELA               RUI         SRUI  SAB     SL      RG    DIR   SUPPRESS  CVF
C0000074  A1317486   AUI     SY   C0000074  A26606894  AUI     has_permuted_term  R162687752  NULL  MSH     MSH     NULL  NULL  N         NULL
C0000074  A26606894  AUI     SY   C0000074  A1317486   AUI     permuted_term_of   R162702781  NULL  MSH     MSH     NULL  NULL  N         NULL
C0000074  A35421785  SCUI    RB   C0031673  A13050223  SCUI    NULL               R216697970  NULL  MSHCZE  MSHCZE  NULL  NULL  N         NULL
C0000074  A26606894  SCUI    RB   C0031673  A0100987   SCUI    NULL               R31978222   NULL  MSH     MSH     NULL  NULL  N         NULL
C0000074  A20001772  SCUI    RB   C0031673  A20007425  SCUI    NULL               R86028816   NULL  MSHFRE  MSHFRE  NULL  NULL  N         NULL
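
One way to surface cases like this before they are merged is to flag every class that carries more than one UMLS:has_cui value. The sketch below is a line-based scan that assumes the Turtle file is laid out like the excerpt above (one predicate per line, subject on its own `a owl:Class` line); a real RDF parser would be more robust. The function name and regular expressions are hypothetical.

```python
import re
from collections import defaultdict

# Assumes the layout seen in the excerpt: subject line, then one predicate per line.
SUBJECT_RE = re.compile(r"^<([^>]+)>\s+a\s+owl:Class")
CUI_RE = re.compile(r'UMLS:has_cui\s+"""(C\d+)"""')


def classes_with_multiple_cuis(ttl_text):
    """Return {class IRI: [CUIs]} for classes with more than one has_cui."""
    cuis_by_class = defaultdict(list)
    current = None
    for raw_line in ttl_text.splitlines():
        line = raw_line.strip()
        subject = SUBJECT_RE.match(line)
        if subject:
            current = subject.group(1)
            continue
        cui = CUI_RE.search(line)
        if cui and current is not None:
            cuis_by_class[current].append(cui.group(1))
    return {iri: cuis for iri, cuis in cuis_by_class.items() if len(cuis) > 1}


# Trimmed copy of the MSH/D010742 excerpt above.
SAMPLE = '''
<http://purl.bioontology.org/ontology/MSH/D010742> a owl:Class ;
        skos:prefLabel """Phospholipid Ethers"""@en ;
        UMLS:has_cui """C0000074"""^^xsd:string ;
        UMLS:has_cui """C0031673"""^^xsd:string ;
'''

FLAGGED = classes_with_multiple_cuis(SAMPLE)
```

Each flagged pair of CUIs could then be checked against MRREL, as in the query above, to see whether the relationship is RB rather than synonymy.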

ecwood commented 1 year ago

I am not sure what to do about this:

{"('U000006', 'CCPSS')": {"cuis": ["C0740670"], "names": ["Y|UNKNOWN/MISC PROBLEM"]}}
{"('U000006', 'COSTAR')": {"cuis": ["C0004766", "C0149612"], "names": ["Y|ABNORMAL STRESS TEST", "Y|BARTHOLIN'S GLAND ABSCESS"]}}
{"('U000006', 'DXP')": {"cuis": ["C0000786", "C1879311"], "names": ["Y|ABDOMINAL FISTULA", "Y|ABORTION, SPONTANEOUS"]}}
{"('U000006', 'ICD10AM')": {"cuis": ["C0869538"], "names": ["Y|Skull, Meninges & Brain: Application, Insertion, Removal"]}}
{"('U000006', 'LCH')": {"cuis": ["C0000817"], "names": ["N|Abortion, Septic"]}}
{"('U000006', 'MSH')": {"cuis": ["C0012674"], "names": ["Y|Diseases (MeSH Category)"]}}
{"('U000006', 'MTH')": {"cuis": ["C2237045"], "names": ["Y|Chem 7"]}}
{"('U000006', 'PCDS')": {"cuis": ["C0542573"], "names": ["N|Assessment: Bowel Elimination", "Y|Assessment"]}}
{"('U000006', 'RAM')": {"cuis": ["C0694566"], "names": ["Y|systemic mai"]}}
{"('U000006', 'SNM')": {"cuis": ["C0022408"], "names": ["Y|Diseases of Joints"]}}

although, in KG2.8.3pre:

match (n) where n.id contains "U000006" return n.id, n.name, n.provided_by, n.category
n.id                n.name                                               n.provided_by             n.category
"LOINC:MTHU000006"  "Response to antigens"                               "['infores:loinc-umls']"  "biolink:PhysiologicalProcess"
"MESH:U000006"      "Diseases (MeSH Category)"                           "['infores:mesh']"        "biolink:Disease"
"OMIM:MTHU000006"   "Diffuse mesangial sclerosis"                        "['infores:omim']"        "biolink:NamedThing"
"PSY:MTHU000006"    "Occupational & Employment (PsycINFO Cluster Term)"  "['infores:psy-umls']"    "biolink:InformationContentEntity"

The LOINC:MTHU000006 node is from this one:

{"('MTHU000006', 'LNC')": {"cuis": ["C1314973"], "names": ["N|Response to antigens", "Y|ALLERGY"]}}

The OMIM:MTHU000006 node is from this one:

{"('MTHU000006', 'OMIM')": {"cuis": ["C0268747"], "names": ["N|Diffuse mesangial sclerosis"]}}

The PSY:MTHU000006 node is from this one:

{"('MTHU000006', 'PSY')": {"cuis": ["C0935479"], "names": ["Y|Occupational & Employment (PsycINFO Cluster Term)"]}}
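
For reference, a rough sketch of how records in this shape could be parsed to see which sources share a bare code. It assumes one JSON object per line, with a stringified `(code, source)` tuple as the key. The thread does not say what the `Y|` / `N|` prefix on each name means, so the sketch only splits it off as an opaque flag. `parse_record` is a hypothetical helper.

```python
import ast
import json

# Three records copied from the output above, one JSON object per line.
raw = """
{"('U000006', 'MSH')": {"cuis": ["C0012674"], "names": ["Y|Diseases (MeSH Category)"]}}
{"('MTHU000006', 'LNC')": {"cuis": ["C1314973"], "names": ["N|Response to antigens", "Y|ALLERGY"]}}
{"('MTHU000006', 'OMIM')": {"cuis": ["C0268747"], "names": ["N|Diffuse mesangial sclerosis"]}}
"""


def parse_record(line):
    """Turn one line into a flat dict (hypothetical helper)."""
    record = json.loads(line)
    key, value = next(iter(record.items()))
    # The key is a Python tuple rendered as a string, e.g. "('U000006', 'MSH')".
    code, source = ast.literal_eval(key)
    # Each name looks like "<flag>|<text>"; the flag is kept as-is.
    names = [tuple(name.split("|", 1)) for name in value["names"]]
    return {"code": code, "source": source, "cuis": value["cuis"], "names": names}


records = [parse_record(line) for line in raw.strip().splitlines()]

# Group sources by bare code to show where the code alone is ambiguous.
sources_by_code = {}
for rec in records:
    sources_by_code.setdefault(rec["code"], []).append(rec["source"])

print(sources_by_code)
```

Grouping this way makes it visible that a bare code such as `MTHU000006` maps to unrelated concepts in different sources, so any node ID scheme has to keep the source alongside the code.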
ecwood commented 1 year ago

Here are all of the sets of TUIs that make up the DRUGBANK UMLS nodes:

[
    "['T121', 'T125', 'T127']",
    "['T109', 'T120', 'T121', 'T130']",
    "['T121', 'T131', 'T197']",
    "['T114', 'T116', 'T121']",
    "['T109', 'T131']",
    "['T121', 'T130', 'T197']",
    "['T121', 'T123', 'T197']",
    "['T116', 'T121', 'T123', 'T125']",
    "['T024']",
    "['T114', 'T129']",
    "['T123']",
    "['T116', 'T121', 'T192']",
    "['T109', 'T121', 'T129', 'T168']",
    "['T123', 'T197']",
    "['T109', 'T121', 'T122']",
    "['T109']",
    "['T103']",
    "['T130']",
    "['T121', 'T127']",
    "['T121', 'T129']",
    "['T109', 'T121', 'T123', 'T196']",
    "['T123', 'T196']",
    "['T109', 'T123', 'T195']",
    "['T109', 'T116', 'T121', 'T129', 'T130']",
    "['T116', 'T129', 'T130']",
    "['T116', 'T121']",
    "['T031']",
    "['T131', 'T197']",
    "['T122', 'T197']",
    "['T109', 'T130', 'T196']",
    "['T116', 'T125', 'T130']",
    "['T005', 'T121']",
    "['T116', 'T123']",
    "['T109', 'T116', 'T121', 'T127']",
    "['T109', 'T116', 'T121', 'T129']",
    "['T116', 'T123', 'T130']",
    "['T197']",
    "['T109', 'T116', 'T121', 'T126']",
    "['T116', 'T121', 'T129', 'T131']",
    "['T109', 'T121', 'T123', 'T168']",
    "['T109', 'T121', 'T129', 'T130', 'T131']",
    "['T007', 'T121']",
    "['T109', 'T116', 'T121', 'T130']",
    "['T109', 'T116', 'T121', 'T125']",
    "['T109', 'T116', 'T123']",
    "['T109', 'T122']",
    "['T109', 'T121', 'T125']",
    "['T121', 'T131']",
    "['T109', 'T168']",
    "['T109', 'T121', 'T122', 'T123']",
    "['T114', 'T116', 'T121', 'T129']",
    "['T131', 'T196']",
    "['T168']",
    "['T025', 'T121']",
    "['T004']",
    "['T109', 'T123', 'T131']",
    "['T114', 'T130']",
    "['T109', 'T195']",
    "['T109', 'T121', 'T125', 'T130']",
    "['T109', 'T129']",
    "['T002', 'T121', 'T129', 'T130']",
    "['T109', 'T121', 'T130', 'T131']",
    "['T116', 'T121', 'T130']",
    "['T109', 'T121', 'T196', 'T197']",
    "['T002', 'T109', 'T121', 'T168']",
    "['T129']",
    "['T116', 'T129']",
    "['T121', 'T167', 'T197']",
    "['T116', 'T121', 'T123', 'T126']",
    "['T114', 'T121', 'T195']",
    "['T121', 'T129', 'T168']",
    "['T121']",
    "['T109', 'T121', 'T127', 'T130']",
    "['T114', 'T121', 'T127']",
    "['T130', 'T196', 'T197']",
    "['T195']",
    "['T109', 'T114', 'T121', 'T123']",
    "['T116', 'T121', 'T125', 'T129']",
    "['T116', 'T121', 'T126']",
    "['T121', 'T125']",
    "['T116', 'T121', 'T131']",
    "['T116', 'T130']",
    "['T114', 'T121', 'T129']",
    "['T116', 'T121', 'T195']",
    "['T114', 'T116', 'T195']",
    "['T025', 'T109', 'T121']",
    "['T121', 'T123', 'T196']",
    "['T116', 'T121', 'T129']",
    "['T109', 'T130', 'T195']",
    "['T121', 'T197']",
    "['T121', 'T126']",
    "['T004', 'T121', 'T129', 'T168']",
    "['T121', 'T130', 'T196']",
    "['T116', 'T121', 'T125']",
    "['T109', 'T123']",
    "['T114', 'T123']",
    "['T109', 'T121', 'T129', 'T131']",
    "['T109', 'T116', 'T129', 'T130']",
    "['T109', 'T114', 'T121']",
    "['T116', 'T126']",
    "['T114', 'T116', 'T126']",
    "['T121', 'T130']",
    "['T109', 'T121', 'T130']",
    "['T121', 'T123', 'T196', 'T197']",
    "['T109', 'T121', 'T123', 'T127']",
    "['T116', 'T121', 'T123', 'T129']",
    "['T109', 'T121']",
    "['T116', 'T125']",
    "['T116', 'T123', 'T131']",
    "['T121', 'T123']",
    "['T005']",
    "['T025']",
    "['T109', 'T127']",
    "['T109', 'T116', 'T121', 'T129', 'T192']",
    "['T025', 'T121', 'T129']",
    "['T109', 'T130']",
    "['T116']",
    "['T121', 'T122', 'T197']",
    "['T122']",
    "['T109', 'T121', 'T123']",
    "['T116', 'T121', 'T123', 'T131']",
    "['T109', 'T116', 'T121']",
    "['T109', 'T121', 'T130', 'T197']",
    "['T109', 'T121', 'T130', 'T196']",
    "['T109', 'T114', 'T123']",
    "['T114', 'T121', 'T123']",
    "['T109', 'T121', 'T127']",
    "['T109', 'T121', 'T129']",
    "['T109', 'T125']",
    "['T109', 'T196']",
    "['T130', 'T196']",
    "['T114']",
    "['T196']",
    "['T109', 'T131', 'T196']",
    "['T109', 'T127', 'T130']",
    "['T109', 'T121', 'T196']",
    "['T109', 'T123', 'T130']",
    "['T007']",
    "['T121', 'T129', 'T130']",
    "['T109', 'T130', 'T131']",
    "['T116', 'T195']",
    "['T109', 'T121', 'T123', 'T125']",
    "['T121', 'T129', 'T131']",
    "['T121', 'T196']",
    "['T024', 'T116', 'T123']",
    "['T109', 'T121', 'T195']",
    "['T109', 'T116', 'T121', 'T123']",
    "['T116', 'T121', 'T129', 'T130']",
    "['T129', 'T130']",
    "['T109', 'T130', 'T197']",
    "['T129', 'T192']",
    "['T034', 'T116', 'T121', 'T123']",
    "['T109', 'T121', 'T197']",
    "['T116', 'T121', 'T123']",
    "['T114', 'T121']",
    "['T121', 'T196', 'T197']",
    "['T130', 'T197']",
    "['T109', 'T120']",
    "['T109', 'T116', 'T121', 'T195']",
    "['T109', 'T121', 'T131']",
    "['T007', 'T121', 'T129']",
    "['T109', 'T121', 'T168']"
]

Understanding the TUI combinations in UMLS will make it easier to figure out how to categorize the UMLS nodes without relying on the hierarchy we currently use, which is very challenging to work with.
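
As a starting point, here is a small sketch that counts how often each TUI appears across the combinations. It uses only a handful of the sets listed above, and `tui_frequencies` is a hypothetical helper name.

```python
import ast
from collections import Counter

# A handful of the stringified TUI sets from the list above (not the full list).
tui_sets = [
    "['T121', 'T125', 'T127']",
    "['T109', 'T120', 'T121', 'T130']",
    "['T109', 'T121']",
    "['T116', 'T121', 'T123']",
    "['T024']",
]


def tui_frequencies(stringified_sets):
    """Count how many combinations each TUI takes part in."""
    counts = Counter()
    for text in stringified_sets:
        # Each entry is a Python list rendered as a string.
        counts.update(ast.literal_eval(text))
    return counts


freq = tui_frequencies(tui_sets)
print(freq.most_common(2))
```

Run over the full list, the most frequent TUIs would be natural candidates for deciding a node's category when several TUIs are present.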

ecwood commented 1 year ago
ubuntu@ip-172-31-50-116:~/kg2-build$ grep "has category inconsistency" umls_node_ids.log | wc -l
823502
ubuntu@ip-172-31-50-116:~/kg2-build$ grep "has name inconsistency" umls_node_ids.log | wc -l
172104

Given the current code, this is where inconsistencies with KG2.8.5pre occur. Here's the total discrepancy count:

Total Nodes: 4009349; Problem Nodes: 877796

162876 of these problem nodes are biolink:Drug to biolink:ChemicalEntity inconsistencies.
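
To put those counts in proportion, a quick back-of-the-envelope calculation. The overlap figure assumes that each grep line corresponds to one distinct node and that the problem nodes are exactly the union of the two groups. Neither assumption has been verified here.

```python
# Counts copied from the log output and summary above.
total_nodes = 4009349
problem_nodes = 877796
category_inconsistencies = 823502
name_inconsistencies = 172104
drug_to_chemical = 162876

# Share of all nodes that are flagged as problem nodes.
problem_fraction = problem_nodes / total_nodes

# Inclusion-exclusion: nodes flagged for both category and name,
# under the assumptions stated above.
both_inconsistencies = (
    category_inconsistencies + name_inconsistencies - problem_nodes
)

# Share of problem nodes that are Drug vs. ChemicalEntity cases.
drug_share = drug_to_chemical / problem_nodes

print(f"problem nodes: {problem_fraction:.1%} of all nodes")
print(f"nodes with both inconsistencies: {both_inconsistencies}")
print(f"Drug/ChemicalEntity share of problem nodes: {drug_share:.1%}")
```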

ecwood commented 1 year ago

Here are all of the problem node category pairings:

{
    "biolink:Activity---biolink:Agent": 488,
    "biolink:Activity---biolink:AnatomicalEntity": 1,
    "biolink:Activity---biolink:BiologicalEntity": 3,
    "biolink:Activity---biolink:BiologicalProcess": 2,
    "biolink:Activity---biolink:ClinicalIntervention": 179,
    "biolink:Activity---biolink:Device": 1,
    "biolink:Activity---biolink:InformationContentEntity": 26,
    "biolink:Activity---biolink:NamedThing": 168,
    "biolink:Activity---biolink:Phenomenon": 14,
    "biolink:Activity---biolink:PhenotypicFeature": 2,
    "biolink:Activity---biolink:Procedure": 1060,
    "biolink:Agent---biolink:Activity": 10,
    "biolink:Agent---biolink:ClinicalIntervention": 2,
    "biolink:Agent---biolink:Device": 1,
    "biolink:Agent---biolink:InformationContentEntity": 2,
    "biolink:Agent---biolink:NamedThing": 622,
    "biolink:Agent---biolink:OrganismTaxon": 1,
    "biolink:Agent---biolink:PhysicalEntity": 1,
    "biolink:Agent---biolink:Procedure": 6,
    "biolink:AnatomicalEntity---biolink:Agent": 5,
    "biolink:AnatomicalEntity---biolink:BiologicalEntity": 3,
    "biolink:AnatomicalEntity---biolink:ClinicalIntervention": 5,
    "biolink:AnatomicalEntity---biolink:Device": 4,
    "biolink:AnatomicalEntity---biolink:Drug": 1,
    "biolink:AnatomicalEntity---biolink:InformationContentEntity": 85,
    "biolink:AnatomicalEntity---biolink:MaterialSample": 142,
    "biolink:AnatomicalEntity---biolink:NamedThing": 65,
    "biolink:AnatomicalEntity---biolink:PhenotypicFeature": 23,
    "biolink:AnatomicalEntity---biolink:Procedure": 14,
    "biolink:Behavior---biolink:Activity": 407,
    "biolink:Behavior---biolink:Agent": 52,
    "biolink:Behavior---biolink:BiologicalEntity": 3,
    "biolink:Behavior---biolink:BiologicalProcess": 71,
    "biolink:Behavior---biolink:ClinicalIntervention": 3,
    "biolink:Behavior---biolink:Disease": 4,
    "biolink:Behavior---biolink:InformationContentEntity": 28,
    "biolink:Behavior---biolink:NamedThing": 119,
    "biolink:Behavior---biolink:Phenomenon": 127,
    "biolink:Behavior---biolink:PhenotypicFeature": 24,
    "biolink:Behavior---biolink:Procedure": 6,
    "biolink:Cell---biolink:AnatomicalEntity": 1753,
    "biolink:Cell---biolink:BiologicalEntity": 5,
    "biolink:Cell---biolink:Drug": 849,
    "biolink:Cell---biolink:InformationContentEntity": 15,
    "biolink:Cell---biolink:MaterialSample": 134,
    "biolink:Cell---biolink:NamedThing": 45,
    "biolink:Cell---biolink:PhenotypicFeature": 4,
    "biolink:CellularComponent---biolink:Activity": 1,
    "biolink:CellularComponent---biolink:AnatomicalEntity": 2555,
    "biolink:CellularComponent---biolink:BiologicalEntity": 21,
    "biolink:CellularComponent---biolink:BiologicalProcess": 2,
    "biolink:CellularComponent---biolink:InformationContentEntity": 5,
    "biolink:CellularComponent---biolink:MaterialSample": 5,
    "biolink:CellularComponent---biolink:MolecularEntity": 1,
    "biolink:CellularComponent---biolink:NamedThing": 16,
    "biolink:CellularComponent---biolink:PhenotypicFeature": 10,
    "biolink:CellularComponent---biolink:Protein": 20,
    "biolink:CellularComponent---biolink:ProteinDomain": 4,
    "biolink:ChemicalEntity---biolink:Activity": 3,
    "biolink:ChemicalEntity---biolink:Agent": 11,
    "biolink:ChemicalEntity---biolink:AnatomicalEntity": 28,
    "biolink:ChemicalEntity---biolink:BiologicalEntity": 108,
    "biolink:ChemicalEntity---biolink:ClinicalIntervention": 1,
    "biolink:ChemicalEntity---biolink:Device": 9,
    "biolink:ChemicalEntity---biolink:Drug": 5750,
    "biolink:ChemicalEntity---biolink:Gene": 1,
    "biolink:ChemicalEntity---biolink:InformationContentEntity": 8,
    "biolink:ChemicalEntity---biolink:MaterialSample": 65,
    "biolink:ChemicalEntity---biolink:NamedThing": 718,
    "biolink:ChemicalEntity---biolink:OrganismTaxon": 14,
    "biolink:ChemicalEntity---biolink:PhenotypicFeature": 18,
    "biolink:ChemicalEntity---biolink:Procedure": 100,
    "biolink:ChemicalEntity---biolink:Protein": 10,
    "biolink:Cohort---biolink:Activity": 12,
    "biolink:Cohort---biolink:Agent": 136,
    "biolink:Cohort---biolink:AnatomicalEntity": 1,
    "biolink:Cohort---biolink:BiologicalEntity": 1,
    "biolink:Cohort---biolink:ClinicalIntervention": 1,
    "biolink:Cohort---biolink:InformationContentEntity": 9,
    "biolink:Cohort---biolink:NamedThing": 52,
    "biolink:Cohort---biolink:PopulationOfIndividualOrganisms": 6,
    "biolink:Cohort---biolink:Procedure": 35,
    "biolink:Device---biolink:Activity": 1,
    "biolink:Device---biolink:Agent": 56,
    "biolink:Device---biolink:AnatomicalEntity": 3,
    "biolink:Device---biolink:ClinicalIntervention": 8,
    "biolink:Device---biolink:Drug": 1501,
    "biolink:Device---biolink:InformationContentEntity": 2,
    "biolink:Device---biolink:MaterialSample": 61,
    "biolink:Device---biolink:NamedThing": 2955,
    "biolink:Device---biolink:Phenomenon": 1,
    "biolink:Device---biolink:PhenotypicFeature": 1,
    "biolink:Device---biolink:Procedure": 2180,
    "biolink:Disease---biolink:Activity": 70,
    "biolink:Disease---biolink:Agent": 14,
    "biolink:Disease---biolink:AnatomicalEntity": 42,
    "biolink:Disease---biolink:BiologicalEntity": 169,
    "biolink:Disease---biolink:BiologicalProcess": 18,
    "biolink:Disease---biolink:Cell": 9,
    "biolink:Disease---biolink:ClinicalIntervention": 1,
    "biolink:Disease---biolink:Device": 1,
    "biolink:Disease---biolink:DiseaseOrPhenotypicFeature": 63,
    "biolink:Disease---biolink:InformationContentEntity": 29,
    "biolink:Disease---biolink:MaterialSample": 7,
    "biolink:Disease---biolink:NamedThing": 8606,
    "biolink:Disease---biolink:PathologicalProcess": 66,
    "biolink:Disease---biolink:Phenomenon": 17,
    "biolink:Disease---biolink:PhenotypicFeature": 13273,
    "biolink:Disease---biolink:Procedure": 28,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Activity": 241,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Agent": 63,
    "biolink:DiseaseOrPhenotypicFeature---biolink:AnatomicalEntity": 44,
    "biolink:DiseaseOrPhenotypicFeature---biolink:BiologicalEntity": 10,
    "biolink:DiseaseOrPhenotypicFeature---biolink:BiologicalProcess": 15,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Cell": 8,
    "biolink:DiseaseOrPhenotypicFeature---biolink:ClinicalIntervention": 641,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Disease": 154,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Drug": 1,
    "biolink:DiseaseOrPhenotypicFeature---biolink:InformationContentEntity": 2459,
    "biolink:DiseaseOrPhenotypicFeature---biolink:NamedThing": 35529,
    "biolink:DiseaseOrPhenotypicFeature---biolink:OrganismTaxon": 1,
    "biolink:DiseaseOrPhenotypicFeature---biolink:PathologicalProcess": 4,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Phenomenon": 17,
    "biolink:DiseaseOrPhenotypicFeature---biolink:PhenotypicFeature": 24574,
    "biolink:DiseaseOrPhenotypicFeature---biolink:Procedure": 804,
    "biolink:Drug---biolink:Activity": 1,
    "biolink:Drug---biolink:AnatomicalEntity": 36,
    "biolink:Drug---biolink:BiologicalEntity": 97,
    "biolink:Drug---biolink:ChemicalEntity": 187984,
    "biolink:Drug---biolink:ChemicalMixture": 63,
    "biolink:Drug---biolink:ClinicalIntervention": 7,
    "biolink:Drug---biolink:Device": 1,
    "biolink:Drug---biolink:Disease": 1,
    "biolink:Drug---biolink:Gene": 2,
    "biolink:Drug---biolink:InformationContentEntity": 6,
    "biolink:Drug---biolink:MaterialSample": 13,
    "biolink:Drug---biolink:MolecularEntity": 10,
    "biolink:Drug---biolink:NamedThing": 34956,
    "biolink:Drug---biolink:NoncodingRNAProduct": 1,
    "biolink:Drug---biolink:OrganismTaxon": 68,
    "biolink:Drug---biolink:PhenotypicFeature": 2,
    "biolink:Drug---biolink:Procedure": 1097,
    "biolink:Drug---biolink:Protein": 84,
    "biolink:Drug---biolink:ProteinFamily": 2,
    "biolink:Drug---biolink:Treatment": 14,
    "biolink:Event---biolink:Activity": 4,
    "biolink:Event---biolink:Agent": 1,
    "biolink:Event---biolink:InformationContentEntity": 39,
    "biolink:Event---biolink:PhenotypicFeature": 1,
    "biolink:Food---biolink:BiologicalEntity": 93,
    "biolink:Food---biolink:ChemicalEntity": 123,
    "biolink:Food---biolink:ClinicalIntervention": 3,
    "biolink:Food---biolink:Drug": 116,
    "biolink:Food---biolink:InformationContentEntity": 64,
    "biolink:Food---biolink:NamedThing": 161,
    "biolink:Food---biolink:OrganismTaxon": 2,
    "biolink:Food---biolink:Procedure": 34,
    "biolink:GeographicLocation---biolink:Agent": 4,
    "biolink:GeographicLocation---biolink:BiologicalEntity": 1,
    "biolink:GeographicLocation---biolink:InformationContentEntity": 5,
    "biolink:GeographicLocation---biolink:NamedThing": 5,
    "biolink:GrossAnatomicalStructure---biolink:Agent": 1,
    "biolink:GrossAnatomicalStructure---biolink:AnatomicalEntity": 75418,
    "biolink:GrossAnatomicalStructure---biolink:BiologicalEntity": 5,
    "biolink:GrossAnatomicalStructure---biolink:ClinicalIntervention": 1,
    "biolink:GrossAnatomicalStructure---biolink:Device": 4,
    "biolink:GrossAnatomicalStructure---biolink:Disease": 1,
    "biolink:GrossAnatomicalStructure---biolink:Drug": 5,
    "biolink:GrossAnatomicalStructure---biolink:Gene": 1,
    "biolink:GrossAnatomicalStructure---biolink:InformationContentEntity": 24,
    "biolink:GrossAnatomicalStructure---biolink:MaterialSample": 49,
    "biolink:GrossAnatomicalStructure---biolink:NamedThing": 112,
    "biolink:GrossAnatomicalStructure---biolink:OrganismTaxon": 4,
    "biolink:GrossAnatomicalStructure---biolink:PhenotypicFeature": 40,
    "biolink:GrossAnatomicalStructure---biolink:Procedure": 33,
    "biolink:IndividualOrganism---biolink:Activity": 11,
    "biolink:IndividualOrganism---biolink:Agent": 11,
    "biolink:IndividualOrganism---biolink:AnatomicalEntity": 1,
    "biolink:IndividualOrganism---biolink:InformationContentEntity": 3,
    "biolink:IndividualOrganism---biolink:NamedThing": 20,
    "biolink:IndividualOrganism---biolink:OrganismTaxon": 38,
    "biolink:IndividualOrganism---biolink:Phenomenon": 5,
    "biolink:IndividualOrganism---biolink:Procedure": 8,
    "biolink:MolecularActivity---biolink:Activity": 2,
    "biolink:MolecularActivity---biolink:BiologicalProcess": 243,
    "biolink:MolecularActivity---biolink:ClinicalIntervention": 2,
    "biolink:MolecularActivity---biolink:InformationContentEntity": 2,
    "biolink:MolecularActivity---biolink:NamedThing": 47,
    "biolink:MolecularActivity---biolink:PathologicalProcess": 4,
    "biolink:MolecularActivity---biolink:Pathway": 313,
    "biolink:MolecularActivity---biolink:PhenotypicFeature": 4,
    "biolink:MolecularActivity---biolink:Procedure": 3,
    "biolink:MolecularActivity---biolink:Protein": 2,
    "biolink:MolecularActivity---biolink:ProteinDomain": 1,
    "biolink:NamedThing---biolink:Activity": 985,
    "biolink:NamedThing---biolink:Agent": 490,
    "biolink:NamedThing---biolink:AnatomicalEntity": 1243,
    "biolink:NamedThing---biolink:BiologicalEntity": 85702,
    "biolink:NamedThing---biolink:BiologicalProcess": 11,
    "biolink:NamedThing---biolink:ChemicalEntity": 1,
    "biolink:NamedThing---biolink:ClinicalIntervention": 223,
    "biolink:NamedThing---biolink:Device": 4,
    "biolink:NamedThing---biolink:Disease": 28,
    "biolink:NamedThing---biolink:Drug": 191,
    "biolink:NamedThing---biolink:Gene": 54922,
    "biolink:NamedThing---biolink:GeographicLocation": 1,
    "biolink:NamedThing---biolink:InformationContentEntity": 33432,
    "biolink:NamedThing---biolink:MaterialSample": 68,
    "biolink:NamedThing---biolink:MolecularEntity": 1,
    "biolink:NamedThing---biolink:NoncodingRNAProduct": 1,
    "biolink:NamedThing---biolink:OrganismTaxon": 22,
    "biolink:NamedThing---biolink:PathologicalProcess": 4,
    "biolink:NamedThing---biolink:Pathway": 331,
    "biolink:NamedThing---biolink:Phenomenon": 30,
    "biolink:NamedThing---biolink:PhenotypicFeature": 281,
    "biolink:NamedThing---biolink:Procedure": 168,
    "biolink:NamedThing---biolink:Protein": 18,
    "biolink:NamedThing---biolink:ProteinDomain": 10,
    "biolink:NamedThing---biolink:ProteinFamily": 1,
    "biolink:NamedThing---biolink:Treatment": 1,
    "biolink:NucleicAcidEntity---biolink:AnatomicalEntity": 123,
    "biolink:NucleicAcidEntity---biolink:BiologicalEntity": 20,
    "biolink:NucleicAcidEntity---biolink:ChemicalEntity": 4862,
    "biolink:NucleicAcidEntity---biolink:Drug": 126,
    "biolink:NucleicAcidEntity---biolink:Gene": 6,
    "biolink:NucleicAcidEntity---biolink:InformationContentEntity": 2,
    "biolink:NucleicAcidEntity---biolink:MaterialSample": 17,
    "biolink:NucleicAcidEntity---biolink:MolecularEntity": 1,
    "biolink:NucleicAcidEntity---biolink:NamedThing": 64,
    "biolink:NucleicAcidEntity---biolink:NoncodingRNAProduct": 276,
    "biolink:NucleicAcidEntity---biolink:OrganismTaxon": 14,
    "biolink:NucleicAcidEntity---biolink:PhenotypicFeature": 9,
    "biolink:NucleicAcidEntity---biolink:Protein": 2,
    "biolink:NucleicAcidEntity---biolink:Transcript": 6,
    "biolink:OrganismTaxon---biolink:Activity": 2,
    "biolink:OrganismTaxon---biolink:Agent": 6,
    "biolink:OrganismTaxon---biolink:BiologicalEntity": 12,
    "biolink:OrganismTaxon---biolink:Drug": 244,
    "biolink:OrganismTaxon---biolink:InformationContentEntity": 1,
    "biolink:OrganismTaxon---biolink:NamedThing": 37,
    "biolink:OrganismTaxon---biolink:Procedure": 1,
    "biolink:PathologicalProcess---biolink:Activity": 2,
    "biolink:PathologicalProcess---biolink:Agent": 27,
    "biolink:PathologicalProcess---biolink:AnatomicalEntity": 5,
    "biolink:PathologicalProcess---biolink:BiologicalEntity": 20,
    "biolink:PathologicalProcess---biolink:BiologicalProcess": 10,
    "biolink:PathologicalProcess---biolink:BiologicalProcessOrActivity": 1,
    "biolink:PathologicalProcess---biolink:ClinicalIntervention": 4,
    "biolink:PathologicalProcess---biolink:Disease": 381,
    "biolink:PathologicalProcess---biolink:InformationContentEntity": 17,
    "biolink:PathologicalProcess---biolink:NamedThing": 1169,
    "biolink:PathologicalProcess---biolink:Pathway": 1,
    "biolink:PathologicalProcess---biolink:Phenomenon": 1,
    "biolink:PathologicalProcess---biolink:PhenotypicFeature": 1592,
    "biolink:PathologicalProcess---biolink:Procedure": 16,
    "biolink:Phenomenon---biolink:Activity": 70,
    "biolink:Phenomenon---biolink:Agent": 69,
    "biolink:Phenomenon---biolink:AnatomicalEntity": 4,
    "biolink:Phenomenon---biolink:BiologicalEntity": 1,
    "biolink:Phenomenon---biolink:BiologicalProcess": 40,
    "biolink:Phenomenon---biolink:ClinicalIntervention": 11,
    "biolink:Phenomenon---biolink:Device": 1,
    "biolink:Phenomenon---biolink:Disease": 2,
    "biolink:Phenomenon---biolink:InformationContentEntity": 226,
    "biolink:Phenomenon---biolink:MaterialSample": 1,
    "biolink:Phenomenon---biolink:NamedThing": 177,
    "biolink:Phenomenon---biolink:OrganismTaxon": 1,
    "biolink:Phenomenon---biolink:PathologicalProcess": 8,
    "biolink:Phenomenon---biolink:Pathway": 3,
    "biolink:Phenomenon---biolink:PhenotypicFeature": 1452,
    "biolink:Phenomenon---biolink:Procedure": 34,
    "biolink:PhenotypicFeature---biolink:Activity": 7,
    "biolink:PhenotypicFeature---biolink:Agent": 4,
    "biolink:PhenotypicFeature---biolink:AnatomicalEntity": 2,
    "biolink:PhenotypicFeature---biolink:BiologicalEntity": 1,
    "biolink:PhenotypicFeature---biolink:BiologicalProcess": 1,
    "biolink:PhenotypicFeature---biolink:ClinicalIntervention": 4,
    "biolink:PhenotypicFeature---biolink:Disease": 16,
    "biolink:PhenotypicFeature---biolink:DiseaseOrPhenotypicFeature": 20,
    "biolink:PhenotypicFeature---biolink:InformationContentEntity": 6,
    "biolink:PhenotypicFeature---biolink:NamedThing": 759,
    "biolink:PhenotypicFeature---biolink:Phenomenon": 30,
    "biolink:PhenotypicFeature---biolink:Procedure": 5,
    "biolink:PhysicalEntity---biolink:Agent": 22,
    "biolink:PhysicalEntity---biolink:Device": 49,
    "biolink:PhysicalEntity---biolink:Drug": 25,
    "biolink:PhysicalEntity---biolink:InformationContentEntity": 5,
    "biolink:PhysicalEntity---biolink:MaterialSample": 11,
    "biolink:PhysicalEntity---biolink:NamedThing": 130,
    "biolink:PhysicalEntity---biolink:Phenomenon": 2,
    "biolink:PhysicalEntity---biolink:Procedure": 60,
    "biolink:PhysiologicalProcess---biolink:Activity": 32,
    "biolink:PhysiologicalProcess---biolink:Agent": 1,
    "biolink:PhysiologicalProcess---biolink:AnatomicalEntity": 7,
    "biolink:PhysiologicalProcess---biolink:BiologicalEntity": 15,
    "biolink:PhysiologicalProcess---biolink:BiologicalProcess": 645,
    "biolink:PhysiologicalProcess---biolink:BiologicalProcessOrActivity": 11,
    "biolink:PhysiologicalProcess---biolink:ClinicalIntervention": 12,
    "biolink:PhysiologicalProcess---biolink:Disease": 5,
    "biolink:PhysiologicalProcess---biolink:InformationContentEntity": 16,
    "biolink:PhysiologicalProcess---biolink:NamedThing": 152,
    "biolink:PhysiologicalProcess---biolink:PathologicalProcess": 15,
    "biolink:PhysiologicalProcess---biolink:Pathway": 116,
    "biolink:PhysiologicalProcess---biolink:Phenomenon": 46,
    "biolink:PhysiologicalProcess---biolink:PhenotypicFeature": 104,
    "biolink:PhysiologicalProcess---biolink:Procedure": 53,
    "biolink:Polypeptide---biolink:AnatomicalEntity": 43,
    "biolink:Polypeptide---biolink:BiologicalEntity": 8270,
    "biolink:Polypeptide---biolink:Device": 1,
    "biolink:Polypeptide---biolink:Drug": 580,
    "biolink:Polypeptide---biolink:MolecularEntity": 60,
    "biolink:Polypeptide---biolink:NamedThing": 111749,
    "biolink:Polypeptide---biolink:NoncodingRNAProduct": 4,
    "biolink:Polypeptide---biolink:PhenotypicFeature": 32,
    "biolink:Polypeptide---biolink:Procedure": 1,
    "biolink:Polypeptide---biolink:Protein": 4543,
    "biolink:Polypeptide---biolink:ProteinDomain": 133,
    "biolink:Polypeptide---biolink:ProteinFamily": 99,
    "biolink:Polypeptide---biolink:Treatment": 2,
    "biolink:PopulationOfIndividualOrganisms---biolink:Activity": 3,
    "biolink:PopulationOfIndividualOrganisms---biolink:Agent": 53,
    "biolink:PopulationOfIndividualOrganisms---biolink:BiologicalEntity": 2,
    "biolink:PopulationOfIndividualOrganisms---biolink:InformationContentEntity": 9,
    "biolink:PopulationOfIndividualOrganisms---biolink:NamedThing": 34,
    "biolink:PopulationOfIndividualOrganisms---biolink:OrganismTaxon": 1,
    "biolink:PopulationOfIndividualOrganisms---biolink:OrganismalEntity": 1,
    "biolink:PopulationOfIndividualOrganisms---biolink:Phenomenon": 3,
    "biolink:PopulationOfIndividualOrganisms---biolink:Procedure": 3,
    "biolink:Procedure---biolink:Activity": 502,
    "biolink:Procedure---biolink:Agent": 290,
    "biolink:Procedure---biolink:AnatomicalEntity": 15,
    "biolink:Procedure---biolink:BiologicalProcess": 4,
    "biolink:Procedure---biolink:ClinicalIntervention": 8110,
    "biolink:Procedure---biolink:Device": 1,
    "biolink:Procedure---biolink:Drug": 8,
    "biolink:Procedure---biolink:InformationContentEntity": 24,
    "biolink:Procedure---biolink:MaterialSample": 4,
    "biolink:Procedure---biolink:NamedThing": 544,
    "biolink:Procedure---biolink:OrganismTaxon": 1,
    "biolink:Procedure---biolink:Phenomenon": 4,
    "biolink:Procedure---biolink:PhenotypicFeature": 18,
    "biolink:Procedure---biolink:Treatment": 4169,
    "biolink:Protein---biolink:AnatomicalEntity": 5,
    "biolink:Protein---biolink:BiologicalEntity": 762,
    "biolink:Protein---biolink:Drug": 41,
    "biolink:Protein---biolink:Gene": 1,
    "biolink:Protein---biolink:MolecularEntity": 74,
    "biolink:Protein---biolink:NamedThing": 601,
    "biolink:Protein---biolink:NoncodingRNAProduct": 1,
    "biolink:Protein---biolink:PhenotypicFeature": 2,
    "biolink:Protein---biolink:Polypeptide": 60812,
    "biolink:Protein---biolink:ProteinDomain": 1,
    "biolink:Protein---biolink:ProteinFamily": 79,
    "biolink:Publication---biolink:Activity": 208,
    "biolink:Publication---biolink:Agent": 99,
    "biolink:Publication---biolink:AnatomicalEntity": 30,
    "biolink:Publication---biolink:BiologicalEntity": 2,
    "biolink:Publication---biolink:BiologicalProcess": 5,
    "biolink:Publication---biolink:ClinicalIntervention": 14572,
    "biolink:Publication---biolink:InformationContentEntity": 1572,
    "biolink:Publication---biolink:MaterialSample": 20,
    "biolink:Publication---biolink:NamedThing": 221,
    "biolink:Publication---biolink:OrganismTaxon": 3,
    "biolink:Publication---biolink:Phenomenon": 15,
    "biolink:Publication---biolink:PhenotypicFeature": 27,
    "biolink:Publication---biolink:Procedure": 69,
    "biolink:SmallMolecule---biolink:ChemicalEntity": 1378,
    "biolink:SmallMolecule---biolink:Drug": 237,
    "biolink:SmallMolecule---biolink:InformationContentEntity": 5,
    "biolink:SmallMolecule---biolink:MaterialSample": 2,
    "biolink:SmallMolecule---biolink:MolecularEntity": 9,
    "biolink:SmallMolecule---biolink:NamedThing": 6
}
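
For triage, a map like this can be ranked by count with a few lines of Python. This sketch uses four entries from the map above. The `---` separator is taken from the keys as shown, and which side is the old category and which is the new one is not assumed. `rank_pairings` is a hypothetical helper.

```python
# Four entries copied from the map above.
pairings = {
    "biolink:Activity---biolink:Agent": 488,
    "biolink:Drug---biolink:ChemicalEntity": 187984,
    "biolink:NamedThing---biolink:BiologicalEntity": 85702,
    "biolink:Polypeptide---biolink:NamedThing": 111749,
}


def rank_pairings(counts):
    """Return (left, right, count) tuples, largest count first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(*key.split("---", 1), count) for key, count in ranked]


ranked = rank_pairings(pairings)
for left, right, count in ranked:
    print(f"{left}---{right}: {count}")
```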
saramsey commented 1 year ago

Hi @ecwood, thank you for this great analysis. So, biolink:Drug is a direct descendant of biolink:ChemicalEntity (per this tree view of the Biolink 3.5.4 category hierarchy), so wouldn't biolink:Drug kind of supersede biolink:ChemicalEntity for those cases? Unless we feel that biolink:Drug has been assigned erroneously; that would be a different matter and worthy of implementing a fix, for sure.
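
One way to separate the "one category is just more specific" cases from genuine disagreements is sketched below. The `PARENT` map is a hand-written fragment and an assumption. It has not been checked against Biolink 3.5.4, so a real check should load the hierarchy from the Biolink Model itself. `ancestors` and `relation` are hypothetical helper names.

```python
# Hand-written child -> parent fragment, for illustration only.
# ASSUMPTION: not verified against Biolink 3.5.4; load the real model in practice.
PARENT = {
    "biolink:Drug": "biolink:MolecularMixture",
    "biolink:MolecularMixture": "biolink:ChemicalMixture",
    "biolink:ChemicalMixture": "biolink:ChemicalEntity",
    "biolink:ChemicalEntity": "biolink:NamedThing",
    "biolink:Procedure": "biolink:NamedThing",
}


def ancestors(category):
    """Walk up the parent map and return every ancestor of a category."""
    chain = []
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain


def relation(left, right):
    """Classify a category pairing by how the two categories are related."""
    if left == right:
        return "same"
    if right in ancestors(left):
        return "left_more_specific"
    if left in ancestors(right):
        return "right_more_specific"
    return "different_branches"


print(relation("biolink:Drug", "biolink:ChemicalEntity"))
print(relation("biolink:ChemicalEntity", "biolink:Drug"))
print(relation("biolink:Drug", "biolink:Procedure"))
```

Pairings that come back as `different_branches` are the ones most likely to need a real fix. The ancestor/descendant pairings may only reflect a change in how specific the assigned category is.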

ecwood commented 1 year ago

To do on this issue:

ecwood commented 1 year ago

In order to triage, here is the discrepancy map sorted by value:

biolink:Drug---biolink:ChemicalEntity: 187984
biolink:Polypeptide---biolink:NamedThing: 111749
biolink:NamedThing---biolink:BiologicalEntity: 85702
biolink:GrossAnatomicalStructure---biolink:AnatomicalEntity: 75418
biolink:Protein---biolink:Polypeptide: 60812
biolink:NamedThing---biolink:Gene: 54922
biolink:DiseaseOrPhenotypicFeature---biolink:NamedThing: 35529
biolink:Drug---biolink:NamedThing: 34956
biolink:NamedThing---biolink:InformationContentEntity: 33432
biolink:DiseaseOrPhenotypicFeature---biolink:PhenotypicFeature: 24574
biolink:Publication---biolink:ClinicalIntervention: 14572
biolink:Disease---biolink:PhenotypicFeature: 13273
biolink:Disease---biolink:NamedThing: 8606
biolink:Polypeptide---biolink:BiologicalEntity: 8270
biolink:Procedure---biolink:ClinicalIntervention: 8110
biolink:ChemicalEntity---biolink:Drug: 5750
biolink:NucleicAcidEntity---biolink:ChemicalEntity: 4862
biolink:Polypeptide---biolink:Protein: 4543
biolink:Procedure---biolink:Treatment: 4169
biolink:Device---biolink:NamedThing: 2955
biolink:CellularComponent---biolink:AnatomicalEntity: 2555
biolink:DiseaseOrPhenotypicFeature---biolink:InformationContentEntity: 2459
biolink:Device---biolink:Procedure: 2180
biolink:Cell---biolink:AnatomicalEntity: 1753
biolink:PathologicalProcess---biolink:PhenotypicFeature: 1592
biolink:Publication---biolink:InformationContentEntity: 1572
biolink:Device---biolink:Drug: 1501
biolink:Phenomenon---biolink:PhenotypicFeature: 1452
biolink:SmallMolecule---biolink:ChemicalEntity: 1378
biolink:NamedThing---biolink:AnatomicalEntity: 1243
biolink:PathologicalProcess---biolink:NamedThing: 1169
biolink:Drug---biolink:Procedure: 1097
biolink:Activity---biolink:Procedure: 1060
biolink:NamedThing---biolink:Activity: 985
biolink:Cell---biolink:Drug: 849
biolink:DiseaseOrPhenotypicFeature---biolink:Procedure: 804
biolink:Protein---biolink:BiologicalEntity: 762
biolink:PhenotypicFeature---biolink:NamedThing: 759
biolink:ChemicalEntity---biolink:NamedThing: 718
biolink:PhysiologicalProcess---biolink:BiologicalProcess: 645
biolink:DiseaseOrPhenotypicFeature---biolink:ClinicalIntervention: 641
biolink:Agent---biolink:NamedThing: 622
biolink:Protein---biolink:NamedThing: 601
biolink:Polypeptide---biolink:Drug: 580
biolink:Procedure---biolink:NamedThing: 544
biolink:Procedure---biolink:Activity: 502
biolink:NamedThing---biolink:Agent: 490
biolink:Activity---biolink:Agent: 488
biolink:Behavior---biolink:Activity: 407
biolink:PathologicalProcess---biolink:Disease: 381
biolink:NamedThing---biolink:Pathway: 331
biolink:MolecularActivity---biolink:Pathway: 313
biolink:Procedure---biolink:Agent: 290
biolink:NamedThing---biolink:PhenotypicFeature: 281
biolink:NucleicAcidEntity---biolink:NoncodingRNAProduct: 276
biolink:OrganismTaxon---biolink:Drug: 244
biolink:MolecularActivity---biolink:BiologicalProcess: 243
biolink:DiseaseOrPhenotypicFeature---biolink:Activity: 241
biolink:SmallMolecule---biolink:Drug: 237
biolink:Phenomenon---biolink:InformationContentEntity: 226
biolink:NamedThing---biolink:ClinicalIntervention: 223
biolink:Publication---biolink:NamedThing: 221
biolink:Publication---biolink:Activity: 208
biolink:NamedThing---biolink:Drug: 191
biolink:Activity---biolink:ClinicalIntervention: 179
biolink:Phenomenon---biolink:NamedThing: 177
biolink:Disease---biolink:BiologicalEntity: 169
biolink:NamedThing---biolink:Procedure: 168
biolink:Activity---biolink:NamedThing: 168
biolink:Food---biolink:NamedThing: 161
biolink:DiseaseOrPhenotypicFeature---biolink:Disease: 154
biolink:PhysiologicalProcess---biolink:NamedThing: 152
biolink:AnatomicalEntity---biolink:MaterialSample: 142
biolink:Cohort---biolink:Agent: 136
biolink:Cell---biolink:MaterialSample: 134
biolink:Polypeptide---biolink:ProteinDomain: 133
biolink:PhysicalEntity---biolink:NamedThing: 130
biolink:Behavior---biolink:Phenomenon: 127
biolink:NucleicAcidEntity---biolink:Drug: 126
biolink:NucleicAcidEntity---biolink:AnatomicalEntity: 123
biolink:Food---biolink:ChemicalEntity: 123
biolink:Behavior---biolink:NamedThing: 119
biolink:PhysiologicalProcess---biolink:Pathway: 116
biolink:Food---biolink:Drug: 116
biolink:GrossAnatomicalStructure---biolink:NamedThing: 112
biolink:ChemicalEntity---biolink:BiologicalEntity: 108
biolink:PhysiologicalProcess---biolink:PhenotypicFeature: 104
biolink:ChemicalEntity---biolink:Procedure: 100
biolink:Publication---biolink:Agent: 99
biolink:Polypeptide---biolink:ProteinFamily: 99
biolink:Drug---biolink:BiologicalEntity: 97
biolink:Food---biolink:BiologicalEntity: 93
biolink:AnatomicalEntity---biolink:InformationContentEntity: 85
biolink:Drug---biolink:Protein: 84
biolink:Protein---biolink:ProteinFamily: 79
biolink:Protein---biolink:MolecularEntity: 74
biolink:Behavior---biolink:BiologicalProcess: 71
biolink:Phenomenon---biolink:Activity: 70
biolink:Disease---biolink:Activity: 70
biolink:Publication---biolink:Procedure: 69
biolink:Phenomenon---biolink:Agent: 69
biolink:NamedThing---biolink:MaterialSample: 68
biolink:Drug---biolink:OrganismTaxon: 68
biolink:Disease---biolink:PathologicalProcess: 66
biolink:ChemicalEntity---biolink:MaterialSample: 65
biolink:AnatomicalEntity---biolink:NamedThing: 65
biolink:NucleicAcidEntity---biolink:NamedThing: 64
biolink:Food---biolink:InformationContentEntity: 64
biolink:Drug---biolink:ChemicalMixture: 63
biolink:DiseaseOrPhenotypicFeature---biolink:Agent: 63
biolink:Disease---biolink:DiseaseOrPhenotypicFeature: 63
biolink:Device---biolink:MaterialSample: 61
biolink:Polypeptide---biolink:MolecularEntity: 60
biolink:PhysicalEntity---biolink:Procedure: 60
biolink:Device---biolink:Agent: 56
biolink:PopulationOfIndividualOrganisms---biolink:Agent: 53
biolink:PhysiologicalProcess---biolink:Procedure: 53
biolink:Cohort---biolink:NamedThing: 52
biolink:Behavior---biolink:Agent: 52
biolink:PhysicalEntity---biolink:Device: 49
biolink:GrossAnatomicalStructure---biolink:MaterialSample: 49
biolink:MolecularActivity---biolink:NamedThing: 47
biolink:PhysiologicalProcess---biolink:Phenomenon: 46
biolink:Cell---biolink:NamedThing: 45
biolink:DiseaseOrPhenotypicFeature---biolink:AnatomicalEntity: 44
biolink:Polypeptide---biolink:AnatomicalEntity: 43
biolink:Disease---biolink:AnatomicalEntity: 42
biolink:Protein---biolink:Drug: 41
biolink:Phenomenon---biolink:BiologicalProcess: 40
biolink:GrossAnatomicalStructure---biolink:PhenotypicFeature: 40
biolink:Event---biolink:InformationContentEntity: 39
biolink:IndividualOrganism---biolink:OrganismTaxon: 38
biolink:OrganismTaxon---biolink:NamedThing: 37
biolink:Drug---biolink:AnatomicalEntity: 36
biolink:Cohort---biolink:Procedure: 35
biolink:PopulationOfIndividualOrganisms---biolink:NamedThing: 34
biolink:Phenomenon---biolink:Procedure: 34
biolink:Food---biolink:Procedure: 34
biolink:GrossAnatomicalStructure---biolink:Procedure: 33
biolink:Polypeptide---biolink:PhenotypicFeature: 32
biolink:PhysiologicalProcess---biolink:Activity: 32
biolink:Publication---biolink:AnatomicalEntity: 30
biolink:PhenotypicFeature---biolink:Phenomenon: 30
biolink:NamedThing---biolink:Phenomenon: 30
biolink:Disease---biolink:InformationContentEntity: 29
biolink:NamedThing---biolink:Disease: 28
biolink:Disease---biolink:Procedure: 28
biolink:ChemicalEntity---biolink:AnatomicalEntity: 28
biolink:Behavior---biolink:InformationContentEntity: 28
biolink:Publication---biolink:PhenotypicFeature: 27
biolink:PathologicalProcess---biolink:Agent: 27
biolink:Activity---biolink:InformationContentEntity: 26
biolink:PhysicalEntity---biolink:Drug: 25
biolink:Procedure---biolink:InformationContentEntity: 24
biolink:GrossAnatomicalStructure---biolink:InformationContentEntity: 24
biolink:Behavior---biolink:PhenotypicFeature: 24
biolink:AnatomicalEntity---biolink:PhenotypicFeature: 23
biolink:PhysicalEntity---biolink:Agent: 22
biolink:NamedThing---biolink:OrganismTaxon: 22
biolink:CellularComponent---biolink:BiologicalEntity: 21
biolink:Publication---biolink:MaterialSample: 20
biolink:PhenotypicFeature---biolink:DiseaseOrPhenotypicFeature: 20
biolink:PathologicalProcess---biolink:BiologicalEntity: 20
biolink:NucleicAcidEntity---biolink:BiologicalEntity: 20
biolink:IndividualOrganism---biolink:NamedThing: 20
biolink:CellularComponent---biolink:Protein: 20
biolink:Procedure---biolink:PhenotypicFeature: 18
biolink:NamedThing---biolink:Protein: 18
biolink:Disease---biolink:BiologicalProcess: 18
biolink:ChemicalEntity---biolink:PhenotypicFeature: 18
biolink:PathologicalProcess---biolink:InformationContentEntity: 17
biolink:NucleicAcidEntity---biolink:MaterialSample: 17
biolink:DiseaseOrPhenotypicFeature---biolink:Phenomenon: 17
biolink:Disease---biolink:Phenomenon: 17
biolink:PhysiologicalProcess---biolink:InformationContentEntity: 16
biolink:PhenotypicFeature---biolink:Disease: 16
biolink:PathologicalProcess---biolink:Procedure: 16
biolink:CellularComponent---biolink:NamedThing: 16
biolink:Publication---biolink:Phenomenon: 15
biolink:Procedure---biolink:AnatomicalEntity: 15
biolink:PhysiologicalProcess---biolink:PathologicalProcess: 15
biolink:PhysiologicalProcess---biolink:BiologicalEntity: 15
biolink:DiseaseOrPhenotypicFeature---biolink:BiologicalProcess: 15
biolink:Cell---biolink:InformationContentEntity: 15
biolink:NucleicAcidEntity---biolink:OrganismTaxon: 14
biolink:Drug---biolink:Treatment: 14
biolink:Disease---biolink:Agent: 14
biolink:ChemicalEntity---biolink:OrganismTaxon: 14
biolink:AnatomicalEntity---biolink:Procedure: 14
biolink:Activity---biolink:Phenomenon: 14
biolink:Drug---biolink:MaterialSample: 13
biolink:PhysiologicalProcess---biolink:ClinicalIntervention: 12
biolink:OrganismTaxon---biolink:BiologicalEntity: 12
biolink:Cohort---biolink:Activity: 12
biolink:PhysiologicalProcess---biolink:BiologicalProcessOrActivity: 11
biolink:PhysicalEntity---biolink:MaterialSample: 11
biolink:Phenomenon---biolink:ClinicalIntervention: 11
biolink:NamedThing---biolink:BiologicalProcess: 11
biolink:IndividualOrganism---biolink:Agent: 11
biolink:IndividualOrganism---biolink:Activity: 11
biolink:ChemicalEntity---biolink:Agent: 11
biolink:PathologicalProcess---biolink:BiologicalProcess: 10
biolink:NamedThing---biolink:ProteinDomain: 10
biolink:Drug---biolink:MolecularEntity: 10
biolink:DiseaseOrPhenotypicFeature---biolink:BiologicalEntity: 10
biolink:ChemicalEntity---biolink:Protein: 10
biolink:CellularComponent---biolink:PhenotypicFeature: 10
biolink:Agent---biolink:Activity: 10
biolink:SmallMolecule---biolink:MolecularEntity: 9
biolink:PopulationOfIndividualOrganisms---biolink:InformationContentEntity: 9
biolink:NucleicAcidEntity---biolink:PhenotypicFeature: 9
biolink:Disease---biolink:Cell: 9
biolink:Cohort---biolink:InformationContentEntity: 9
biolink:ChemicalEntity---biolink:Device: 9
biolink:Procedure---biolink:Drug: 8
biolink:Phenomenon---biolink:PathologicalProcess: 8
biolink:IndividualOrganism---biolink:Procedure: 8
biolink:DiseaseOrPhenotypicFeature---biolink:Cell: 8
biolink:Device---biolink:ClinicalIntervention: 8
biolink:ChemicalEntity---biolink:InformationContentEntity: 8
biolink:PhysiologicalProcess---biolink:AnatomicalEntity: 7
biolink:PhenotypicFeature---biolink:Activity: 7
biolink:Drug---biolink:ClinicalIntervention: 7
biolink:Disease---biolink:MaterialSample: 7
biolink:SmallMolecule---biolink:NamedThing: 6
biolink:PhenotypicFeature---biolink:InformationContentEntity: 6
biolink:OrganismTaxon---biolink:Agent: 6
biolink:NucleicAcidEntity---biolink:Transcript: 6
biolink:NucleicAcidEntity---biolink:Gene: 6
biolink:Drug---biolink:InformationContentEntity: 6
biolink:Cohort---biolink:PopulationOfIndividualOrganisms: 6
biolink:Behavior---biolink:Procedure: 6
biolink:Agent---biolink:Procedure: 6
biolink:SmallMolecule---biolink:InformationContentEntity: 5
biolink:Publication---biolink:BiologicalProcess: 5
biolink:Protein---biolink:AnatomicalEntity: 5
biolink:PhysiologicalProcess---biolink:Disease: 5
biolink:PhysicalEntity---biolink:InformationContentEntity: 5
biolink:PhenotypicFeature---biolink:Procedure: 5
biolink:PathologicalProcess---biolink:AnatomicalEntity: 5
biolink:IndividualOrganism---biolink:Phenomenon: 5
biolink:GrossAnatomicalStructure---biolink:Drug: 5
biolink:GrossAnatomicalStructure---biolink:BiologicalEntity: 5
biolink:GeographicLocation---biolink:NamedThing: 5
biolink:GeographicLocation---biolink:InformationContentEntity: 5
biolink:CellularComponent---biolink:MaterialSample: 5
biolink:CellularComponent---biolink:InformationContentEntity: 5
biolink:Cell---biolink:BiologicalEntity: 5
biolink:AnatomicalEntity---biolink:ClinicalIntervention: 5
biolink:AnatomicalEntity---biolink:Agent: 5
biolink:Procedure---biolink:Phenomenon: 4
biolink:Procedure---biolink:MaterialSample: 4
biolink:Procedure---biolink:BiologicalProcess: 4
biolink:Polypeptide---biolink:NoncodingRNAProduct: 4
biolink:PhenotypicFeature---biolink:ClinicalIntervention: 4
biolink:PhenotypicFeature---biolink:Agent: 4
biolink:Phenomenon---biolink:AnatomicalEntity: 4
biolink:PathologicalProcess---biolink:ClinicalIntervention: 4
biolink:NamedThing---biolink:PathologicalProcess: 4
biolink:NamedThing---biolink:Device: 4
biolink:MolecularActivity---biolink:PhenotypicFeature: 4
biolink:MolecularActivity---biolink:PathologicalProcess: 4
biolink:GrossAnatomicalStructure---biolink:OrganismTaxon: 4
biolink:GrossAnatomicalStructure---biolink:Device: 4
biolink:GeographicLocation---biolink:Agent: 4
biolink:Event---biolink:Activity: 4
biolink:DiseaseOrPhenotypicFeature---biolink:PathologicalProcess: 4
biolink:CellularComponent---biolink:ProteinDomain: 4
biolink:Cell---biolink:PhenotypicFeature: 4
biolink:Behavior---biolink:Disease: 4
biolink:AnatomicalEntity---biolink:Device: 4
biolink:Publication---biolink:OrganismTaxon: 3
biolink:PopulationOfIndividualOrganisms---biolink:Procedure: 3
biolink:PopulationOfIndividualOrganisms---biolink:Phenomenon: 3
biolink:PopulationOfIndividualOrganisms---biolink:Activity: 3
biolink:Phenomenon---biolink:Pathway: 3
biolink:MolecularActivity---biolink:Procedure: 3
biolink:IndividualOrganism---biolink:InformationContentEntity: 3
biolink:Food---biolink:ClinicalIntervention: 3
biolink:Device---biolink:AnatomicalEntity: 3
biolink:ChemicalEntity---biolink:Activity: 3
biolink:Behavior---biolink:ClinicalIntervention: 3
biolink:Behavior---biolink:BiologicalEntity: 3
biolink:AnatomicalEntity---biolink:BiologicalEntity: 3
biolink:Activity---biolink:BiologicalEntity: 3
biolink:SmallMolecule---biolink:MaterialSample: 2
biolink:Publication---biolink:BiologicalEntity: 2
biolink:Protein---biolink:PhenotypicFeature: 2
biolink:PopulationOfIndividualOrganisms---biolink:BiologicalEntity: 2
biolink:Polypeptide---biolink:Treatment: 2
biolink:PhysicalEntity---biolink:Phenomenon: 2
biolink:PhenotypicFeature---biolink:AnatomicalEntity: 2
biolink:Phenomenon---biolink:Disease: 2
biolink:PathologicalProcess---biolink:Activity: 2
biolink:OrganismTaxon---biolink:Activity: 2
biolink:NucleicAcidEntity---biolink:Protein: 2
biolink:NucleicAcidEntity---biolink:InformationContentEntity: 2
biolink:MolecularActivity---biolink:Protein: 2
biolink:MolecularActivity---biolink:InformationContentEntity: 2
biolink:MolecularActivity---biolink:ClinicalIntervention: 2
biolink:MolecularActivity---biolink:Activity: 2
biolink:Food---biolink:OrganismTaxon: 2
biolink:Drug---biolink:ProteinFamily: 2
biolink:Drug---biolink:PhenotypicFeature: 2
biolink:Drug---biolink:Gene: 2
biolink:Device---biolink:InformationContentEntity: 2
biolink:CellularComponent---biolink:BiologicalProcess: 2
biolink:Agent---biolink:InformationContentEntity: 2
biolink:Agent---biolink:ClinicalIntervention: 2
biolink:Activity---biolink:PhenotypicFeature: 2
biolink:Activity---biolink:BiologicalProcess: 2
biolink:Protein---biolink:ProteinDomain: 1
biolink:Protein---biolink:NoncodingRNAProduct: 1
biolink:Protein---biolink:Gene: 1
biolink:Procedure---biolink:OrganismTaxon: 1
biolink:Procedure---biolink:Device: 1
biolink:PopulationOfIndividualOrganisms---biolink:OrganismalEntity: 1
biolink:PopulationOfIndividualOrganisms---biolink:OrganismTaxon: 1
biolink:Polypeptide---biolink:Procedure: 1
biolink:Polypeptide---biolink:Device: 1
biolink:PhysiologicalProcess---biolink:Agent: 1
biolink:PhenotypicFeature---biolink:BiologicalProcess: 1
biolink:PhenotypicFeature---biolink:BiologicalEntity: 1
biolink:Phenomenon---biolink:OrganismTaxon: 1
biolink:Phenomenon---biolink:MaterialSample: 1
biolink:Phenomenon---biolink:Device: 1
biolink:Phenomenon---biolink:BiologicalEntity: 1
biolink:PathologicalProcess---biolink:Phenomenon: 1
biolink:PathologicalProcess---biolink:Pathway: 1
biolink:PathologicalProcess---biolink:BiologicalProcessOrActivity: 1
biolink:OrganismTaxon---biolink:Procedure: 1
biolink:OrganismTaxon---biolink:InformationContentEntity: 1
biolink:NucleicAcidEntity---biolink:MolecularEntity: 1
biolink:NamedThing---biolink:Treatment: 1
biolink:NamedThing---biolink:ProteinFamily: 1
biolink:NamedThing---biolink:NoncodingRNAProduct: 1
biolink:NamedThing---biolink:MolecularEntity: 1
biolink:NamedThing---biolink:GeographicLocation: 1
biolink:NamedThing---biolink:ChemicalEntity: 1
biolink:MolecularActivity---biolink:ProteinDomain: 1
biolink:IndividualOrganism---biolink:AnatomicalEntity: 1
biolink:GrossAnatomicalStructure---biolink:Gene: 1
biolink:GrossAnatomicalStructure---biolink:Disease: 1
biolink:GrossAnatomicalStructure---biolink:ClinicalIntervention: 1
biolink:GrossAnatomicalStructure---biolink:Agent: 1
biolink:GeographicLocation---biolink:BiologicalEntity: 1
biolink:Event---biolink:PhenotypicFeature: 1
biolink:Event---biolink:Agent: 1
biolink:Drug---biolink:NoncodingRNAProduct: 1
biolink:Drug---biolink:Disease: 1
biolink:Drug---biolink:Device: 1
biolink:Drug---biolink:Activity: 1
biolink:DiseaseOrPhenotypicFeature---biolink:OrganismTaxon: 1
biolink:DiseaseOrPhenotypicFeature---biolink:Drug: 1
biolink:Disease---biolink:Device: 1
biolink:Disease---biolink:ClinicalIntervention: 1
biolink:Device---biolink:PhenotypicFeature: 1
biolink:Device---biolink:Phenomenon: 1
biolink:Device---biolink:Activity: 1
biolink:Cohort---biolink:ClinicalIntervention: 1
biolink:Cohort---biolink:BiologicalEntity: 1
biolink:Cohort---biolink:AnatomicalEntity: 1
biolink:ChemicalEntity---biolink:Gene: 1
biolink:ChemicalEntity---biolink:ClinicalIntervention: 1
biolink:CellularComponent---biolink:MolecularEntity: 1
biolink:CellularComponent---biolink:Activity: 1
biolink:AnatomicalEntity---biolink:Drug: 1
biolink:Agent---biolink:PhysicalEntity: 1
biolink:Agent---biolink:OrganismTaxon: 1
biolink:Agent---biolink:Device: 1
biolink:Activity---biolink:Device: 1
biolink:Activity---biolink:AnatomicalEntity: 1

saramsey commented 1 year ago

Good analysis.

Would it make sense to focus on the top 10 or 20 rows?
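
One rough way to judge that is to check how much of the total the top N rows cover. The sketch below is only illustrative: it assumes the list above is available as plain `new---old: count` lines, and `parse_pair_counts` and `top_n_coverage` are hypothetical helper names, not functions from the KG2 codebase.

```python
def parse_pair_counts(lines):
    """Parse 'new_category---old_category: count' rows into (pair, count) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last ": " so the colons inside "biolink:..." are left alone.
        pair, _, count = line.rpartition(": ")
        rows.append((pair, int(count)))
    return rows


def top_n_coverage(rows, n):
    """Return the fraction of all discrepancies covered by the n largest rows."""
    total = sum(count for _, count in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    covered = sum(count for _, count in ranked[:n])
    return covered / total if total else 0.0


example = [
    "biolink:ChemicalEntity---biolink:NamedThing: 718",
    "biolink:PhysiologicalProcess---biolink:BiologicalProcess: 645",
    "biolink:Activity---biolink:AnatomicalEntity: 1",
]
rows = parse_pair_counts(example)
print(top_n_coverage(rows, 2))
```

Running this over the full list with n=10 and n=20 would show whether the long tail of single-digit rows is worth chasing at all.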

ecwood commented 1 year ago

Here are the sources responsible for the discrepancies in the top 10 rows:

{
    "biolink:DiseaseOrPhenotypicFeature---biolink:NamedThing": {
        "OMIM": 35061,
        "UMLS": 451,
        "MESH": 17
    },
    "biolink:DiseaseOrPhenotypicFeature---biolink:PhenotypicFeature": {
        "NCIT": 17843,
        "HP": 6731
    },
    "biolink:Drug---biolink:ChemicalEntity": {
        "UMLS": 111969,
        "MESH": 53396,
        "NDDF": 9052,
        "DRUGBANK": 5308,
        "ATC": 4173,
        "PDQ": 2647,
        "NCIT": 1092,
        "PSY": 347
    },
    "biolink:Drug---biolink:NamedThing": {
        "UMLS": 19522,
        "MESH": 7647,
        "DRUGBANK": 2603,
        "NDDF": 2213,
        "PDQ": 1881,
        "ATC": 700,
        "NCIT": 332,
        "PSY": 58
    },
    "biolink:GrossAnatomicalStructure---biolink:AnatomicalEntity": {
        "FMA": 71712,
        "NCIT": 3693,
        "MESH": 13
    },
    "biolink:NamedThing---biolink:BiologicalEntity": {
        "UMLS": 84862,
        "MESH": 654,
        "PDQ": 101,
        "NCIT": 52,
        "NDDF": 14,
        "PSY": 9,
        "ATC": 4,
        "DRUGBANK": 4,
        "FMA": 2
    },
    "biolink:NamedThing---biolink:Gene": {
        "HGNC": 43143,
        "NCIT": 11779
    },
    "biolink:NamedThing---biolink:InformationContentEntity": {
        "UMLS": 21032,
        "NCIT": 10854,
        "PSY": 988,
        "MESH": 373,
        "FMA": 147,
        "PDQ": 18,
        "HP": 10,
        "ICD9": 8,
        "NDDF": 2
    },
    "biolink:Polypeptide---biolink:NamedThing": {
        "UMLS": 60082,
        "MESH": 51323,
        "DRUGBANK": 104,
        "PDQ": 95,
        "NCIT": 48,
        "NDDF": 38,
        "ATC": 28,
        "PSY": 24,
        "OMIM": 7
    },
    "biolink:Protein---biolink:Polypeptide": {
        "UMLS": 33738,
        "MESH": 26986,
        "PSY": 33,
        "NDDF": 22,
        "PDQ": 14,
        "DRUGBANK": 8,
        "NCIT": 6,
        "ATC": 5
    }
}

Note: for biolink:NamedThing---biolink:InformationContentEntity, I mapped any node that had no better alternative than InformationContentEntity to NamedThing in order to solve #326. The same applies to biolink:NamedThing---biolink:BiologicalEntity (see #286).
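
For anyone who wants to reproduce a breakdown like this, here is a minimal sketch. It assumes the "source" is simply the CURIE prefix of the node id and that the discrepancies are available as `(node_id, new_category, old_category)` tuples; `source_breakdown` is a hypothetical name, and the actual script used for the numbers above may work differently.

```python
from collections import Counter, defaultdict


def source_breakdown(discrepancies):
    """Count discrepancies per category pair, split by CURIE prefix.

    discrepancies: iterable of (node_id, new_category, old_category) tuples.
    Assumption: the part of node_id before the first ':' identifies the source.
    """
    by_pair = defaultdict(Counter)
    for node_id, new_category, old_category in discrepancies:
        prefix = node_id.split(":", 1)[0]
        by_pair[f"{new_category}---{old_category}"][prefix] += 1
    # Order pairs alphabetically and sources by descending count.
    return {
        pair: dict(counts.most_common())
        for pair, counts in sorted(by_pair.items())
    }


sample = [
    ("UMLS:C5706686", "biolink:Drug", "biolink:NamedThing"),
    ("UMLS:C5706687", "biolink:Drug", "biolink:NamedThing"),
    ("ATC:A01AD08", "biolink:Drug", "biolink:NamedThing"),
    ("UMLS:C5706690", "biolink:Drug", "biolink:ChemicalEntity"),
]
breakdown = source_breakdown(sample)
```

Passing the result to `json.dumps(breakdown, indent=4)` gives output in the same shape as the block above.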

ecwood commented 1 year ago

Examples of "discrepancies" that actually seem like improvements (the first category listed is the new category, the second is the old one):

https://identifiers.org/umls:C5706686 UMLS:C5706686 with name: Pertuzumab Zuvotolimod has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706687 UMLS:C5706687 with name: Briquilimab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706688 UMLS:C5706688 with name: Latikafusp has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706689 UMLS:C5706689 with name: Davoceticept has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706690 UMLS:C5706690 with name: Eliapixant has category inconsistency: biolink:Drug ; biolink:ChemicalEntity
https://identifiers.org/umls:C5706691 UMLS:C5706691 with name: Epacmarstobart has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706692 UMLS:C5706692 with name: Simridarlimab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706693 UMLS:C5706693 with name: Belrestotug has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706694 UMLS:C5706694 with name: Vepafestinib has category inconsistency: biolink:Drug ; biolink:ChemicalEntity
https://identifiers.org/umls:C5706697 UMLS:C5706697 with name: Dalnicastobart has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706698 UMLS:C5706698 with name: Polzastobart has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706699 UMLS:C5706699 with name: Pifusertib has category inconsistency: biolink:Drug ; biolink:ChemicalEntity
https://identifiers.org/umls:C5706700 UMLS:C5706700 with name: Trastuzumab Rezetecan has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706701 UMLS:C5706701 with name: Izalontamab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706702 UMLS:C5706702 with name: Retlirafusp Alfa has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706703 UMLS:C5706703 with name: Emfizatamab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706705 UMLS:C5706705 with name: Camoteskimab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706706 UMLS:C5706706 with name: Izeltabart Tapatansine has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706707 UMLS:C5706707 with name: Ralzapastotug has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706708 UMLS:C5706708 with name: Visugromab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706709 UMLS:C5706709 with name: Narazaciclib has category inconsistency: biolink:Drug ; biolink:ChemicalEntity
https://identifiers.org/umls:C5706853 UMLS:C5706853 with name: Vilamakitug has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706854 UMLS:C5706854 with name: Inezetamab has category inconsistency: biolink:Drug ; biolink:NamedThing
https://identifiers.org/umls:C5706855 UMLS:C5706855 with name: Exarafenib has category inconsistency: biolink:Drug ; biolink:ChemicalEntity
https://identifiers.org/umls:C5706856 UMLS:C5706856 with name: Sovilnesib has category inconsistency: biolink:Drug ; biolink:ChemicalEntity
https://identifiers.org/umls:C5706857 UMLS:C5706857 with name: Paridiprubart has category inconsistency: biolink:Drug ; biolink:NamedThing
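
Lines in this format are easy to turn back into structured records for counting or sampling. The following is a sketch only: the regular expression is inferred from the lines shown above, and `Discrepancy` and `parse_line` are hypothetical names rather than anything in the KG2 build code.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the report lines above:
# "<IRI> <CURIE> with name: <name> has category inconsistency: <new> ; <old>"
LINE_RE = re.compile(
    r"^(?P<iri>\S+) (?P<curie>\S+) with name: (?P<name>.*)"
    r" has category inconsistency: (?P<new>\S+) ; (?P<old>\S+)$"
)


class Discrepancy(NamedTuple):
    curie: str
    name: str
    new_category: str
    old_category: str


def parse_line(line: str) -> Optional[Discrepancy]:
    """Return a Discrepancy for a report line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Discrepancy(match["curie"], match["name"], match["new"], match["old"])


sample_line = (
    "https://identifiers.org/umls:C5706690 UMLS:C5706690 with name: Eliapixant "
    "has category inconsistency: biolink:Drug ; biolink:ChemicalEntity"
)
record = parse_line(sample_line)
```

Returning `None` for non-matching lines lets the caller skip headers or blank lines without special-casing them.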

ecwood commented 1 year ago

For the top ten node category discrepancy types, here are 20 nodes from each category pair, equally spaced across the sample. Based on this, I think we most need to target nodes with a new category of biolink:NamedThing and an old category of biolink:BiologicalEntity or biolink:Gene. To do this, we are going to either (1) remap T028 ("Gene or Genome") to biolink:Gene rather than biolink:BiologicalEntity (which cannot be used since it is abstract) or (2) check the name property for "Gene" or "Allele" and assign the category that way, which will cover some cases. @saramsey, what are your thoughts?
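
For reference, picking equally spaced nodes from a list can be sketched as below. This is an illustration under assumptions, not the script actually used: it assumes the nodes for a category pair are already sorted (for example by node id), and `evenly_spaced_sample` is a hypothetical helper name.

```python
def evenly_spaced_sample(items, k):
    """Pick k items at evenly spaced positions, including the first and last.

    Assumes `items` is an already-ordered sequence. Integer arithmetic keeps
    the chosen indices distinct whenever k < len(items).
    """
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(items)
    if k == 1:
        return [items[0]]
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]


node_ids = [f"EXAMPLE:{i:04d}" for i in range(101)]  # placeholder ids
picked = evenly_spaced_sample(node_ids, 21)
```

Sampling this way, rather than randomly, makes the selection reproducible and spreads it across all sources when the list is sorted by id.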

New Category: biolink:Drug, Old Category: biolink:ChemicalEntity

Node Id Node Name Reasonable?
ATC:A01AA02 sodium monofluorophosphate biolink:ChemicalEntity
DRUGBANK:DB17223 Paclitaxel ceribate Yes
MESH:C006238 FL-70 This is a chemical mixture.
MESH:C044983 IRX 1767 biolink:ChemicalEntity
MESH:C082542 5,8-bis(2-aminoethylamino)-1-azaanthracene-9,10-dione biolink:ChemicalEntity
MESH:C417049 di-cysteine substituted hypocrellin B No, it's a photosensitizer
MESH:C562147 YHO-13351 Yes, this looks like it is used as a drug
NDDF:002886 griseofulvin ultramicrosize Yes, this is a medication
PDQ:CDR0000763056 perflubutane Yes, for testing purposes (ultrasounds) but it does seem to be categorized as a drug
UMLS:C0066700 mometasone furoate Yes
UMLS:C0142429 SK&F 86002-A(2) Yes
UMLS:C0391006 ethyl octylphosphonofluoridate biolink:ChemicalEntity
UMLS:C0636296 BR 402 biolink:ChemicalEntity
UMLS:C0755511 (S)-DMDFT Yes, it looks like there are some papers discussing it as a drug
UMLS:C0950773 SM 9018 Yes
UMLS:C1310647 Sedermyl Yes
UMLS:C1965188 Evamist Yes
UMLS:C3256433 Platycladus orientalis leaf extract Yes
UMLS:C3849624 9-butyltriphenylphosphoniumacylamino-2,7-dibenzothiazolineflurene biolink:ChemicalEntity
UMLS:C4549894 FLA-16 compound Yes
UMLS:C5778305 Flexira Yes

New Category: biolink:Drug, Old Category: biolink:NamedThing

Node Id Node Name Reasonable?
ATC:A01AD08 becaplermin Yes
DRUGBANK:DB10879 Arrhenatherum elatius pollen Yes
MESH:C000600094 Fletikumab Yes
MESH:C036776 Limulus clotting factor B Maybe? It is an enzyme
MESH:C083798 peptitergent PD1 Yes
MESH:C425549 silk proteinase inhibitor 1, Galleria mellonella Unknown, leaning yes
MESH:D004166 Diphtheria Antitoxin Yes
NDDF:010270 ostomy supply MISCELL PASTE (GRAM) Yes
PDQ:CDR0000355732 anti-CD45 monoclonal antibody BC8 Yes
UMLS:C0025741 methyldopa Yes
UMLS:C0136039 defensin NP-3a Yes
UMLS:C0525186 enomycin Yes
UMLS:C0677518 Magnesiocard Yes
UMLS:C0935889 gastrin 17 vaccine Yes
UMLS:C1450181 DEFA1A3 protein, human Unclear
UMLS:C2353860 YM753 compound Yes
UMLS:C3273696 TrasGEX Yes
UMLS:C4256158 SH-polypeptide-46 Yes, though it seems to be cosmetic
UMLS:C4760804 CNTO 1959 Yes
UMLS:C5434034 IC7Fc cytokine Yes
UMLS:C5777128 HPV-16 trojan peptide vaccine Yes

New Category: biolink:Polypeptide, Old Category: biolink:NamedThing

Node Id Node Name Reasonable?
ATC:A16AX07 sapropterin No, this is described as a "cofactor", which Wikipedia describes as a non-protein
MESH:C000710696 TTC30B protein, human Yes
MESH:C079950 CSDE1 protein, human Yes
MESH:C106838 scorpion toxin AaIT5 Possibly, unsure about length
MESH:C416228 ppxA protein, Photorhabdus luminescens Yes
MESH:C485874 LOC103693936 Unknown, but likely
MESH:C499904 Fancd2 protein, mouse Yes
MESH:C522813 MTX2 protein, human Yes
MESH:C555906 mutanobactin A Unknown
MESH:D018435 ATP Binding Cassette Transporter, Subfamily B Yes
UMLS:C0215961 CBF2 protein, S cerevisiae Yes
UMLS:C0665092 OSR40C1 protein, Oryza sativa Yes
UMLS:C1142892 N(epsilon)-(malondialdehyde)lysine No, this seems to be a protein modification
UMLS:C1384562 Ank3 protein, mouse Yes
UMLS:C1447387 Mtk protein, Drosophila Yes
UMLS:C1530721 nfe2 protein, zebrafish Yes
UMLS:C1721876 CNGC12 protein, Arabidopsis Yes
UMLS:C2605853 NAG1 protein, S cerevisiae Yes
UMLS:C3490756 Thp5 peptide No, too short
UMLS:C4276902 eIFiso4G1 protein, Arabidopsis Yes
UMLS:C5773719 Nord protein, Drosophila Yes

New Category: biolink:Protein, Old Category: biolink:Polypeptide

Node Id Node Name Reasonable?
ATC:B02BD05 coagulation factor VII Yes
MESH:C000705328 Ace2 protein, mouse Yes
MESH:C035902 siderophore receptors Unclear
MESH:C083171 leukotriene E4 receptor No, from what I can see
MESH:C113880 AMY1.2 protein, Hordeum vulgare Yes
MESH:C465555 StcE protein, E coli Yes
MESH:C499530 REKS protein, Xenopus Yes
MESH:C527091 Klra3 protein, mouse Yes
MESH:C579458 monoamine oxidase A, human Yes
UMLS:C0008471 Chondroitinase-AC II Yes
UMLS:C0073223 ribitol 2-dehydrogenase Yes
UMLS:C0248508 MASP1 protein, human Yes
UMLS:C0609003 oxaloglycollate reductase (decarboxylating) Yes
UMLS:C0908976 protein kinase U Yes
UMLS:C1306954 srlB protein, E coli Yes
UMLS:C1448018 Usp15 protein, rat Yes
UMLS:C1608529 DUSP6 protein, human Yes
UMLS:C2002801 GDH protein, Arabidopsis Yes
UMLS:C3489633 AtSUS3 protein, Arabidopsis Yes
UMLS:C4277247 cyanate lyase (14-25), Oryza sativa Yes
UMLS:C5773117 fzo1 protein, S pombe Yes

New Category: biolink:NamedThing, Old Category: biolink:BiologicalEntity

Node Id Node Name Reasonable?
ATC:J07BK Varicella zoster vaccines This should probably be categorized as a drug.
UMLS:C1413422 CIDEB gene This should probably be a gene.
UMLS:C1418909 PRKAR1B gene This node has TUI T028 ("Gene or Genome"). Because biolink:GeneOrGenome is a mixin and biolink:BiologicalEntity is an abstract class, it is getting mapped to biolink:NamedThing. I think we can do better.
UMLS:C1424351 RAB17 gene This should probably be a gene.
UMLS:C1455836 HRG gene This should probably be a gene.
UMLS:C1710294 THRA wt Allele This should probably be a gene.
UMLS:C1832789 PMP22, ALA67PRO This should probably be a gene.
UMLS:C1844972 GLA, TRP287TER This should probably be a gene.
UMLS:C1860541 CDH1, 1-BP INS, 1588C This should probably be a gene.
UMLS:C2003690 (124)I-cMAb U36 This should probably be a drug. It is a monoclonal antibody.
UMLS:C2680144 RPS20P26 gene This should probably be a gene.
UMLS:C2985387 SPANXN5 wt Allele This should probably be a gene.
UMLS:C3275388 FRMD7, IVS11, G-A, +5 This should probably be a gene.
UMLS:C3541252 GAGE7 wt Allele This should probably be a gene.
UMLS:C3807643 ZMYND10, LEU266PRO This should probably be a gene.
UMLS:C3889240 FAM183DP gene This should probably be a gene.
UMLS:C4225976 COQ4, ARG145GLY This should probably be a gene.
UMLS:C4320788 LINC02333 gene This should probably be a gene.
UMLS:C5193550 PLPBP, IVS2DS, G-A, +1 This should probably be a gene.
UMLS:C5445968 SKP2P1 gene This should probably be a gene.
UMLS:C5775124 UQCRH, 2.2-KB DEL, EX2-3DEL This should probably be a gene.
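
The two options proposed above for these nodes (remapping T028, or checking the name for "Gene" or "Allele") could be combined roughly as follows. This is a sketch under assumptions: `TUI_CATEGORY_OVERRIDES`, `GENE_NAME_RE`, and `categorize` are hypothetical names, and the real mapping in the KG2 build may be structured quite differently.

```python
import re

# Option 1 (hypothetical override table): map the UMLS semantic type
# T028 ("Gene or Genome") straight to biolink:Gene.
TUI_CATEGORY_OVERRIDES = {"T028": "biolink:Gene"}

# Option 2: fall back to a name check for "Gene" or "Allele".
GENE_NAME_RE = re.compile(r"\b(gene|allele)\b", re.IGNORECASE)


def categorize(name, tuis, fallback="biolink:NamedThing"):
    """Return a category for a node given its name and list of TUIs."""
    for tui in tuis:
        if tui in TUI_CATEGORY_OVERRIDES:
            return TUI_CATEGORY_OVERRIDES[tui]
    if GENE_NAME_RE.search(name):
        return "biolink:Gene"
    return fallback


print(categorize("PRKAR1B gene", ["T028"]))
print(categorize("THRA wt Allele", []))
print(categorize("PMP22, ALA67PRO", []))
```

Note that the name check alone would not catch variant-style names such as "PMP22, ALA67PRO", which is why it only covers some cases and the T028 remapping looks like the more complete fix.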

New Category: biolink:GrossAnatomicalStructure, Old Category: biolink:AnatomicalEntity

Node Id Node Name Reasonable?
FMA:10000 Eighth thoracic vertebral arch Yes
FMA:15671 Lamina propria mucosae of ascending colon Yes
FMA:21704 Mucosa of anterior inferior minor calyx Yes
FMA:235016 Left stratum zonale of thalamus Yes
FMA:261675 Fibrocollagenous connective tissue of crista supraventricularis (volume) Yes
FMA:275560 Corticospinal-corticobulbar pathway Yes
FMA:293351 Embryonic pole Yes
FMA:306791 Lateral cord segment of C6 root of musculocutaneous nerve Yes
FMA:318536 Muscle body of right auricularis posterior Yes
FMA:328024 Otic part of chondrocranium Yes
FMA:38286 Skin of posterior part of wrist Yes
FMA:44005 Medial part of plantar digital artery of left great toe Yes
FMA:48517 Muscle fasciculus of right multifidus thoracis Yes
FMA:53070 Left pterygopalatine ganglion Yes
FMA:59623 Vertical part of right inferior lacrimal canaliculus Yes
FMA:6598 Left cardiac branch to superficial part of cardiac plexus Yes
FMA:71506 Set of meningeal branches of vertebral artery Yes
FMA:76408 Trunk of brachialis branch of radial recurrent artery Yes
FMA:81222 Nerve to left flexor carpi ulnaris Yes
FMA:9805 Left fifth internal intercostal muscle Yes
NCIT:C97333 Locus Coeruleus Yes

New Category: biolink:NamedThing, Old Category: biolink:InformationContentEntity

Node Id Node Name Reasonable?
FMA:223260 Physical attribute relationship Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C1554891 Injection, perineural Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C112911 Irradiated Volume Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C142347 Intraepidermal Nerve Fiber Density Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C173620 Social Impact Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C28025 Asymmetry Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C64557 Yoctogram Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C79874 Conjunction Yes, this seems odd and vague enough to not worry about a vague categorization.
NCIT:C95090 Food and Water Consumption Domain Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C0024921 Maternal Health Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C0439784 Retrograde direction Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C0871483 Teacher Effectiveness Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C1521142 2: 215499872-215418783 Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C1707634 Data Element Relationship Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C2346977 Beat Number Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C2827891 Millimole per Second Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C3242496 required for initiator Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C3899421 Day Times Nanogram Per Milliliter Per Milligram Per Gram Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C4543203 Effector (disposition) Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C5240149 Aldehyde dehydrogenase inhibitor (disposition) Yes, this seems odd and vague enough to not worry about a vague categorization.
UMLS:C5708926 Study Endpoint Level Yes, this seems odd and vague enough to not worry about a vague categorization.

New Category: biolink:DiseaseOrPhenotypicFeature, Old Category: biolink:NamedThing

Node Id Node Name Reasonable?
UMLS:C2184149 living situation No, this doesn't seem like a disease or phenotype.
OMIM:MTHU003694 Compensatory chin elevation Yes
OMIM:MTHU007422 Atrophic, patchy alopecia (vertex) Yes
OMIM:MTHU010943 Normal eosinophil peroxidase activity Yes
OMIM:MTHU014349 Early ossification of capital femoral epiphyses (infancy) Yes
OMIM:MTHU018373 Average birth length, 52.6cm Yes
OMIM:MTHU022090 Hypoplastic or dysplastic toes (3rd, 4th, and 5th) Yes
OMIM:MTHU025931 Diffuse deposition of calcium oxalate in various tissues Yes
OMIM:MTHU033168 Glycogen-containing cytosolic vacuoles within cardiomyocytes Yes
OMIM:MTHU038227 Hypomineralized enamel Yes
OMIM:MTHU042577 Lack of testes Yes
OMIM:MTHU047008 White spongy plaques on the pharyngeal mucosa Yes
OMIM:MTHU051409 Round-ended femur bones Yes
OMIM:MTHU055656 Short broad ribs Yes
OMIM:MTHU059510 Fusion of the cochlea and vestibule into a common cavity Yes
OMIM:MTHU063143 Impaired dental growth Yes
OMIM:MTHU066976 Fractures of the long bones Yes
OMIM:MTHU070683 Reduced scotopic and/or photopic responses on electroretinography yes
OMIM:MTHU072832 Ovoid middle phalanges Yes
OMIM:MTHU074960 Lobar disorganization Yes
UMLS:C2937320 effect on the heart valve No, this one doesn't make much sense.

New Category: biolink:DiseaseOrPhenotypicFeature, Old Category: biolink:PhenotypicFeature

| Node Id | Node Name | Reasonable? |
| --- | --- | --- |
| HP:0000002 | Abnormality of body height | Yes |
| HP:0004552 | Scarring alopecia of scalp | Yes |
| HP:0008967 | Exercise-induced muscle stiffness | Yes, though I'm not sure if this is more of a "symptom" than a "disease or phenotypic feature" |
| HP:0025131 | Finger swelling | Yes, though I'm not sure if this is more of a "symptom" than a "disease or phenotypic feature" |
| HP:0032193 | Decreased low-density lipoprotein particle size | Yes |
| HP:0045047 | HbS hemoglobin | Yes |
| NCIT:C121557 | Bloody Discharge | Yes, though I'm not sure if this is more of a "symptom" than a "disease or phenotypic feature" |
| NCIT:C135068 | Duodenum and Ampulla of Vater Neuroendocrine Tumor pT2 TNM Finding v8 | Yes |
| NCIT:C140682 | Retinoblastoma Clinical Primary Tumor TNM Finding v8 | Yes |
| NCIT:C144173 | Grade 1 Hypertriglyceridemia, CTCAE | Yes |
| NCIT:C145500 | Grade 3 Osteonecrosis, CTCAE | Yes |
| NCIT:C154502 | Multihormonal Immunoreactivity Present in a Single Cell Type | Yes |
| NCIT:C186445 | No Evidence of Distinct DNA Methylation Profiling Molecular Group Present | Yes |
| NCIT:C53844 | Ventricular Arrhythmia, CTCAE_3 | Yes |
| NCIT:C55868 | Grade 2 Other Ocular and Visual, CTCAE | Yes |
| NCIT:C57100 | Uterus Leak, CTCAE | Yes |
| NCIT:C58326 | Grade 4 Joint Function, CTCAE | Yes, though I'm not sure if this is more of a "symptom" than a "disease or phenotypic feature" |
| NCIT:C59555 | Grade 3 Penis Infection Documented Clinically or Microbiologically with Grade 3 or 4 Neutrophils, CTCAE | Yes |
| NCIT:C64450 | Hypopharyngeal Cancer pT2 TNM Finding v6 and v7 | Yes |
| NCIT:C88971 | Nasopharyngeal Cancer Pathologic Regional Lymph Nodes TNM Finding v7 | Yes |
| NCIT:C99942 | Coronary Artery Right Dominance | Yes |

New Category: biolink:NamedThing, Old Category: biolink:Gene

| Node Id | Node Name | Reasonable? |
| --- | --- | --- |
| NCIT:C101046 | ATIC/ALK Fusion Gene | This should be biolink:Gene. |
| NCIT:C104135 | CDC25C wt Allele | This should be biolink:Gene. |
| NCIT:C105605 | ZEB1 Gene | This should be biolink:Gene. |
| NCIT:C112877 | IDE wt Allele | This should be biolink:Gene. |
| NCIT:C116426 | PLIN2 Gene | This should be biolink:Gene. |
| NCIT:C124879 | PDGFD wt Allele | This should be biolink:Gene. |
| NCIT:C131765 | ARHGAP22 wt Allele | This should be biolink:Gene. |
| NCIT:C150336 | TBX22 Gene | This should be biolink:Gene. |
| NCIT:C171410 | RAB5B Gene | This should be biolink:Gene. |
| NCIT:C181976 | MASP2 wt Allele | This should be biolink:Gene. |
| NCIT:C189947 | HNRNPA1 Gene | This should be biolink:Gene. |
| NCIT:C21589 | CBLB Gene | This should be biolink:Gene. |
| NCIT:C24830 | ST13 Gene | This should be biolink:Gene. |
| NCIT:C46035 | CYP2C8*7 Allele | This should be biolink:Gene. |
| NCIT:C51233 | F3 wt Allele | This should be biolink:Gene. |
| NCIT:C52291 | CDC7 wt Allele | This should be biolink:Gene. |
| NCIT:C54464 | FRAT1 wt Allele | This should be biolink:Gene. |
| NCIT:C81746 | MIR181A2 Gene | This should be biolink:Gene. |
| NCIT:C92174 | FOXD3 Gene | This should be biolink:Gene. |
| NCIT:C97525 | GPHN Gene | This should be biolink:Gene. |
| NCIT:C99847 | NCOA4/RET Fusion Gene | This should be biolink:Gene. |

From Steve: We should map fewer things to drugs (using biolink:ChemicalEntity instead). Map everything in DrugBank to biolink:Drug.
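One possible reading of that suggestion, as a rough sketch only: the function name `adjust_category`, the example node IDs, and the blanket demotion of every non-DrugBank drug to biolink:ChemicalEntity are all assumptions made for illustration. The actual rule would probably need to be narrower.

```python
DRUG = "biolink:Drug"
CHEMICAL_ENTITY = "biolink:ChemicalEntity"


def adjust_category(node_id: str, proposed_category: str) -> str:
    """Apply a hypothetical drug-category rule to a single node."""
    prefix = node_id.split(":", 1)[0]
    if prefix == "DRUGBANK":
        # Everything from DrugBank is treated as a drug.
        return DRUG
    if proposed_category == DRUG:
        # Assumption: anything else proposed as a drug is demoted.
        return CHEMICAL_ENTITY
    return proposed_category


# Both node IDs below are made-up placeholders.
print(adjust_category("DRUGBANK:DB00000", CHEMICAL_ENTITY))
print(adjust_category("UMLS:C0000000", DRUG))
```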

ecwood commented 1 year ago

Here are the sources with name inconsistencies:

{
    "UMLS": 103711,
    "HGNC": 19853,
    "MESH": 13492,
    "OMIM": 6812,
    "DRUGBANK": 1338,
    "GO": 1199,
    "NDDF": 932,
    "NCIT": 372,
    "PDQ": 367,
    "ATC": 327,
    "ICD9": 203,
    "HP": 89,
    "RXNORM": 68,
    "PSY": 29,
    "FMA": 24,
    "NCBITaxon": 6
}
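
For reference, a tally like this can be produced by grouping the flagged node IDs by CURIE prefix. This is a minimal sketch, not the script that produced the numbers above: `count_by_source` is a hypothetical helper, and the three IDs are only placeholders borrowed from elsewhere in this thread.

```python
from collections import Counter


def count_by_source(node_ids):
    """Tally node IDs by CURIE prefix, most common first."""
    counts = Counter(node_id.split(":", 1)[0] for node_id in node_ids)
    return dict(counts.most_common())


# Placeholder input; the real list would hold every flagged node ID.
flagged = ["UMLS:C3341209", "UMLS:C3341248", "HP:0000002"]
print(count_by_source(flagged))
```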

ecwood commented 1 year ago

At least one of the UMLS examples is actually correct. For https://identifiers.org/umls:C3341209, the check reported:

UMLS:C3341209 has name inconsistency: Otonyctomys ; Genus Otonyctomys

http://linkedlifedata.com/resource/umls/id/C3341209 lists the correct name as "Otonyctomys".

However, for https://identifiers.org/umls:C3341248, the check reported:

UMLS:C3341248 has name inconsistency: Pattonomys semivillosus ; Echimys semivillosus

https://identifiers.org/umls:C3341248 lists the name as "Echimys semivillosus".

Here is the extracted entry for that node:

{
  "('UMLS', 'C3341248')": {
    "names": {
      "NCBI": {
        "SCN": {
          "Y": [
            "Pattonomys semivillosus"
          ]
        },
        "SY": {
          "N": [
            "Echimys semivillosus"
          ],
          "Y": [
            "Nelomys semivillosus",
            "Pattonomys carrikeri"
          ]
        }
      },
      "SNOMEDCT_VET": {
        "FN": {
          "Y": [
            "Echimys semivillosus (organism)"
          ]
        },
        "PT": {
          "Y": [
            "Echimys semivillosus"
          ]
        }
      }
    },
    "relations": {
      "NCBI": {
        "PAR,None,None": [
          "C3982955"
        ]
      },
      "SNOMEDCT_VET": {
        "PAR,inverse_isa,N": [
          "C1003311"
        ]
      }
    },
    "tuis": [
      "T015"
    ]
  }
}

NCBI's SCN has the highest term preference for NCBI. Interestingly, the NCBI entry also lists Pattonomys semivillosus first:

<http://purl.bioontology.org/ontology/NCBITAXON/1567524> a owl:Class ;
        skos:prefLabel """Pattonomys semivillosus"""@en ;
        skos:notation """1567524"""^^xsd:string ;
        skos:altLabel """Echimys semivillosus"""@en , """Nelomys semivillosus"""@en , """Pattonomys carrikeri"""@en ;
        rdfs:subClassOf <http://purl.bioontology.org/ontology/NCBITAXON/1567523> ;
        <http://purl.bioontology.org/ontology/NCBITAXON/DIV> """Rodents"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/NCBITAXON/RANK> """species"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/NCBITAXON/AUTHORITY_NAME> """Echimys semivillosus (I. Geoffroy, 1838)"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/NCBITAXON/AUTHORITY_NAME> """Nelomys semivillosus I. Geoffroy, 1838"""^^xsd:string ;
        UMLS:has_cui """C3341248"""^^xsd:string ;
        UMLS:has_tui """T015"""^^xsd:string ;
        UMLS:has_sty <http://purl.bioontology.org/ontology/STY/T015> ;

We are now using the official UMLS name hierarchy, so the names should be valid.
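
To illustrate how a precedence-based choice plays out on the extracted entry above, here is a small sketch. It is not the code in the ETL: `pick_name` is a hypothetical helper, the three-item `precedence` list is invented for the example (the real order comes from the UMLS ranking), and treating the "Y"/"N" key as a preference flag is a guess.

```python
def pick_name(names, precedence):
    """Return the first name found by walking (source, term type) pairs in order."""
    for source, term_type in precedence:
        by_flag = names.get(source, {}).get(term_type, {})
        # Try names filed under "Y" before "N"; the flag's meaning is assumed.
        for flag in ("Y", "N"):
            candidates = by_flag.get(flag, [])
            if candidates:
                return candidates[0]
    return None


# The "names" portion of the extracted entry for UMLS:C3341248.
names = {
    "NCBI": {
        "SCN": {"Y": ["Pattonomys semivillosus"]},
        "SY": {
            "N": ["Echimys semivillosus"],
            "Y": ["Nelomys semivillosus", "Pattonomys carrikeri"],
        },
    },
    "SNOMEDCT_VET": {
        "FN": {"Y": ["Echimys semivillosus (organism)"]},
        "PT": {"Y": ["Echimys semivillosus"]},
    },
}

# Hypothetical precedence with NCBI's SCN ranked first.
precedence = [("NCBI", "SCN"), ("SNOMEDCT_VET", "PT"), ("NCBI", "SY")]
print(pick_name(names, precedence))
```

With NCBI's SCN ranked first, the sketch picks "Pattonomys semivillosus". If only the SNOMEDCT_VET preferred term were consulted, it would pick "Echimys semivillosus", which matches the two names in the reported inconsistency.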

ecwood commented 1 year ago

The fixes in c3805bb and c0fb8fa brought the "problem node" count down to 677842. Now, only 17% of the nodes have a different name or category from their previous ETL equivalent.

ecwood commented 1 year ago

The fixes in 4ce1487 brought the "problem node" count down to 666071. Now, only 16.6% of the nodes have a different name or category from their previous ETL equivalent. This wasn't a substantial fix, but it improved things slightly.

ecwood commented 1 year ago

With the previous few commits (c3805bb, c0fb8fa, 4ce1487, 91e4e5e, 29a4f71, 54e8cdc, and 3a993ed), here is the updated set of inconsistencies:

biolink:Polypeptide---biolink:NamedThing: 111645
biolink:GrossAnatomicalStructure---biolink:AnatomicalEntity: 75418
biolink:Protein---biolink:Polypeptide: 60804
biolink:Gene---biolink:BiologicalEntity: 49759
biolink:NamedThing---biolink:BiologicalEntity: 35939
biolink:DiseaseOrPhenotypicFeature---biolink:NamedThing: 35529
biolink:Drug---biolink:NamedThing: 33733
biolink:NamedThing---biolink:InformationContentEntity: 33432
biolink:Drug---biolink:ChemicalEntity: 32408
biolink:DiseaseOrPhenotypicFeature---biolink:PhenotypicFeature: 24574
biolink:Publication---biolink:ClinicalIntervention: 14572
biolink:Disease---biolink:PhenotypicFeature: 13273
biolink:Disease---biolink:NamedThing: 8606
biolink:Polypeptide---biolink:BiologicalEntity: 8249
biolink:Procedure---biolink:ClinicalIntervention: 8110
biolink:ChemicalEntity---biolink:Drug: 5750
biolink:NucleicAcidEntity---biolink:ChemicalEntity: 4848
biolink:Polypeptide---biolink:Protein: 4543
biolink:Procedure---biolink:Treatment: 4169
biolink:Device---biolink:NamedThing: 2943
biolink:CellularComponent---biolink:AnatomicalEntity: 2555
biolink:DiseaseOrPhenotypicFeature---biolink:InformationContentEntity: 2459
biolink:Device---biolink:Procedure: 2180
biolink:ChemicalEntity---biolink:NamedThing: 2059
biolink:Cell---biolink:AnatomicalEntity: 1753
biolink:PathologicalProcess---biolink:PhenotypicFeature: 1592
biolink:Publication---biolink:InformationContentEntity: 1572
biolink:Device---biolink:Drug: 1501
biolink:Phenomenon---biolink:PhenotypicFeature: 1452
biolink:SmallMolecule---biolink:ChemicalEntity: 1345
biolink:NamedThing---biolink:AnatomicalEntity: 1242
biolink:PathologicalProcess---biolink:NamedThing: 1169
biolink:Drug---biolink:Procedure: 1094
biolink:Activity---biolink:Procedure: 1060
biolink:NamedThing---biolink:Activity: 985
biolink:Cell---biolink:Drug: 849
biolink:DiseaseOrPhenotypicFeature---biolink:Procedure: 804
biolink:Protein---biolink:BiologicalEntity: 761
biolink:PhenotypicFeature---biolink:NamedThing: 759
biolink:PhysiologicalProcess---biolink:BiologicalProcess: 645
biolink:DiseaseOrPhenotypicFeature---biolink:ClinicalIntervention: 641
biolink:Agent---biolink:NamedThing: 622
biolink:Protein---biolink:NamedThing: 600
biolink:Polypeptide---biolink:Drug: 580
biolink:Procedure---biolink:NamedThing: 544
biolink:Procedure---biolink:Activity: 502
biolink:NamedThing---biolink:Agent: 490
biolink:Activity---biolink:Agent: 488
biolink:Behavior---biolink:Activity: 407
biolink:PathologicalProcess---biolink:Disease: 381
biolink:NamedThing---biolink:Pathway: 331
biolink:MolecularActivity---biolink:Pathway: 313
biolink:Procedure---biolink:Agent: 290
biolink:NamedThing---biolink:PhenotypicFeature: 280
biolink:NucleicAcidEntity---biolink:NoncodingRNAProduct: 276
biolink:OrganismTaxon---biolink:Drug: 244
biolink:MolecularActivity---biolink:BiologicalProcess: 243
biolink:DiseaseOrPhenotypicFeature---biolink:Activity: 241
biolink:SmallMolecule---biolink:Drug: 237
biolink:Phenomenon---biolink:InformationContentEntity: 226
biolink:NamedThing---biolink:ClinicalIntervention: 223
biolink:Publication---biolink:NamedThing: 221
biolink:Publication---biolink:Activity: 208
biolink:NamedThing---biolink:Drug: 185
biolink:Activity---biolink:ClinicalIntervention: 179
biolink:Phenomenon---biolink:NamedThing: 177
biolink:Disease---biolink:BiologicalEntity: 169
biolink:NamedThing---biolink:Procedure: 168
biolink:Activity---biolink:NamedThing: 168
biolink:Food---biolink:NamedThing: 161
biolink:DiseaseOrPhenotypicFeature---biolink:Disease: 154
biolink:PhysiologicalProcess---biolink:NamedThing: 152
biolink:AnatomicalEntity---biolink:MaterialSample: 142
biolink:Cohort---biolink:Agent: 136
biolink:Cell---biolink:MaterialSample: 134
biolink:Polypeptide---biolink:ProteinDomain: 133
biolink:PhysicalEntity---biolink:NamedThing: 130
biolink:Behavior---biolink:Phenomenon: 127
biolink:NucleicAcidEntity---biolink:Drug: 126
biolink:ChemicalEntity---biolink:BiologicalEntity: 125
biolink:NucleicAcidEntity---biolink:AnatomicalEntity: 123
biolink:Food---biolink:ChemicalEntity: 119
biolink:Behavior---biolink:NamedThing: 119
biolink:PhysiologicalProcess---biolink:Pathway: 116
biolink:Food---biolink:Drug: 116
biolink:GrossAnatomicalStructure---biolink:NamedThing: 112
biolink:Drug---biolink:BiologicalEntity: 106
biolink:PhysiologicalProcess---biolink:PhenotypicFeature: 104
biolink:ChemicalEntity---biolink:Procedure: 103
biolink:Publication---biolink:Agent: 99
biolink:Polypeptide---biolink:ProteinFamily: 99
biolink:Food---biolink:BiologicalEntity: 93
biolink:Drug---biolink:OrganismTaxon: 93
biolink:AnatomicalEntity---biolink:InformationContentEntity: 85
biolink:Drug---biolink:Protein: 84
biolink:Protein---biolink:ProteinFamily: 79
biolink:Protein---biolink:MolecularEntity: 74
biolink:Drug---biolink:Cell: 71
biolink:Behavior---biolink:BiologicalProcess: 71
biolink:Phenomenon---biolink:Activity: 70
biolink:Disease---biolink:Activity: 70
biolink:Publication---biolink:Procedure: 69
biolink:Phenomenon---biolink:Agent: 69
biolink:NamedThing---biolink:MaterialSample: 68
biolink:Disease---biolink:PathologicalProcess: 66
biolink:ChemicalEntity---biolink:MaterialSample: 65
biolink:AnatomicalEntity---biolink:NamedThing: 65
biolink:Food---biolink:InformationContentEntity: 64
biolink:NucleicAcidEntity---biolink:NamedThing: 63
biolink:Drug---biolink:ChemicalMixture: 63
biolink:DiseaseOrPhenotypicFeature---biolink:Agent: 63
biolink:Disease---biolink:DiseaseOrPhenotypicFeature: 63
biolink:Device---biolink:MaterialSample: 61
biolink:Polypeptide---biolink:MolecularEntity: 60
biolink:PhysicalEntity---biolink:Procedure: 60
biolink:Device---biolink:Agent: 56
biolink:PopulationOfIndividualOrganisms---biolink:Agent: 53
biolink:PhysiologicalProcess---biolink:Procedure: 53
biolink:Cohort---biolink:NamedThing: 52
biolink:Behavior---biolink:Agent: 52
biolink:PhysicalEntity---biolink:Device: 49
biolink:GrossAnatomicalStructure---biolink:MaterialSample: 49
biolink:MolecularActivity---biolink:NamedThing: 47
biolink:PhysiologicalProcess---biolink:Phenomenon: 46
biolink:Cell---biolink:NamedThing: 45
biolink:DiseaseOrPhenotypicFeature---biolink:AnatomicalEntity: 44
biolink:Polypeptide---biolink:AnatomicalEntity: 43
biolink:Disease---biolink:AnatomicalEntity: 42
biolink:Protein---biolink:Drug: 41
biolink:Phenomenon---biolink:BiologicalProcess: 40
biolink:GrossAnatomicalStructure---biolink:PhenotypicFeature: 40
biolink:Drug---biolink:AnatomicalEntity: 40
biolink:Event---biolink:InformationContentEntity: 39
biolink:IndividualOrganism---biolink:OrganismTaxon: 38
biolink:OrganismTaxon---biolink:NamedThing: 37
biolink:Cohort---biolink:Procedure: 35
biolink:PopulationOfIndividualOrganisms---biolink:NamedThing: 34
biolink:Phenomenon---biolink:Procedure: 34
biolink:Food---biolink:Procedure: 34
biolink:GrossAnatomicalStructure---biolink:Procedure: 33
biolink:Polypeptide---biolink:PhenotypicFeature: 32
biolink:PhysiologicalProcess---biolink:Activity: 32
biolink:Publication---biolink:AnatomicalEntity: 30
biolink:PhenotypicFeature---biolink:Phenomenon: 30
biolink:NamedThing---biolink:Phenomenon: 30
biolink:Disease---biolink:InformationContentEntity: 29
biolink:NamedThing---biolink:Disease: 28
biolink:Drug---biolink:Polypeptide: 28
biolink:Disease---biolink:Procedure: 28
biolink:ChemicalEntity---biolink:AnatomicalEntity: 28
biolink:Behavior---biolink:InformationContentEntity: 28
biolink:Publication---biolink:PhenotypicFeature: 27
biolink:PathologicalProcess---biolink:Agent: 27
biolink:Activity---biolink:InformationContentEntity: 26
biolink:PhysicalEntity---biolink:Drug: 25
biolink:Procedure---biolink:InformationContentEntity: 24
biolink:GrossAnatomicalStructure---biolink:InformationContentEntity: 24
biolink:Behavior---biolink:PhenotypicFeature: 24
biolink:AnatomicalEntity---biolink:PhenotypicFeature: 23
biolink:PhysicalEntity---biolink:Agent: 22
biolink:NamedThing---biolink:OrganismTaxon: 22
biolink:CellularComponent---biolink:BiologicalEntity: 21
biolink:Publication---biolink:MaterialSample: 20
biolink:PhenotypicFeature---biolink:DiseaseOrPhenotypicFeature: 20
biolink:PathologicalProcess---biolink:BiologicalEntity: 20
biolink:NucleicAcidEntity---biolink:BiologicalEntity: 20
biolink:IndividualOrganism---biolink:NamedThing: 20
biolink:CellularComponent---biolink:Protein: 20
biolink:Procedure---biolink:PhenotypicFeature: 18
biolink:NamedThing---biolink:Protein: 18
biolink:Disease---biolink:BiologicalProcess: 18
biolink:ChemicalEntity---biolink:PhenotypicFeature: 18
biolink:PathologicalProcess---biolink:InformationContentEntity: 17
biolink:NucleicAcidEntity---biolink:MaterialSample: 17
biolink:DiseaseOrPhenotypicFeature---biolink:Phenomenon: 17
biolink:Disease---biolink:Phenomenon: 17
biolink:PhysiologicalProcess---biolink:InformationContentEntity: 16
biolink:PhenotypicFeature---biolink:Disease: 16
biolink:PathologicalProcess---biolink:Procedure: 16
biolink:Drug---biolink:SmallMolecule: 16
biolink:CellularComponent---biolink:NamedThing: 16
biolink:Publication---biolink:Phenomenon: 15
biolink:Procedure---biolink:AnatomicalEntity: 15
biolink:PhysiologicalProcess---biolink:PathologicalProcess: 15
biolink:PhysiologicalProcess---biolink:BiologicalEntity: 15
biolink:DiseaseOrPhenotypicFeature---biolink:BiologicalProcess: 15
biolink:Cell---biolink:InformationContentEntity: 15
biolink:NucleicAcidEntity---biolink:OrganismTaxon: 14
biolink:Drug---biolink:Treatment: 14
biolink:Drug---biolink:NucleicAcidEntity: 14
biolink:Disease---biolink:Agent: 14
biolink:ChemicalEntity---biolink:OrganismTaxon: 14
biolink:AnatomicalEntity---biolink:Procedure: 14
biolink:Activity---biolink:Phenomenon: 14
biolink:Drug---biolink:MaterialSample: 13
biolink:PhysiologicalProcess---biolink:ClinicalIntervention: 12
biolink:OrganismTaxon---biolink:BiologicalEntity: 12
biolink:Cohort---biolink:Activity: 12
biolink:PhysiologicalProcess---biolink:BiologicalProcessOrActivity: 11
biolink:PhysicalEntity---biolink:MaterialSample: 11
biolink:Phenomenon---biolink:ClinicalIntervention: 11
biolink:NamedThing---biolink:BiologicalProcess: 11
biolink:IndividualOrganism---biolink:Agent: 11
biolink:IndividualOrganism---biolink:Activity: 11
biolink:ChemicalEntity---biolink:Agent: 11
biolink:PathologicalProcess---biolink:BiologicalProcess: 10
biolink:NamedThing---biolink:ProteinDomain: 10
biolink:Drug---biolink:MolecularEntity: 10
biolink:DiseaseOrPhenotypicFeature---biolink:BiologicalEntity: 10
biolink:ChemicalEntity---biolink:Protein: 10
biolink:ChemicalEntity---biolink:Device: 10
biolink:CellularComponent---biolink:PhenotypicFeature: 10
biolink:Agent---biolink:Activity: 10
biolink:SmallMolecule---biolink:MolecularEntity: 9
biolink:PopulationOfIndividualOrganisms---biolink:InformationContentEntity: 9
biolink:NucleicAcidEntity---biolink:PhenotypicFeature: 9
biolink:Disease---biolink:Cell: 9
biolink:Cohort---biolink:InformationContentEntity: 9
biolink:Procedure---biolink:Drug: 8
biolink:Phenomenon---biolink:PathologicalProcess: 8
biolink:NamedThing---biolink:Gene: 8
biolink:IndividualOrganism---biolink:Procedure: 8
biolink:DiseaseOrPhenotypicFeature---biolink:Cell: 8
biolink:Device---biolink:ClinicalIntervention: 8
biolink:ChemicalEntity---biolink:InformationContentEntity: 8
biolink:PhysiologicalProcess---biolink:AnatomicalEntity: 7
biolink:PhenotypicFeature---biolink:Activity: 7
biolink:Drug---biolink:ClinicalIntervention: 7
biolink:Disease---biolink:MaterialSample: 7
biolink:SmallMolecule---biolink:NamedThing: 6
biolink:PhenotypicFeature---biolink:InformationContentEntity: 6
biolink:OrganismTaxon---biolink:Agent: 6
biolink:NucleicAcidEntity---biolink:Transcript: 6
biolink:NucleicAcidEntity---biolink:Gene: 6
biolink:Gene---biolink:Drug: 6
biolink:Drug---biolink:InformationContentEntity: 6
biolink:Cohort---biolink:PopulationOfIndividualOrganisms: 6
biolink:Behavior---biolink:Procedure: 6
biolink:Agent---biolink:Procedure: 6
biolink:SmallMolecule---biolink:InformationContentEntity: 5
biolink:Publication---biolink:BiologicalProcess: 5
biolink:Protein---biolink:AnatomicalEntity: 5
biolink:PhysiologicalProcess---biolink:Disease: 5
biolink:PhysicalEntity---biolink:InformationContentEntity: 5
biolink:PhenotypicFeature---biolink:Procedure: 5
biolink:PathologicalProcess---biolink:AnatomicalEntity: 5
biolink:IndividualOrganism---biolink:Phenomenon: 5
biolink:GrossAnatomicalStructure---biolink:Drug: 5
biolink:GrossAnatomicalStructure---biolink:BiologicalEntity: 5
biolink:GeographicLocation---biolink:NamedThing: 5
biolink:GeographicLocation---biolink:InformationContentEntity: 5
biolink:CellularComponent---biolink:MaterialSample: 5
biolink:CellularComponent---biolink:InformationContentEntity: 5
biolink:Cell---biolink:BiologicalEntity: 5
biolink:AnatomicalEntity---biolink:ClinicalIntervention: 5
biolink:AnatomicalEntity---biolink:Agent: 5
biolink:Procedure---biolink:Phenomenon: 4
biolink:Procedure---biolink:MaterialSample: 4
biolink:Procedure---biolink:BiologicalProcess: 4
biolink:Polypeptide---biolink:NoncodingRNAProduct: 4
biolink:PhenotypicFeature---biolink:ClinicalIntervention: 4
biolink:PhenotypicFeature---biolink:Agent: 4
biolink:Phenomenon---biolink:AnatomicalEntity: 4
biolink:PathologicalProcess---biolink:ClinicalIntervention: 4
biolink:NamedThing---biolink:PathologicalProcess: 4
biolink:NamedThing---biolink:Device: 4
biolink:MolecularActivity---biolink:PhenotypicFeature: 4
biolink:MolecularActivity---biolink:PathologicalProcess: 4
biolink:GrossAnatomicalStructure---biolink:OrganismTaxon: 4
biolink:GrossAnatomicalStructure---biolink:Device: 4
biolink:GeographicLocation---biolink:Agent: 4
biolink:Event---biolink:Activity: 4
biolink:DiseaseOrPhenotypicFeature---biolink:PathologicalProcess: 4
biolink:CellularComponent---biolink:ProteinDomain: 4
biolink:Cell---biolink:PhenotypicFeature: 4
biolink:Behavior---biolink:Disease: 4
biolink:AnatomicalEntity---biolink:Device: 4
biolink:Publication---biolink:OrganismTaxon: 3
biolink:PopulationOfIndividualOrganisms---biolink:Procedure: 3
biolink:PopulationOfIndividualOrganisms---biolink:Phenomenon: 3
biolink:PopulationOfIndividualOrganisms---biolink:Activity: 3
biolink:Phenomenon---biolink:Pathway: 3
biolink:MolecularActivity---biolink:Procedure: 3
biolink:IndividualOrganism---biolink:InformationContentEntity: 3
biolink:Gene---biolink:NamedThing: 3
biolink:Food---biolink:ClinicalIntervention: 3
biolink:Device---biolink:AnatomicalEntity: 3
biolink:ChemicalEntity---biolink:Activity: 3
biolink:Behavior---biolink:ClinicalIntervention: 3
biolink:Behavior---biolink:BiologicalEntity: 3
biolink:AnatomicalEntity---biolink:BiologicalEntity: 3
biolink:Activity---biolink:BiologicalEntity: 3
biolink:SmallMolecule---biolink:MaterialSample: 2
biolink:Publication---biolink:BiologicalEntity: 2
biolink:Protein---biolink:PhenotypicFeature: 2
biolink:PopulationOfIndividualOrganisms---biolink:BiologicalEntity: 2
biolink:Polypeptide---biolink:Treatment: 2
biolink:PhysicalEntity---biolink:Phenomenon: 2
biolink:PhenotypicFeature---biolink:AnatomicalEntity: 2
biolink:Phenomenon---biolink:Disease: 2
biolink:PathologicalProcess---biolink:Activity: 2
biolink:OrganismTaxon---biolink:Activity: 2
biolink:NucleicAcidEntity---biolink:Protein: 2
biolink:NucleicAcidEntity---biolink:InformationContentEntity: 2
biolink:MolecularActivity---biolink:Protein: 2
biolink:MolecularActivity---biolink:InformationContentEntity: 2
biolink:MolecularActivity---biolink:ClinicalIntervention: 2
biolink:MolecularActivity---biolink:Activity: 2
biolink:Food---biolink:OrganismTaxon: 2
biolink:Drug---biolink:ProteinFamily: 2
biolink:Drug---biolink:PhenotypicFeature: 2
biolink:Drug---biolink:Gene: 2
biolink:Drug---biolink:Device: 2
biolink:Device---biolink:InformationContentEntity: 2
biolink:CellularComponent---biolink:BiologicalProcess: 2
biolink:Agent---biolink:InformationContentEntity: 2
biolink:Agent---biolink:ClinicalIntervention: 2
biolink:Activity---biolink:PhenotypicFeature: 2
biolink:Activity---biolink:BiologicalProcess: 2
biolink:Protein---biolink:ProteinDomain: 1
biolink:Protein---biolink:NoncodingRNAProduct: 1
biolink:Protein---biolink:Gene: 1
biolink:Procedure---biolink:OrganismTaxon: 1
biolink:Procedure---biolink:Device: 1
biolink:PopulationOfIndividualOrganisms---biolink:OrganismalEntity: 1
biolink:PopulationOfIndividualOrganisms---biolink:OrganismTaxon: 1
biolink:Polypeptide---biolink:Procedure: 1
biolink:Polypeptide---biolink:Device: 1
biolink:PhysiologicalProcess---biolink:Agent: 1
biolink:PhenotypicFeature---biolink:BiologicalProcess: 1
biolink:PhenotypicFeature---biolink:BiologicalEntity: 1
biolink:Phenomenon---biolink:OrganismTaxon: 1
biolink:Phenomenon---biolink:MaterialSample: 1
biolink:Phenomenon---biolink:Device: 1
biolink:Phenomenon---biolink:BiologicalEntity: 1
biolink:PathologicalProcess---biolink:Phenomenon: 1
biolink:PathologicalProcess---biolink:Pathway: 1
biolink:PathologicalProcess---biolink:BiologicalProcessOrActivity: 1
biolink:OrganismTaxon---biolink:Procedure: 1
biolink:OrganismTaxon---biolink:InformationContentEntity: 1
biolink:NucleicAcidEntity---biolink:MolecularEntity: 1
biolink:NamedThing---biolink:Treatment: 1
biolink:NamedThing---biolink:NoncodingRNAProduct: 1
biolink:NamedThing---biolink:MolecularEntity: 1
biolink:NamedThing---biolink:GeographicLocation: 1
biolink:NamedThing---biolink:ChemicalEntity: 1
biolink:MolecularActivity---biolink:ProteinDomain: 1
biolink:IndividualOrganism---biolink:AnatomicalEntity: 1
biolink:GrossAnatomicalStructure---biolink:Gene: 1
biolink:GrossAnatomicalStructure---biolink:Disease: 1
biolink:GrossAnatomicalStructure---biolink:ClinicalIntervention: 1
biolink:GrossAnatomicalStructure---biolink:Agent: 1
biolink:GeographicLocation---biolink:BiologicalEntity: 1
biolink:Gene---biolink:ProteinFamily: 1
biolink:Gene---biolink:PhenotypicFeature: 1
biolink:Gene---biolink:AnatomicalEntity: 1
biolink:Event---biolink:PhenotypicFeature: 1
biolink:Event---biolink:Agent: 1
biolink:Drug---biolink:NoncodingRNAProduct: 1
biolink:Drug---biolink:GrossAnatomicalStructure: 1
biolink:Drug---biolink:Food: 1
biolink:Drug---biolink:Disease: 1
biolink:Drug---biolink:Activity: 1
biolink:DiseaseOrPhenotypicFeature---biolink:OrganismTaxon: 1
biolink:DiseaseOrPhenotypicFeature---biolink:Drug: 1
biolink:Disease---biolink:Device: 1
biolink:Disease---biolink:ClinicalIntervention: 1
biolink:Device---biolink:PhenotypicFeature: 1
biolink:Device---biolink:Phenomenon: 1
biolink:Device---biolink:Activity: 1
biolink:Cohort---biolink:ClinicalIntervention: 1
biolink:Cohort---biolink:BiologicalEntity: 1
biolink:Cohort---biolink:AnatomicalEntity: 1
biolink:ChemicalEntity---biolink:Gene: 1
biolink:ChemicalEntity---biolink:ClinicalIntervention: 1
biolink:CellularComponent---biolink:MolecularEntity: 1
biolink:CellularComponent---biolink:Activity: 1
biolink:AnatomicalEntity---biolink:Drug: 1
biolink:Agent---biolink:PhysicalEntity: 1
biolink:Agent---biolink:OrganismTaxon: 1
biolink:Agent---biolink:Device: 1
biolink:Activity---biolink:Device: 1
biolink:Activity---biolink:AnatomicalEntity: 1

ecwood commented 1 year ago

When excluding cases where the old category is an abstract class (biolink:BiologicalEntity and biolink:InformationContentEntity), there are 523543 inconsistent nodes (out of 40095349), putting it at around 13%. It seems reasonable to continue at this point. We can resolve any category/name issues if they arise.
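
A sketch of how the "new---old" tally and the abstract-class exclusion could be computed, assuming the comparison yields one (new category, old category) pair per node. `tally_category_changes` and the sample pairs are hypothetical, not the comparison script itself.

```python
from collections import Counter

# Old categories treated as abstract classes in the comment above.
ABSTRACT_OLD_CATEGORIES = frozenset(
    {"biolink:BiologicalEntity", "biolink:InformationContentEntity"}
)


def tally_category_changes(pairs, skip_old=frozenset()):
    """Count "new---old" category pairs, ignoring unchanged nodes and skipped old categories."""
    return Counter(
        f"{new}---{old}"
        for new, old in pairs
        if new != old and old not in skip_old
    )


# Made-up sample; the real input has one pair per node.
pairs = [
    ("biolink:Gene", "biolink:BiologicalEntity"),
    ("biolink:Protein", "biolink:Polypeptide"),
    ("biolink:Protein", "biolink:Polypeptide"),
    ("biolink:Drug", "biolink:Drug"),
]

everything = tally_category_changes(pairs)
concrete_only = tally_category_changes(pairs, ABSTRACT_OLD_CATEGORIES)
for pair, count in concrete_only.most_common():
    print(f"{pair}: {count}")
print(sum(concrete_only.values()), "of", len(pairs), "nodes differ")
```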

ecwood commented 1 year ago

The next step in verification is to look at edge coverage and description accuracy. @acevedol is taking on this task.

The next two steps in this ETL are creating the source nodes and making update_date no longer a hard-coded parameter.

Additionally, we need to document how to add UMLS sources into the ETL. I also need to thoroughly document how both the extraction and conversion for this ETL work. I am also considering moving the UMLS_Processor class that I currently have in umls_util.py back into umls_list_to_kg_jsonl.py. Since almost everything has been factored into that class, umls_list_to_kg_jsonl.py is actually pretty minimal. Combining the two files might reduce confusion later.

Finally, this version of the ETL needs to be worked into the Snakemake build system, and the UMLS build needs to be stripped out of the ontology build. This should save time, since the ontology build (which is usually the last to finish) won't have as many sources and can run in parallel with the other extractions. Additionally, the new streaming-based conversion takes only around 27 minutes, the extraction from MySQL takes only around 50 minutes, and UMLS to RDF no longer needs to be run (which should in turn address #336).
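
For anyone unfamiliar with the streaming approach, the general idea is to convert and write one record at a time instead of holding the whole graph in memory. The sketch below shows that pattern only. The one-JSON-object-per-line input, the `cui` and `name` fields, and the `convert_stream` and `to_node` helpers are assumptions for illustration, not the real format or code in umls_list_to_kg_jsonl.py.

```python
import io
import json


def convert_stream(in_file, out_file, convert_record):
    """Read one JSON record per line, convert it, and write it out immediately."""
    count = 0
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        converted = convert_record(json.loads(line))
        out_file.write(json.dumps(converted) + "\n")
        count += 1
    return count


def to_node(record):
    """Hypothetical conversion from an extracted record to a node dictionary."""
    return {"id": "UMLS:" + record["cui"], "name": record["name"]}


# In-memory files stand in for the real input and output paths.
source = io.StringIO('{"cui": "C3341248", "name": "Pattonomys semivillosus"}\n')
sink = io.StringIO()
written = convert_stream(source, sink, to_node)
print(written, sink.getvalue().strip())
```

Because each line is handled independently, memory use stays roughly constant regardless of how many records there are.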

ecwood commented 1 year ago

Edges Update:

Source predicate curie is missing from the YAML config file: MEDLINEPLUS:PAR
Source predicate curie is missing from the YAML config file: NCI:has_data_element
Source predicate curie is missing from the YAML config file: PDQ:has_tradename
Source predicate curie is missing from the YAML config file: NCI:is_structural_domain_or_motif_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:gene_associated_with_disease
Source predicate curie is missing from the YAML config file: NCI:ctcae_5_parent_of
Source predicate curie is missing from the YAML config file: GO:mth_expanded_form_of
Source predicate curie is missing from the YAML config file: HPO:SY
Source predicate curie is missing from the YAML config file: PSY:PAR
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_molecular_abnormality
Source predicate curie is missing from the YAML config file: NCI:gene_mapped_to_disease
Source predicate curie is missing from the YAML config file: NCI:gene_mutant_encodes_gene_product_sequence_variation
Source predicate curie is missing from the YAML config file: NCI:has_pharmaceutical_administration_method
Source predicate curie is missing from the YAML config file: MED-RT:induces
Source predicate curie is missing from the YAML config file: GO:RN
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:SY
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:CHD
Source predicate curie is missing from the YAML config file: NCI:abnormality_associated_with_allele
Source predicate curie is missing from the YAML config file: NCI:has_seronet_permissible_value
Source predicate curie is missing from the YAML config file: HL7V3.0:classifies_class_code
Source predicate curie is missing from the YAML config file: NCI:has_gene_product_element
Source predicate curie is missing from the YAML config file: NCI:may_be_normal_cell_origin_of_disease
Source predicate curie is missing from the YAML config file: NCI:chemotherapy_regimen_has_component
Source predicate curie is missing from the YAML config file: NCI:is_not_normal_tissue_origin_of_disease
Source predicate curie is missing from the YAML config file: NCI:biological_process_has_initiator_chemical_or_drug
Source predicate curie is missing from the YAML config file: NCI:procedure_has_partially_excised_anatomy
Source predicate curie is missing from the YAML config file: NCI:is_ctdc_value_of
Source predicate curie is missing from the YAML config file: NCI:process_initiates_biological_process
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:RQ
Source predicate curie is missing from the YAML config file: NCI:is_associated_anatomy_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:anatomic_structure_is_physical_part_of
Source predicate curie is missing from the YAML config file: NCI:process_includes_biological_process
Source predicate curie is missing from the YAML config file: PDQ:tradename_of
Source predicate curie is missing from the YAML config file: MED-RT:has_contraindicated_drug
Source predicate curie is missing from the YAML config file: GO:SY
Source predicate curie is missing from the YAML config file: NCI:completely_excised_anatomy_may_have_procedure
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_is_product_of_biological_process
Source predicate curie is missing from the YAML config file: NCI:eo_anatomy_is_associated_with_eo_disease
Source predicate curie is missing from the YAML config file: ICD10PCS:PAR
Source predicate curie is missing from the YAML config file: NCI:biological_process_involves_chemical_or_drug
Source predicate curie is missing from the YAML config file: HL7V3.0:component_of
Source predicate curie is missing from the YAML config file: HL7V3.0:larger_than
Source predicate curie is missing from the YAML config file: FMA:isa
Source predicate curie is missing from the YAML config file: MED-RT:may_be_prevented_by
Source predicate curie is missing from the YAML config file: HGNC:expanded_form_of
Source predicate curie is missing from the YAML config file: NCI:qualifier_applies_to
Source predicate curie is missing from the YAML config file: MSH:PAR
Source predicate curie is missing from the YAML config file: NCI:activity_of_allele
Source predicate curie is missing from the YAML config file: ATC:inverse_isa
Source predicate curie is missing from the YAML config file: NCI:is_chromosomal_location_of_gene
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_associated_anatomy
Source predicate curie is missing from the YAML config file: NCI:gene_in_chromosomal_location
Source predicate curie is missing from the YAML config file: HL7V3.0:has_owning_section
Source predicate curie is missing from the YAML config file: HL7V3.0:context_binding_of
Source predicate curie is missing from the YAML config file: HL7V3.0:has_owning_affiliate
Source predicate curie is missing from the YAML config file: NCI:chromosomal_location_of_allele
Source predicate curie is missing from the YAML config file: NCI:gene_is_biomarker_type
Source predicate curie is missing from the YAML config file: NCI:may_be_molecular_abnormality_of_disease
Source predicate curie is missing from the YAML config file: MED-RT:active_metabolites_of
Source predicate curie is missing from the YAML config file: MED-RT:may_inhibit_effect_of
Source predicate curie is missing from the YAML config file: NCI:has_conceptual_part
Source predicate curie is missing from the YAML config file: NCI:is_qualified_by
Source predicate curie is missing from the YAML config file: NCI:is_biochemical_function_of_gene_product
Source predicate curie is missing from the YAML config file: MED-RT:may_prevent
Source predicate curie is missing from the YAML config file: MED-RT:has_active_metabolites
Source predicate curie is missing from the YAML config file: NCI:gene_product_plays_role_in_biological_process
Source predicate curie is missing from the YAML config file: HGNC:has_alias
Source predicate curie is missing from the YAML config file: NCI:is_chemical_classification_of_gene_product
Source predicate curie is missing from the YAML config file: MTH:measured_by
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_organism_source
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_finding
Source predicate curie is missing from the YAML config file: HGNC:prev_name_of
Source predicate curie is missing from the YAML config file: NCI:allele_absent_from_wild-type_chromosomal_location
Source predicate curie is missing from the YAML config file: NCI:is_primary_anatomic_site_of_disease
Source predicate curie is missing from the YAML config file: MTH:form_of
Source predicate curie is missing from the YAML config file: HL7V3.0:PAR
Source predicate curie is missing from the YAML config file: NCI:subset_includes_concept
Source predicate curie is missing from the YAML config file: HL7V3.0:has_context_binding
Source predicate curie is missing from the YAML config file: PDQ:SY
Source predicate curie is missing from the YAML config file: HGNC:has_prev_name
Source predicate curie is missing from the YAML config file: HGNC:has_prev_symbol
Source predicate curie is missing from the YAML config file: PDQ:inverse_isa
Source predicate curie is missing from the YAML config file: NCI:is_cytogenetic_abnormality_of_disease
Source predicate curie is missing from the YAML config file: NCI:gene_is_biomarker_of
Source predicate curie is missing from the YAML config file: MTH:RO
Source predicate curie is missing from the YAML config file: NCI:is_associated_anatomic_site_of
Source predicate curie is missing from the YAML config file: NCI:inc_parent_of
Source predicate curie is missing from the YAML config file: NCI:disease_has_metastatic_anatomic_site
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_cytogenetic_abnormality
Source predicate curie is missing from the YAML config file: NCI:gene_product_malfunction_associated_with_disease
Source predicate curie is missing from the YAML config file: NCI:is_target
Source predicate curie is missing from the YAML config file: NCI:tissue_is_expression_site_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:is_molecular_abnormality_of_disease
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_ews_permissible_value
Source predicate curie is missing from the YAML config file: NCI:inverse_isa
Source predicate curie is missing from the YAML config file: NCI:procedure_may_have_excised_anatomy
Source predicate curie is missing from the YAML config file: NCI:is_normal_cell_origin_of_disease
Source predicate curie is missing from the YAML config file: MED-RT:may_treat
Source predicate curie is missing from the YAML config file: OMIM:entry_term_of
Source predicate curie is missing from the YAML config file: HL7V3.0:may_qualify
Source predicate curie is missing from the YAML config file: NCI:is_physiologic_effect_of_chemical_or_drug
Source predicate curie is missing from the YAML config file: MED-RT:physiologic_effect_of
Source predicate curie is missing from the YAML config file: NCI:biological_process_has_result_biological_process
Source predicate curie is missing from the YAML config file: NCI:gene_found_in_organism
Source predicate curie is missing from the YAML config file: NCI:procedure_has_imaged_anatomy
Source predicate curie is missing from the YAML config file: NCI:role_has_domain
Source predicate curie is missing from the YAML config file: NCI:procedure_may_have_partially_excised_anatomy
Source predicate curie is missing from the YAML config file: NCI:may_be_abnormal_cell_of_disease
Source predicate curie is missing from the YAML config file: NCI:is_grade_of_disease
Source predicate curie is missing from the YAML config file: HPO:isa
Source predicate curie is missing from the YAML config file: MED-RT:has_therapeutic_class
Source predicate curie is missing from the YAML config file: NCI:pharmaceutical_state_of_matter_of
Source predicate curie is missing from the YAML config file: NCI:has_dipg_dmg_permissible_value
Source predicate curie is missing from the YAML config file: NCI:is_stage_of_disease
Source predicate curie is missing from the YAML config file: NCI:pharmaceutical_release_characteristics_of
Source predicate curie is missing from the YAML config file: NCI:gene_involved_in_molecular_abnormality
Source predicate curie is missing from the YAML config file: PDQ:has_lab_number
Source predicate curie is missing from the YAML config file: NCI:anatomy_originated_from_biological_process
Source predicate curie is missing from the YAML config file: NCI:is_pcdc_all_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: MSH:RO
Source predicate curie is missing from the YAML config file: MED-RT:contraindicated_physiologic_effect_of
Source predicate curie is missing from the YAML config file: NCI:excised_anatomy_may_have_procedure
Source predicate curie is missing from the YAML config file: NCI:has_pharmaceutical_transformation
Source predicate curie is missing from the YAML config file: NCI:is_paired_with_value_set
Source predicate curie is missing from the YAML config file: HL7V3.0:smaller_than
Source predicate curie is missing from the YAML config file: NCI:gene_product_sequence_variation_encoded_by_gene_mutant
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_normal_cell_origin
Source predicate curie is missing from the YAML config file: NCI:is_organism_source_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:is_location_of_anatomic_structure
Source predicate curie is missing from the YAML config file: NCI:chromosome_involved_in_cytogenetic_abnormality
Source predicate curie is missing from the YAML config file: NCI:may_be_normal_tissue_origin_of_disease
Source predicate curie is missing from the YAML config file: NCI:is_property_or_attribute_of_eo_disease
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_affects_abnormal_cell
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_has_physiologic_effect
Source predicate curie is missing from the YAML config file: NCI:gene_has_abnormality
Source predicate curie is missing from the YAML config file: NCI:regimen_has_accepted_use_for_disease
Source predicate curie is missing from the YAML config file: NCI:molecular_abnormality_involves_gene
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_chemical_classification
Source predicate curie is missing from the YAML config file: MED-RT:therapeutic_class_of
Source predicate curie is missing from the YAML config file: MED-RT:parent_of
Source predicate curie is missing from the YAML config file: NCI:value_set_is_paired_with
Source predicate curie is missing from the YAML config file: MSH:CHD
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_associated_disease
Source predicate curie is missing from the YAML config file: NCI:role_is_parent_of
Source predicate curie is missing from the YAML config file: NCI:is_pcdc_gct_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: RXNORM:includes
Source predicate curie is missing from the YAML config file: MSH:AQ
Source predicate curie is missing from the YAML config file: MTH:measures
Source predicate curie is missing from the YAML config file: NCI:tradename_of
Source predicate curie is missing from the YAML config file: NCI:role_has_range
Source predicate curie is missing from the YAML config file: UMLS:xref
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_data_type
Source predicate curie is missing from the YAML config file: NCI:pcdc_data_type_of
Source predicate curie is missing from the YAML config file: MSH:has_permuted_term
Source predicate curie is missing from the YAML config file: MSH:QB
Source predicate curie is missing from the YAML config file: HPO:inverse_isa
Source predicate curie is missing from the YAML config file: OMIM:has_alias
Source predicate curie is missing from the YAML config file: ICD9CM:PAR
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_plays_role_in_biological_process
Source predicate curie is missing from the YAML config file: NCI:associated_with_malfunction_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:is_pcdc_aml_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: MTH:RB
Source predicate curie is missing from the YAML config file: NCI:disease_has_normal_tissue_origin
Source predicate curie is missing from the YAML config file: NCI:eo_disease_maps_to_human_disease
Source predicate curie is missing from the YAML config file: NCI:disease_has_associated_disease
Source predicate curie is missing from the YAML config file: NCI:gene_product_affected_by_chemical_or_drug
Source predicate curie is missing from the YAML config file: NCI:disease_has_finding
Source predicate curie is missing from the YAML config file: NCI:pharmaceutical_intended_site_of
Source predicate curie is missing from the YAML config file: MSH:isa
Source predicate curie is missing from the YAML config file: NCI:is_not_abnormal_cell_of_disease
Source predicate curie is missing from the YAML config file: NCI:disease_mapped_to_chromosome
Source predicate curie is missing from the YAML config file: NCI:is_related_to_endogenous_product
Source predicate curie is missing from the YAML config file: HL7V3.0:supported_concept_relationship_in
Source predicate curie is missing from the YAML config file: HL7V3.0:has_supported_concept_relationship
Source predicate curie is missing from the YAML config file: NCI:data_element_of
Source predicate curie is missing from the YAML config file: NCI:has_pharmaceutical_basic_dose_form
Source predicate curie is missing from the YAML config file: MSH:inverse_isa
Source predicate curie is missing from the YAML config file: NCI:is_associated_disease_of
Source predicate curie is missing from the YAML config file: NCI:allele_in_chromosomal_location
Source predicate curie is missing from the YAML config file: NCI:gene_product_variant_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:gene_product_encoded_by_gene
Source predicate curie is missing from the YAML config file: PSY:CHD
Source predicate curie is missing from the YAML config file: NCI:enzyme_metabolizes_chemical_or_drug
Source predicate curie is missing from the YAML config file: NCI:special_category_includes_neoplasm
Source predicate curie is missing from the YAML config file: HL7V3.0:owning_affiliate_of
Source predicate curie is missing from the YAML config file: ATC:member_of
Source predicate curie is missing from the YAML config file: NCBI:PAR
Source predicate curie is missing from the YAML config file: GO:RB
Source predicate curie is missing from the YAML config file: NCI:conceptual_part_of
Source predicate curie is missing from the YAML config file: NCI:gene_product_is_biomarker_type
Source predicate curie is missing from the YAML config file: MED-RT:PAR
Source predicate curie is missing from the YAML config file: NCI:partially_excised_anatomy_has_procedure
Source predicate curie is missing from the YAML config file: NCI:abnormal_cell_affected_by_chemical_or_drug
Source predicate curie is missing from the YAML config file: MED-RT:may_be_diagnosed_by
Source predicate curie is missing from the YAML config file: MSH:mapped_from
Source predicate curie is missing from the YAML config file: NCI:gene_product_is_biomarker_of
Source predicate curie is missing from the YAML config file: NCI:pharmaceutical_administration_method_of
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_gct_permissible_value
Source predicate curie is missing from the YAML config file: NCI:gene_has_physical_location
Source predicate curie is missing from the YAML config file: ICD10PCS:CHD
Source predicate curie is missing from the YAML config file: NCI:disease_has_accepted_treatment_with_regimen
Source predicate curie is missing from the YAML config file: MED-RT:has_mechanism_of_action
Source predicate curie is missing from the YAML config file: HL7V3.0:class_code_classified_by
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_primary_anatomic_site
Source predicate curie is missing from the YAML config file: NCI:has_tradename
Source predicate curie is missing from the YAML config file: NCI:eo_disease_has_property_or_attribute
Source predicate curie is missing from the YAML config file: NCI:is_not_cytogenetic_abnormality_of_disease
Source predicate curie is missing from the YAML config file: HL7V3.0:has_owning_subsection
Source predicate curie is missing from the YAML config file: PDQ:lab_number_of
Source predicate curie is missing from the YAML config file: NCI:complex_has_physical_part
Source predicate curie is missing from the YAML config file: FMA:inverse_isa
Source predicate curie is missing from the YAML config file: NCI:disease_has_associated_anatomic_site
Source predicate curie is missing from the YAML config file: NCI:cytogenetic_abnormality_involves_chromosome
Source predicate curie is missing from the YAML config file: NCI:procedure_has_target_anatomy
Source predicate curie is missing from the YAML config file: MED-RT:has_contraindicated_mechanism_of_action
Source predicate curie is missing from the YAML config file: NCI:is_location_of_biological_process
Source predicate curie is missing from the YAML config file: NCI:has_target
Source predicate curie is missing from the YAML config file: NCI:disease_has_molecular_abnormality
Source predicate curie is missing from the YAML config file: MED-RT:induced_by
Source predicate curie is missing from the YAML config file: NCI:icdc_value_of
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_initiates_biological_process
Source predicate curie is missing from the YAML config file: NCI:biological_process_is_part_of_process
Source predicate curie is missing from the YAML config file: NCI:cell_type_or_tissue_affected_by_chemical_or_drug
Source predicate curie is missing from the YAML config file: NCI:is_not_metastatic_anatomic_site_of_disease
Source predicate curie is missing from the YAML config file: NCI:is_abnormal_cell_of_disease
Source predicate curie is missing from the YAML config file: NCI:completely_excised_anatomy_has_procedure
Source predicate curie is missing from the YAML config file: NCI:procedure_has_excised_anatomy
Source predicate curie is missing from the YAML config file: NCI:biological_process_has_result_anatomy
Source predicate curie is missing from the YAML config file: NCI:kind_is_domain_of
Source predicate curie is missing from the YAML config file: NCI:has_free_acid_or_base_form
Source predicate curie is missing from the YAML config file: ICD10PCS:has_expanded_form
Source predicate curie is missing from the YAML config file: NCI:allele_plays_role_in_metabolism_of_chemical_or_drug
Source predicate curie is missing from the YAML config file: NCI:has_physical_part_of_anatomic_structure
Source predicate curie is missing from the YAML config file: HL7V3.0:supported_concept_property_in
Source predicate curie is missing from the YAML config file: NCI:has_pharmaceutical_release_characteristics
Source predicate curie is missing from the YAML config file: NCI:gene_product_is_element_in_pathway
Source predicate curie is missing from the YAML config file: NCI:has_gdc_value
Source predicate curie is missing from the YAML config file: MTH:SY
Source predicate curie is missing from the YAML config file: PDQ:expanded_form_of
Source predicate curie is missing from the YAML config file: MED-RT:site_of_metabolism
Source predicate curie is missing from the YAML config file: NCI:disease_has_cytogenetic_abnormality
Source predicate curie is missing from the YAML config file: NCI:is_pcdc_hl_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: NCI:disease_has_normal_cell_origin
Source predicate curie is missing from the YAML config file: NCI:gene_encodes_gene_product
Source predicate curie is missing from the YAML config file: NCI:biological_process_has_initiator_process
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_aml_permissible_value
Source predicate curie is missing from the YAML config file: NCI:has_pharmaceutical_state_of_matter
Source predicate curie is missing from the YAML config file: MED-RT:contraindicated_with_disease
Source predicate curie is missing from the YAML config file: MED-RT:contraindicated_mechanism_of_action_of
Source predicate curie is missing from the YAML config file: ATC:isa
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_has_mechanism_of_action
Source predicate curie is missing from the YAML config file: MSH:RN
Source predicate curie is missing from the YAML config file: NCI:has_ctdc_value
Source predicate curie is missing from the YAML config file: NCI:role_has_parent
Source predicate curie is missing from the YAML config file: NCI:allele_has_abnormality
Source predicate curie is missing from the YAML config file: NCI:chromosome_mapped_to_disease
Source predicate curie is missing from the YAML config file: NCI:disease_has_associated_gene
Source predicate curie is missing from the YAML config file: MED-RT:may_diagnose
Source predicate curie is missing from the YAML config file: MED-RT:has_contraindicated_physiologic_effect
Source predicate curie is missing from the YAML config file: NCI:may_be_cytogenetic_abnormality_of_disease
Source predicate curie is missing from the YAML config file: PSY:RN
Source predicate curie is missing from the YAML config file: NCI:kind_is_range_of
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_metabolism_is_associated_with_allele
Source predicate curie is missing from the YAML config file: NCI:gene_plays_role_in_process
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:mapping_qualifier_of
Source predicate curie is missing from the YAML config file: HL7V3.0:has_component
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_structural_domain_or_motif
Source predicate curie is missing from the YAML config file: NCI:chromosomal_location_of_wild-type_gene
Source predicate curie is missing from the YAML config file: MTH:RN
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:mapped_to
Source predicate curie is missing from the YAML config file: RXNORM:included_in
Source predicate curie is missing from the YAML config file: NCI:allele_plays_altered_role_in_process
Source predicate curie is missing from the YAML config file: PSY:member_of
Source predicate curie is missing from the YAML config file: NCI:target_anatomy_has_procedure
Source predicate curie is missing from the YAML config file: MSH:mapping_qualifier_of
Source predicate curie is missing from the YAML config file: MED-RT:pharmacokinetics_of
Source predicate curie is missing from the YAML config file: NCI:has_icdc_value
Source predicate curie is missing from the YAML config file: HGNC:prev_symbol_of
Source predicate curie is missing from the YAML config file: OMIM:CHD
Source predicate curie is missing from the YAML config file: RXNORM:SY
Source predicate curie is missing from the YAML config file: VANDF:has_print_name
Source predicate curie is missing from the YAML config file: NCI:is_pcdc_os_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: HGNC:has_expanded_form
Source predicate curie is missing from the YAML config file: MSH:mapped_to
Source predicate curie is missing from the YAML config file: NCI:is_not_molecular_abnormality_of_disease
Source predicate curie is missing from the YAML config file: NCI:biomarker_type_includes_gene_product
Source predicate curie is missing from the YAML config file: MED-RT:SY
Source predicate curie is missing from the YAML config file: NCI:is_value_for_gdc_property
Source predicate curie is missing from the YAML config file: NCI:biological_process_has_associated_location
Source predicate curie is missing from the YAML config file: NCI:genetic_biomarker_related_to
Source predicate curie is missing from the YAML config file: NCI:disease_is_grade
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_metastatic_anatomic_site
Source predicate curie is missing from the YAML config file: NCI:anatomic_structure_has_location
Source predicate curie is missing from the YAML config file: NCI:partially_excised_anatomy_may_have_procedure
Source predicate curie is missing from the YAML config file: MSH:has_mapping_qualifier
Source predicate curie is missing from the YAML config file: OMIM:PAR
Source predicate curie is missing from the YAML config file: OMIM:expanded_form_of
Source predicate curie is missing from the YAML config file: NCI:imaged_anatomy_has_procedure
Source predicate curie is missing from the YAML config file: NCI:procedure_has_completely_excised_anatomy
Source predicate curie is missing from the YAML config file: ATC:has_member
Source predicate curie is missing from the YAML config file: NCI:isa
Source predicate curie is missing from the YAML config file: NCI:organism_has_gene
Source predicate curie is missing from the YAML config file: OMIM:has_entry_term
Source predicate curie is missing from the YAML config file: HL7V3.0:owning_subsection_of
Source predicate curie is missing from the YAML config file: NCI:procedure_may_have_completely_excised_anatomy
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_normal_tissue_origin
Source predicate curie is missing from the YAML config file: NCI:disease_is_stage
Source predicate curie is missing from the YAML config file: NCI:is_physical_location_of_gene
Source predicate curie is missing from the YAML config file: NCI:is_not_normal_cell_origin_of_disease
Source predicate curie is missing from the YAML config file: NCI:human_disease_maps_to_eo_disease
Source predicate curie is missing from the YAML config file: NCI:eo_disease_has_associated_eo_anatomy
Source predicate curie is missing from the YAML config file: NCI:is_seronet_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: NCI:biological_process_involves_gene_product
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_gene_product_variant
Source predicate curie is missing from the YAML config file: HPO:RO
Source predicate curie is missing from the YAML config file: HPO:RN
Source predicate curie is missing from the YAML config file: ICD9CM:CHD
Source predicate curie is missing from the YAML config file: NCI:disease_mapped_to_gene
Source predicate curie is missing from the YAML config file: HPO:RB
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_hl_permissible_value
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_finding
Source predicate curie is missing from the YAML config file: MSH:RB
Source predicate curie is missing from the YAML config file: MED-RT:has_structural_class
Source predicate curie is missing from the YAML config file: HL7V3.0:owning_section_of
Source predicate curie is missing from the YAML config file: NCI:is_mechanism_of_action_of_chemical_or_drug
Source predicate curie is missing from the YAML config file: NCI:cell_type_is_associated_with_eo_disease
Source predicate curie is missing from the YAML config file: MED-RT:mechanism_of_action_of
Source predicate curie is missing from the YAML config file: NCI:eo_disease_has_associated_cell_type
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:has_mapping_qualifier
Source predicate curie is missing from the YAML config file: NCI:pharmaceutical_basic_dose_form_of
Source predicate curie is missing from the YAML config file: NCI:is_dipg_dmg_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: NCI:disease_has_abnormal_cell
Source predicate curie is missing from the YAML config file: VANDF:print_name_of
Source predicate curie is missing from the YAML config file: GO:mth_has_expanded_form
Source predicate curie is missing from the YAML config file: NCI:biological_process_results_from_biological_process
Source predicate curie is missing from the YAML config file: NCI:is_normal_tissue_origin_of_disease
Source predicate curie is missing from the YAML config file: MED-RT:CHD
Source predicate curie is missing from the YAML config file: NCI:disease_is_marked_by_gene
Source predicate curie is missing from the YAML config file: HGNC:alias_of
Source predicate curie is missing from the YAML config file: NCI:related_to_genetic_biomarker
Source predicate curie is missing from the YAML config file: HCPCS:PAR
Source predicate curie is missing from the YAML config file: HL7V3.0:has_supported_concept_property
Source predicate curie is missing from the YAML config file: HL7V3.0:may_be_qualified_by
Source predicate curie is missing from the YAML config file: NCI:gene_involved_in_pathogenesis_of_disease
Source predicate curie is missing from the YAML config file: NCI:process_involves_gene
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_biochemical_function
Source predicate curie is missing from the YAML config file: NCI:is_not_primary_anatomic_site_of_disease
Source predicate curie is missing from the YAML config file: NCI:process_altered_by_allele
Source predicate curie is missing from the YAML config file: PDQ:isa
Source predicate curie is missing from the YAML config file: NCI:pathogenesis_of_disease_involves_gene
Source predicate curie is missing from the YAML config file: NCI:concept_in_subset
Source predicate curie is missing from the YAML config file: NCI:has_salt_form
Source predicate curie is missing from the YAML config file: MED-RT:contraindicated_class_of
Source predicate curie is missing from the YAML config file: NCI:pathway_has_gene_element
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:related_to
Source predicate curie is missing from the YAML config file: NCI:neoplasm_has_special_category
Source predicate curie is missing from the YAML config file: NCI:is_abnormality_of_gene_product
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_normal_tissue_origin
Source predicate curie is missing from the YAML config file: PSY:has_member
Source predicate curie is missing from the YAML config file: NCI:excised_anatomy_has_procedure
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_is_metabolized_by_enzyme
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_all_permissible_value
Source predicate curie is missing from the YAML config file: NCI:is_not_finding_of_disease
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_affects_cell_type_or_tissue
Source predicate curie is missing from the YAML config file: MEDLINEPLUS:mapped_from
Source predicate curie is missing from the YAML config file: MED-RT:has_parent
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_molecular_abnormality
Source predicate curie is missing from the YAML config file: NCI:gene_product_has_abnormality
Source predicate curie is missing from the YAML config file: MED-RT:has_physiologic_effect
Source predicate curie is missing from the YAML config file: GO:RO
Source predicate curie is missing from the YAML config file: OMIM:has_expanded_form
Source predicate curie is missing from the YAML config file: MED-RT:has_contraindicated_class
Source predicate curie is missing from the YAML config file: NCI:has_pharmaceutical_intended_site
Source predicate curie is missing from the YAML config file: NCI:biomarker_type_includes_gene
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_normal_cell_origin
Source predicate curie is missing from the YAML config file: MTH:exhibited_by
Source predicate curie is missing from the YAML config file: NCI:biological_process_has_result_chemical_or_drug
Source predicate curie is missing from the YAML config file: MED-RT:effect_may_be_inhibited_by
Source predicate curie is missing from the YAML config file: MED-RT:structural_class_of
Source predicate curie is missing from the YAML config file: NCBI:CHD
Source predicate curie is missing from the YAML config file: HCPCS:CHD
Source predicate curie is missing from the YAML config file: MTH:exhibits
Source predicate curie is missing from the YAML config file: NCI:is_metastatic_anatomic_site_of_disease
Source predicate curie is missing from the YAML config file: NCI:endogenous_product_related_to
Source predicate curie is missing from the YAML config file: NCI:is_finding_of_disease
Source predicate curie is missing from the YAML config file: MSH:permuted_term_of
Source predicate curie is missing from the YAML config file: NCI:has_pcdc_os_permissible_value
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_cytogenetic_abnormality
Source predicate curie is missing from the YAML config file: NCI:is_abnormality_of_gene
Source predicate curie is missing from the YAML config file: NCI:disease_excludes_abnormal_cell
Source predicate curie is missing from the YAML config file: NCI:is_marked_by_gene_product
Source predicate curie is missing from the YAML config file: NCBI:expanded_form_of
Source predicate curie is missing from the YAML config file: NCI:disease_has_primary_anatomic_site
Source predicate curie is missing from the YAML config file: HL7V3.0:CHD
Source predicate curie is missing from the YAML config file: NCI:may_be_associated_disease_of_disease
Source predicate curie is missing from the YAML config file: PDQ:has_expanded_form
Source predicate curie is missing from the YAML config file: MTH:has_form
Source predicate curie is missing from the YAML config file: NCI:has_ctcae_5_parent
Source predicate curie is missing from the YAML config file: NCI:disease_may_have_abnormal_cell
Source predicate curie is missing from the YAML config file: MED-RT:metabolic_site_of
Source predicate curie is missing from the YAML config file: NCI:pharmaceutical_transformation_of
Source predicate curie is missing from the YAML config file: MED-RT:has_pharmacokinetics
Source predicate curie is missing from the YAML config file: MED-RT:may_be_treated_by
Source predicate curie is missing from the YAML config file: NCBI:has_expanded_form
Source predicate curie is missing from the YAML config file: NCI:gene_product_expressed_in_tissue
Source predicate curie is missing from the YAML config file: NCI:gene_is_element_in_pathway
Source predicate curie is missing from the YAML config file: NCI:is_pcdc_ews_permissible_value_for_variable
Source predicate curie is missing from the YAML config file: ICD10PCS:expanded_form_of
Source predicate curie is missing from the YAML config file: NCI:gene_product_is_physical_part_of
Source predicate curie is missing from the YAML config file: OMIM:alias_of
Source predicate curie is missing from the YAML config file: NCI:chemical_or_drug_affects_gene_product
Source predicate curie is missing from the YAML config file: NCI:is_component_of_chemotherapy_regimen
Source predicate curie is missing from the YAML config file: NCI:has_inc_parent
Source predicate curie is missing from the YAML config file: NCI:may_be_finding_of_disease
Source predicate curie is missing from the YAML config file: NCI:allele_has_activity
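
To triage a list like the one above, it may help to tally the missing predicate CURIEs by source prefix. Below is a minimal sketch, assuming each warning has exactly the format shown in this log; the helper names `missing_curies` and `count_by_source` are hypothetical and are not part of the KG2 codebase.

```python
from collections import Counter

# Prefix of the warning lines, copied from the log output above.
WARNING_PREFIX = "Source predicate curie is missing from the YAML config file: "


def missing_curies(log_lines):
    """Return the CURIEs named in the 'missing from the YAML config file' warnings."""
    curies = []
    for line in log_lines:
        line = line.strip()
        if line.startswith(WARNING_PREFIX):
            curies.append(line[len(WARNING_PREFIX):])
    return curies


def count_by_source(curies):
    """Count CURIEs per source prefix (the text before the first colon)."""
    return Counter(curie.split(":", 1)[0] for curie in curies)


# A few lines taken from the log above, plus one unrelated line that is skipped.
sample = [
    "Source predicate curie is missing from the YAML config file: NCI:is_ctdc_value_of",
    "Source predicate curie is missing from the YAML config file: MEDLINEPLUS:RQ",
    "Source predicate curie is missing from the YAML config file: NCI:is_target",
    "some unrelated log line",
]
counts = count_by_source(missing_curies(sample))
for source, n in counts.most_common():
    print(source, n)
```

Run against the full build log (for example, `missing_curies(open(path))` with `path` pointing at wherever the log was saved), this would show which sources account for most of the unmapped predicates.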