Open cbizon opened 1 month ago
@cbizon Do you know where we could get CL/UMLS or CL/NCIT mappings from? We have a CL concords file in intermediate/anatomy/concord, but it's been empty since August 2023 at least, and I don't see a single CL: mapping in any of the other concord files (https://stars.renci.org/var/babel_outputs/2024mar24/intermediate/anatomy/concords/).
I think CL mappings might previously have come from Ubergraph, but CL is being specifically excluded from that, and in any case I can't find any direct mappings between CL:0000115 and NCIT:C12865. Some of those mappings might have come from EFO, in which case we might need to find some new source for these mappings.
@gaurav is this kind of issue better put in the babel repo?
That would be ideal for cliquing issues, but I don't really mind if they're in this repo instead (or in the Translator Feedback repo for that matter).
I took a look and I think that there are 2 possibilities:
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wdtn: <http://www.wikidata.org/prop/direct-normalized/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?wd wdtn:P7963 ?cl .
?wd wdt:P2892 ?umls .
}
It returns about 646 mappings, and I think they look pretty reasonable. Another option: if one of these approaches gives us some CL/UMLS mappings, we could try to pass them to CL to be included directly in the ontology. That would make babel a little simpler anyway, but for the time being I think we should implement one of these two approaches, as the problem seems pretty endemic.
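For what it's worth, here is a rough Python sketch of how the Wikidata query above could be turned into CL/UMLS CURIE pairs. It is not how babel does anything today. The function names (`fetch_bindings`, `to_curie_pairs`) and the User-Agent string are made up for illustration, and I'm assuming that `wdtn:P7963` comes back as a URI whose last path segment looks like `CL_0000115` and that `wdt:P2892` comes back as a bare CUI. Which predicate to attach to each pair is left open.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Same query as in the comment above, minus the unused prefixes.
QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wdtn: <http://www.wikidata.org/prop/direct-normalized/>
SELECT * WHERE {
  ?wd wdtn:P7963 ?cl .
  ?wd wdt:P2892 ?umls .
}
"""


def fetch_bindings(query=QUERY, endpoint=WIKIDATA_SPARQL):
    """Run the query and return the raw SPARQL JSON bindings (needs network)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            # Hypothetical User-Agent; use something that identifies your tool.
            "User-Agent": "cl-umls-mapping-sketch/0.1",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def to_curie_pairs(bindings):
    """Convert SPARQL bindings into sorted, de-duplicated (CL, UMLS) CURIE pairs.

    Assumption: ?cl is a URI ending in e.g. 'CL_0000115' and ?umls is a bare
    CUI such as 'C0225336'. Rows whose ?cl does not yield a CL: CURIE are skipped.
    """
    pairs = set()
    for row in bindings:
        local_name = row["cl"]["value"].rstrip("/").rsplit("/", 1)[-1]
        cl_curie = local_name.replace("_", ":", 1)
        if not cl_curie.startswith("CL:"):
            continue
        pairs.add((cl_curie, "UMLS:" + row["umls"]["value"]))
    return sorted(pairs)
```

Calling `to_curie_pairs(fetch_bindings())` would give a list of pairs that could be written out as concord lines. Keeping the fetch separate from the conversion means the conversion can be checked against a saved result file without hitting Wikidata.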
I could probably find some time to add one or the other of these mappings if you want me to
That would be great -- thanks so much!
Do you have an opinion on which of these approaches is better? I lean towards the wikidata version...
I definitely like the Wikidata idea -- faster updates in exchange for potential incorrect mappings sounds like a good trade-off to me! Perhaps we should try that, see where we're at, and then add CL/FMA and UMLS/FMA mappings if needed? Unless there's any reason we need more FMA terms/mappings in Babel, but I can't think of a reason right now.
The fact that FMA is unsupported makes me nervous about using it, so if the Wikidata approach works, I think that's better. It might also be interesting to see if provenance can be pulled from Wikidata.
Also this comment from the source!
So we'll go with the wikidata approach...
We seem to have one clique for UMLS:C0225336 (Endothelial cells) and another for CL:0000115 (Endothelial Cell). I suspect that this UMLS/CL split is happening for lots of cell types.