TranslatorSRI / NodeNormalization

Service that produces Translator compliant nodes given a curie
MIT License

endothelial cells (probably other cells too) #285

Open cbizon opened 1 month ago

cbizon commented 1 month ago

We seem to have one clique for UMLS:C0225336 (Endothelial cells) and another for CL:0000115 (Endothelial Cell). I suspect that this UMLS/CL split is happening for lots of cell types.

@gaurav is this kind of issue better put in the babel repo?

gaurav commented 1 month ago

@cbizon Do you know where we could get CL/UMLS or CL/NCIT mappings from? We have a CL concords file in intermediate/anatomy/concord, but it's been empty since August 2023 at least, and I don't see a single CL: mapping in any of the other concord files (https://stars.renci.org/var/babel_outputs/2024mar24/intermediate/anatomy/concords/).

I think CL mappings might previously have come from Ubergraph, but CL is being specifically excluded from that, and in any case I can't find any direct mappings between CL:0000115 and NCIT:C12865. Some of those mappings might have come from EFO, in which case we might need to find some new source for these mappings.

> @gaurav is this kind of issue better put in the babel repo?

That would be ideal for cliquing issues, but I don't really mind if they're in this repo instead (or in the Translator Feedback repo for that matter).

cbizon commented 1 month ago

I took a look and I think that there are 2 possibilities:

  1. Both CL and UMLS have links to FMA, so we could make CL/FMA and UMLS/FMA mappings and let the glommer sort it out.
  2. Wikidata has both CL and UMLS. Jim provided me this SPARQL (which runs in FRINK!):
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wdtn: <http://www.wikidata.org/prop/direct-normalized/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?wd wdtn:P7963 ?cl .
  ?wd wdt:P2892 ?umls .
}

It returns about 646 mappings, and I think they look pretty reasonable. Another option: if one of these approaches gives us some CL/UMLS mappings, we could try to pass them to CL to be included directly in the ontology. That would make babel a little simpler anyway.
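For illustration, here is a minimal Python sketch of how the results of that query might be turned into CL/UMLS concord rows. Several things here are assumptions rather than anything confirmed in this thread: it targets the public Wikidata endpoint (not FRINK), it assumes `?cl` comes back as an OBO PURL such as `http://purl.obolibrary.org/obo/CL_0000115`, and the three-column `(subject, predicate, object)` row shape and the `skos:exactMatch` predicate are placeholders for whatever Babel actually expects. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata SPARQL endpoint; the thread ran the query in FRINK.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"
OBO_PREFIX = "http://purl.obolibrary.org/obo/"

QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wdtn: <http://www.wikidata.org/prop/direct-normalized/>
SELECT * WHERE {
  ?wd wdtn:P7963 ?cl .
  ?wd wdt:P2892 ?umls .
}
"""


def fetch_bindings(endpoint=WIKIDATA_SPARQL, query=QUERY):
    """Run the query and return the 'bindings' list of the SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "cl-umls-concord-sketch/0.1",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def cl_iri_to_curie(iri):
    """Convert an OBO PURL for a CL term into a CL CURIE; return None otherwise."""
    if not iri.startswith(OBO_PREFIX + "CL_"):
        return None
    return "CL:" + iri[len(OBO_PREFIX) + len("CL_"):]


def bindings_to_concords(bindings, predicate="skos:exactMatch"):
    """Build sorted, de-duplicated (UMLS CURIE, predicate, CL CURIE) rows.

    The predicate is a placeholder; use whatever the concord files require.
    """
    rows = set()
    for binding in bindings:
        cl = cl_iri_to_curie(binding["cl"]["value"])
        umls = binding["umls"]["value"].strip()
        if cl and umls:
            rows.add((f"UMLS:{umls}", predicate, cl))
    return sorted(rows)
```

Calling `bindings_to_concords(fetch_bindings())` would yield the rows to write out tab-separated. Bindings whose `?cl` value is not a CL PURL are skipped rather than guessed at.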

But for the time being, I think we should implement one of these two approaches, as the problem seems pretty endemic.
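As a rough illustration of the first approach, the sketch below shows how pairwise mappings that share an intermediate identifier end up in a single clique when merged transitively. This is a generic union-find, not Babel's actual glom code, and `FMA:EXAMPLE` is a made-up placeholder rather than a real FMA identifier.

```python
def glom(pairs):
    """Merge pairwise mappings into cliques using a simple union-find."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root,
        # shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    cliques = {}
    for node in list(parent):
        cliques.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in cliques.values())


# Hypothetical CL/FMA and UMLS/FMA mappings sharing one placeholder FMA term.
pairs = [
    ("CL:0000115", "FMA:EXAMPLE"),
    ("UMLS:C0225336", "FMA:EXAMPLE"),
]
print(glom(pairs))
```

With these two pairs, the CL and UMLS identifiers land in the same clique only because both map to the same FMA term, which is why the quality of the FMA links matters for this approach.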

cbizon commented 1 month ago

I could probably find some time to add one or the other of these mappings if you want me to

gaurav commented 1 month ago

> I could probably find some time to add one or the other of these mappings if you want me to

That would be great -- thanks so much!

cbizon commented 1 month ago

Do you have an opinion on which of these approaches is better? I lean towards the wikidata version...

gaurav commented 1 month ago

I definitely like the Wikidata idea -- faster updates in exchange for potentially incorrect mappings sounds like a good trade-off to me! Perhaps we should try that, see where we're at, and then add CL/FMA and UMLS/FMA mappings if needed? Unless there's some reason we need more FMA terms/mappings in Babel, but I can't think of one right now.

cbizon commented 1 month ago

The fact that FMA is unsupported makes me nervous about using it, so if the Wikidata approach works, I think that's better. It might also be interesting to see if provenance can be pulled from Wikidata.

cbizon commented 1 month ago

Also this comment from the source!

> FMA is a specific problem where in CL they use FMA xref to mean 'part of'

So we'll go with the Wikidata approach...