Open cbizon opened 8 months ago
I took a look at the first one here: https://id.nlm.nih.gov/mesh/D011972.html (Insulin receptor). According to the mesh code, this MESH id should not be included as a Chemical. As that URL shows, the Tree values are D12.776 and D08, both of which are excluded in the chemical.py mesh filter. Not sure at this point whether the MESH id is somehow getting into the chemical id list, or whether we're looking at an old result, or something else.
OK, what I think is going on is that the MESH terms are correctly being put under Protein, but the UMLS terms are still getting called ChemicalEntities. The MESH terms are then getting dragged along via a mapping. And I think the reason the UMLS terms are not working correctly is that our list of UMLS Tree ids doesn't use excludes. So Insulin Receptor has three listings in MRSTY:
C0034818|T116|A1.4.1.2.1.7|Amino Acid, Peptide, or Protein|AT17641609|256|
C0034818|T126|A1.4.1.1.3.3|Enzyme|AT17738045|256|
C0034818|T192|A1.4.1.1.3.6|Receptor|AT17615610|256|
So even though we don't let in Receptor, we do let in Enzyme, and that one listing is enough to get the concept in. We need to instead say "if you are a receptor, you don't go here, no matter what your other listings say."
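To make the exclude-wins idea concrete, here is a rough sketch using the three MRSTY rows above. It is not the actual filter code: the names `CHEMICAL_INCLUDE_TREES`, `CHEMICAL_EXCLUDE_TREES`, `read_tree_numbers`, `is_chemical_include_only`, and `is_chemical` are all made up for illustration, and the include/exclude sets contain only the two tree ids discussed here, not the real lists.

```python
from collections import defaultdict

# The three MRSTY rows quoted above (CUI|TUI|tree number|name|ATUI|CVF|).
MRSTY_ROWS = """\
C0034818|T116|A1.4.1.2.1.7|Amino Acid, Peptide, or Protein|AT17641609|256|
C0034818|T126|A1.4.1.1.3.3|Enzyme|AT17738045|256|
C0034818|T192|A1.4.1.1.3.6|Receptor|AT17615610|256|
"""

# Hypothetical lists for illustration only; the real lists are longer.
CHEMICAL_INCLUDE_TREES = frozenset({"A1.4.1.1.3.3"})  # Enzyme
CHEMICAL_EXCLUDE_TREES = frozenset({"A1.4.1.1.3.6"})  # Receptor


def read_tree_numbers(mrsty_text):
    """Collect every semantic-type tree number listed for each CUI."""
    trees = defaultdict(set)
    for line in mrsty_text.splitlines():
        if not line.strip():
            continue
        cui, _tui, tree_number = line.split("|")[:3]
        trees[cui].add(tree_number)
    return trees


def is_chemical_include_only(tree_numbers, include=CHEMICAL_INCLUDE_TREES):
    """Include-only check: any single included listing lets the CUI in."""
    return bool(tree_numbers & include)


def is_chemical(tree_numbers,
                include=CHEMICAL_INCLUDE_TREES,
                exclude=CHEMICAL_EXCLUDE_TREES):
    """Exclude-wins check: one excluded listing vetoes all other listings."""
    if tree_numbers & exclude:
        return False
    return bool(tree_numbers & include)


insulin_receptor = read_tree_numbers(MRSTY_ROWS)["C0034818"]
print(is_chemical_include_only(insulin_receptor))  # True: Enzyme lets it in
print(is_chemical(insulin_receptor))               # False: Receptor vetoes it
```

The point is just that the exclude check has to look at all of a CUI's listings at once, rather than deciding row by row.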
It also looks like A1.4.1.2.1.7 is being grabbed by protein. So basically we need to
See https://github.com/NCATSTranslator/Feedback/issues/613, https://github.com/NCATSTranslator/Feedback/issues/614, and https://github.com/NCATSTranslator/Feedback/issues/615.
These are all proteins, which under biolink are biological entities, but we're calling them chemicals. I think this was probably just never cleaned up after protein moved over into the biological entity branch.