I diagnosed a grounding issue in which some human genes/proteins receive de-prioritized groundings when a synonym is available only from HGNC and not from UniProt. One example is ALK6, which HGNC lists as a synonym for BMPR1B but UniProt does not, while UniProt does list it as a synonym for some non-human proteins, e.g. https://www.uniprot.org/uniprot/P36898. Since matches to UniProt are prioritized over matches to HGNC, only the non-human matches for ALK6 are surfaced, and the match to the human protein is lost.
To solve this, I am working on merging the HGNC-derived human gene/protein synonyms into uniprot-proteins.tsv.gz so that they are pooled into any match derived from UniProt. This means that the HGNC-specific resource files and code can be removed here and in Reach. I will follow up with corresponding PRs.
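
To illustrate the intended effect, here is a minimal Python sketch of the pooling idea. It is not the actual resource-generation code: the row layout, the `merge_hgnc` and `build_index` helpers, the species labels, and the human accession used for BMPR1B are all assumptions made for the example, and the real layout of uniprot-proteins.tsv.gz may differ.

```python
from collections import defaultdict

# Toy stand-in for UniProt-derived rows: (synonym, UniProt ID, species).
# The layout is hypothetical; species labels and the human accession
# are filled in for illustration only.
UNIPROT_ROWS = [
    ("BMPR1B", "O00238", "Human"),
    ("ALK6", "P36898", "Mouse"),
]

# Toy stand-in for HGNC-derived synonyms, keyed by the human UniProt ID
# that the HGNC entry maps to (hypothetical structure).
HGNC_SYNONYMS = {"O00238": ["ALK6", "BMPR1B"]}


def merge_hgnc(rows, hgnc_synonyms):
    """Return rows extended with HGNC synonyms attached to human UniProt IDs."""
    merged = list(rows)
    seen = set(rows)
    for uniprot_id, synonyms in hgnc_synonyms.items():
        for synonym in synonyms:
            row = (synonym, uniprot_id, "Human")
            if row not in seen:  # skip synonyms UniProt already provides
                merged.append(row)
                seen.add(row)
    return merged


def build_index(rows):
    """Map each synonym to the list of (UniProt ID, species) it can ground to."""
    index = defaultdict(list)
    for synonym, uniprot_id, species in rows:
        index[synonym].append((uniprot_id, species))
    return dict(index)


# Before the merge, a UniProt-only lookup of ALK6 sees just the non-human entry.
before = build_index(UNIPROT_ROWS)
# After the merge, the human entry is pooled into the same UniProt-derived match.
after = build_index(merge_hgnc(UNIPROT_ROWS, HGNC_SYNONYMS))

print(before["ALK6"])
print(after["ALK6"])
```

In this toy setup, the lookup for ALK6 returns only the non-human entry before the merge and both entries afterwards, so the human protein is no longer shadowed by the UniProt-first prioritization.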