---------- Forwarded message ---------
Date: Wed, Jul 31, 2024 at 10:44 PM
Subject: Antisense RNAs
[snip]
I did, however, come across something that seems like a bug, and I wanted to give you a heads-up. I was recently using your API to get all known aliases for approved gene symbols (HGNC) and I got puzzled by some results I was getting. It seems that antisense RNAs get higher scores and thus land at the top of the hit list instead of the sense genes they correspond to (see enclosed response for CTNNA2 with CTNNA2-AS1 being the top hit). In my search I got this kind of result for almost 10% of queries.
I have increased the weight for symbols in our search query and double-checked it against the examples you gave. Let me know if you have any issues.
As reported to the help email:
I confirmed the non-ideal sorting behavior here https://mygene.info/v3/query?q=CTNNA2&species=human (the third result for CTNNA2 should come first):
By searching for "antisense" as a keyword, we can find many other examples (and likely this applies to all ~12k results):
I will counsel the reporter that a fielded search (e.g., https://mygene.info/v3/query?q=symbol:CTNNA2&fields=alias,symbol,taxid) would be useful here, but I think there is definitely an opportunity here to improve our default sorting behavior.
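
For the reporter's use case, the fielded search suggested above could be scripted roughly as follows. This is only a sketch: the helper names (fielded_symbol_url, fetch_hits, exact_symbol_first) are hypothetical, the response is assumed to carry its results in a "hits" list with a "symbol" key per hit, and the client-side reordering is a workaround, not a description of how the service ranks results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://mygene.info/v3/query"  # endpoint taken from the URLs quoted above


def fielded_symbol_url(symbol, fields=("alias", "symbol", "taxid"), species=None):
    """Build a query URL that matches on the `symbol` field only (hypothetical helper)."""
    params = {"q": f"symbol:{symbol}", "fields": ",".join(fields)}
    if species is not None:
        params["species"] = species
    # Leave ':' and ',' unescaped so the URL reads like the one in the message.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


def fetch_hits(url, timeout=10):
    """Fetch the URL and return the list under "hits" (assumed response layout)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("hits", [])


def exact_symbol_first(hits, symbol):
    """Client-side workaround: move hits whose symbol equals the query to the front.

    sorted() is stable, so the server's relative order is otherwise preserved.
    """
    wanted = symbol.upper()
    return sorted(hits, key=lambda hit: str(hit.get("symbol", "")).upper() != wanted)


if __name__ == "__main__":
    # Prints the fielded URL; call fetch_hits(...) on it to run the query.
    print(fielded_symbol_url("CTNNA2"))
```

With the fielded query, CTNNA2-AS1 should not match at all, since its symbol is not CTNNA2; exact_symbol_first is only useful when the unfielded default query has to be kept.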