MaayanLab / archs4

ARCHS4 RNA-seq processing scripts and web server pages.

Duplicate Gene Names Implications? #41

Open amnahsiddiqa opened 3 months ago

amnahsiddiqa commented 3 months ago

Hi @lachmann12, I really appreciate this resource; it is a great help. However, I also noticed duplicate gene names, and as you can see in the attached screenshot, the values for the duplicated entries are not identical. Should I infer that the counts were computed at the transcript level rather than the gene level?

[Screenshot: 2024-06-07 at 1:13:02 PM]

lachmann12 commented 3 months ago

Thank you for your feedback! I was not aware that there are cases where the counts differ. The duplicates come from issues in the Ensembl annotation, where multiple Ensembl gene IDs map to the same gene symbol. When we investigated the Ensembl genes that map to the same symbol, we found that they usually have identical transcript sequences, meaning they cannot be distinguished from each other based on reads. As a result, the counts for these duplicated genes are expected to be identical, so it is safe to keep one gene entry and discard the others. In future updates we will try to resolve this issue, and I will leave the issue open until it is resolved. In cases where the counts do differ, I would suggest using the entry with the most counts.
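A minimal sketch of the suggested workaround, written for illustration and not taken from the ARCHS4 code. It assumes the counts are already loaded as one row per Ensembl gene (genes × samples) with a parallel list of gene symbols; the function name `collapse_duplicate_symbols` is hypothetical. "Most counts" is interpreted here as the largest total count summed across samples, which is one reasonable reading of the advice. The function also reports which duplicated symbols had non-identical rows, so you can see how often the "identical counts" assumption fails in your data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def collapse_duplicate_symbols(
    symbols: Sequence[str],
    counts: Sequence[Sequence[int]],
) -> Tuple[List[str], List[List[int]], List[str]]:
    """Keep one row per gene symbol (hypothetical helper, not part of ARCHS4).

    symbols: one gene symbol per row (assumed genes x samples layout).
    counts:  per-sample counts for each row, in the same order as symbols.
    Returns (unique symbols in first-seen order, kept count rows,
    symbols whose duplicate rows were NOT identical).
    """
    # Group row indices by gene symbol; dict order follows first appearance.
    rows_by_symbol: Dict[str, List[int]] = defaultdict(list)
    for i, sym in enumerate(symbols):
        rows_by_symbol[sym].append(i)

    kept_symbols: List[str] = []
    kept_rows: List[List[int]] = []
    differing: List[str] = []
    for sym, idx in rows_by_symbol.items():
        candidates = [list(counts[i]) for i in idx]
        # Flag symbols whose duplicated rows disagree with each other.
        if any(row != candidates[0] for row in candidates[1:]):
            differing.append(sym)
        # Keep the row with the largest total count (first one wins on ties).
        kept_symbols.append(sym)
        kept_rows.append(max(candidates, key=sum))
    return kept_symbols, kept_rows, differing
```

For example, with made-up symbols `["GENE_A", "GENE_B", "GENE_B", "GENE_C", "GENE_C"]` and rows `[[100, 120], [5, 7], [5, 7], [3, 10], [8, 2]]`, `GENE_B` collapses to its single identical row, while `GENE_C` is flagged as differing and keeps `[3, 10]` because its total (13) is larger than that of `[8, 2]` (10).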