denalitherapeutics / archs4

An R interface to query and extract data from the ARCHS4 data

Parsing mouse ensembl gtf into gene-level data gives duplicate symbols #8

Open lianos opened 6 years ago

lianos commented 6 years ago

My parsing of the mouse Ensembl GTF into gene-level data produces a feature file that assigns "Olfr912" and "Srp54a" to more than one Ensembl identifier.

For instance, Olfr912 gets assigned to ENSMUSG00000111448 (correctly) but also to ENSMUSG00000060114 (incorrectly). The latter should be Olfr910. The archs4 "gene_name" gets this right, so it is now used for the "symbol" column as of commit a34e466dc51a803551798c3544bd5a4742c740d8.
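A quick way to check for this kind of problem is to group the gene-level table by symbol and flag any symbol that maps to more than one Ensembl identifier. The following is a minimal sketch in Python, not the package's own code: the function name `find_duplicate_symbols` and the `(gene_id, symbol)` row layout are assumptions made for illustration, and the third row is a made-up placeholder.

```python
from collections import defaultdict


def find_duplicate_symbols(rows):
    """Return {symbol: sorted gene IDs} for symbols mapped to more than one ID.

    `rows` is an iterable of (gene_id, symbol) pairs. This layout is an
    assumption for the sketch, not the actual feature-file format.
    """
    ids_by_symbol = defaultdict(set)
    for gene_id, symbol in rows:
        ids_by_symbol[symbol].add(gene_id)
    # Keep only symbols that point at two or more distinct identifiers.
    return {
        symbol: sorted(ids)
        for symbol, ids in ids_by_symbol.items()
        if len(ids) > 1
    }


# Rows mirroring the Olfr912 case reported in this issue.
rows = [
    ("ENSMUSG00000111448", "Olfr912"),
    ("ENSMUSG00000060114", "Olfr912"),  # reported as wrong; should be Olfr910
    ("HYPOTHETICAL_GENE_ID", "HypotheticalSymbol"),  # placeholder, not real data
]

duplicates = find_duplicate_symbols(rows)
for symbol, gene_ids in duplicates.items():
    print(symbol, gene_ids)
```

Running a check like this after parsing the GTF would surface the ambiguous symbols before they reach the "symbol" column.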