HumanCellAtlas / secondary-analysis

Secondary Analysis Service of the Human Cell Atlas Data Coordination Platform
https://pipelines.data.humancellatlas.org/ui/
BSD 3-Clause "New" or "Revised" License

GENCODE Gene IDs in Expression Matrices do not contain version information #751

Open kbergin opened 5 years ago

kbergin commented 5 years ago

From Mark Diekhans, who works with HCA at UCSC:

{quote}I was looking at the expression matrix CSV files for beta2 and noticed that the GENCODE gene IDs don't include the version number.

So they have lines like: ENSG00000252830,5S_rRNA,rRNA,chr1,143439605,143439714,True

It would be far better if ENSG00000252830 included the version number: ENSG00000252830.2.

While technically you can take the GENCODE release and look up the version number, this is extra work, and the GENCODE release often gets lost once results make it out into the wild. I have seen this a lot with the UCSC browser dropping the RefSeq version from the accession.

So I am trying to find out where the version number gets dropped and to get a policy established to keep the version. Any idea if this is happening at the analysis level?

My other job is GENCODE, so if you have information, I am the go-to guy.

{quote}
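
To illustrate the lookup the quote describes as extra work, here is a minimal sketch. It assumes the matrix rows look like the example line above and that a GENCODE GTF from the same release is at hand, where the `gene_id` attribute carries the version. The function names and the sample GTF line are hypothetical and are not part of the pipeline. Only the ID and coordinates in the sample come from the example row; the other GTF fields are made up for illustration.

```python
import re

# Matches the gene_id attribute of a GTF line, e.g. gene_id "ENSG00000252830.2";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def build_version_map(gtf_lines):
    """Map unversioned gene IDs to the versioned IDs found in GTF lines."""
    versioned = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip GTF header/comment lines
            continue
        match = GENE_ID_RE.search(line)
        if match:
            full_id = match.group(1)
            base_id = full_id.split(".", 1)[0]  # drop ".<version>"
            versioned[base_id] = full_id
    return versioned


def restore_versions(csv_rows, version_map):
    """Replace the first CSV field with its versioned ID when one is known."""
    restored = []
    for row in csv_rows:
        gene_id, sep, rest = row.partition(",")
        restored.append(version_map.get(gene_id, gene_id) + sep + rest)
    return restored


# Hypothetical GTF excerpt: source, strand, and attribute order are assumed.
sample_gtf = [
    "##description: hypothetical excerpt",
    'chr1\tENSEMBL\tgene\t143439605\t143439714\t.\t-\t.\t'
    'gene_id "ENSG00000252830.2"; gene_type "rRNA"; gene_name "5S_rRNA";',
]
sample_rows = ["ENSG00000252830,5S_rRNA,rRNA,chr1,143439605,143439714,True"]

version_map = build_version_map(sample_gtf)
for row in restore_versions(sample_rows, version_map):
    print(row)
```

This only works when the annotation release used to build the matrix is known, which is exactly the information the quote says tends to get lost. Keeping the version in the matrix itself would avoid the lookup.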

Issue is synchronized with this Jira Bug