biocommons / uta

Universal Transcript Archive: comprehensive genome-transcript alignments; multiple transcript sources, versions, and alignment methods; available as a docker image
Apache License 2.0

ENSEMBL transcripts not versioned #233

Open ahwagner opened 3 years ago

ahwagner commented 3 years ago

When trying to refer to Ensembl transcripts, we cannot look them up by version in the 20210129 data release.

reece commented 3 years ago

This issue touches on several problems:

The only official way to get exon coordinates out of Ensembl is to use the Perl API. Unfortunately, when I last tried in May 2016, I discovered that the Ensembl API and BioPerl didn't work on a modern distribution.

From the reply I got on ensembl-dev:

Hi Reece,

Yes, you are right. It's the perl version that's not compatible with BioPerl and even the Ensembl API. The latest perl version that's been safely tested with Ensembl API is 5.14.

Thanks, Harpreet

My recollection is that Perl 5.14 was significantly out of date at the time, and that installing it manually had knock-on effects with dependent modules. I gave up.

So, to solve this issue, we need a reliable way to get versioned transcripts out of Ensembl.

ahwagner commented 3 years ago

Would we be able to use the .gff files for this purpose, e.g. http://ftp.ensembl.org/pub/release-104/gff3/homo_sapiens/Homo_sapiens.GRCh38.104.chr_patch_hapl_scaff.gff3.gz? It appears that they have gene/transcript/exon IDs with versions, and earlier releases are also maintained, e.g. release 101: http://ftp.ensembl.org/pub/release-101/gff3/homo_sapiens/Homo_sapiens.GRCh38.101.chr_patch_hapl_scaff.gff3.gz
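To make the GFF3 idea concrete, here is a minimal sketch of how versioned transcript accessions and their exon coordinates might be pulled out of such a file. It assumes the layout I see in the release files: transcript rows carry `ID=transcript:ENST...` plus a separate `version` attribute, and exon rows point back with `Parent=transcript:ENST...`. The function names and the sample rows are illustrative only, not part of UTA or Ensembl tooling.

```python
import gzip
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def read_versioned_exons(lines):
    """Map versioned transcript accessions to exons sorted by start.

    Assumes the attribute layout described above. Coordinates are kept as
    GFF3 gives them (1-based, inclusive); converting to another convention
    is left to the caller.
    """
    versions = {}              # "ENST..." -> "ENST....N"
    exons = defaultdict(list)  # "ENST..." -> [(seqid, start, end, strand)]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _src, ftype, start, end, _score, strand, _phase, attr_col = cols
        attrs = parse_attributes(attr_col)
        feature_id = attrs.get("ID", "")
        if feature_id.startswith("transcript:") and "version" in attrs:
            tx = feature_id.split(":", 1)[1]
            versions[tx] = f"{tx}.{attrs['version']}"
        elif ftype == "exon":
            parent = attrs.get("Parent", "")
            if parent.startswith("transcript:"):
                tx = parent.split(":", 1)[1]
                exons[tx].append((seqid, int(start), int(end), strand))
    # Resolve at the end so row order in the file does not matter.
    return {
        versions[tx]: sorted(ex, key=lambda e: e[1])
        for tx, ex in exons.items()
        if tx in versions
    }


def read_gff3_file(path):
    """Convenience wrapper for a gzipped GFF3 file downloaded beforehand."""
    with gzip.open(path, "rt") as handle:
        return read_versioned_exons(handle)


# Made-up sample rows in the assumed layout (not real release data).
SAMPLE = "\n".join([
    "##gff-version 3",
    "1\thavana\tlnc_RNA\t11869\t14409\t.\t+\t.\t"
    "ID=transcript:ENST00000456328;Parent=gene:ENSG00000223972;version=2",
    "1\thavana\texon\t12613\t12721\t.\t+\t.\t"
    "Parent=transcript:ENST00000456328;rank=2;version=1",
    "1\thavana\texon\t11869\t12227\t.\t+\t.\t"
    "Parent=transcript:ENST00000456328;rank=1;version=1",
])

result = read_versioned_exons(SAMPLE.splitlines())
```

Running the same function over release 101 and release 104 would, under these assumptions, give one exon set per transcript version, which is what a versioned lookup would need.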

reece commented 3 years ago

Yes, those should be usable in principle, but no work has actually gone into that yet.

davmlaw commented 2 years ago

Hi, I've made cdot, a data provider that includes Ensembl transcripts (see the HGVS issue).

The GTF parsing code etc. is all under the MIT license if you want to re-use it in UTA. An alternative would be to take the JSON (or the data provider) and convert that to SQL.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 7 days with no activity.