ahwagner opened 3 years ago
This issue touches on several problems:
The only official way to get exon coordinates out of Ensembl is to use the Perl API. Unfortunately, when I last tried in May 2016, I discovered that Ensembl and BioPerl didn't work on a modern distribution.
In ensembl-dev:
Hi Reece,
Yes, you are right. It's the perl version that's not compatible with BioPerl and even the Ensembl API. The latest perl version that's been safely tested with Ensembl API is 5.14.
Thanks, Harpreet
My recollection is that Perl 5.14 was significantly out of date at the time, and that installing it manually had knock-on effects on dependent modules. I gave up.
So, to solve this issue, we need a reliable way to get versioned transcripts out of Ensembl.
Would we be able to use the .gff files for this purpose, e.g. http://ftp.ensembl.org/pub/release-104/gff3/homo_sapiens/Homo_sapiens.GRCh38.104.chr_patch_hapl_scaff.gff3.gz? It appears that they have gene/transcript/exon IDs with versions, and earlier releases are also maintained, e.g. release 101: http://ftp.ensembl.org/pub/release-101/gff3/homo_sapiens/Homo_sapiens.GRCh38.101.chr_patch_hapl_scaff.gff3.gz
Yes, those should be usable in principle, but no work has actually gone into that yet.
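To make the GFF3 idea concrete, here is a minimal sketch of how versioned transcript accessions and their exon coordinates might be pulled out of one of those files. It assumes, without having checked every release, that transcript features carry an `ID=transcript:...` attribute plus a separate `version` attribute, and that exon features point at their transcript via `Parent`. The function names and the sample lines (including the IDs and coordinates) are made up for illustration and are not part of UTA or Ensembl.

```python
import gzip
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 attribute column ('k=v;k=v') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def versioned_exons(lines):
    """Return {'ENST...<dot>version': [(seqid, start, end, strand), ...]}.

    Assumption: transcripts have ID=transcript:<id> and version=<n>;
    exons reference them through Parent=transcript:<id>.
    """
    versions = {}              # 'transcript:<id>' -> version string
    exons = defaultdict(list)  # 'transcript:<id>' -> exon tuples
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attr_field = cols
        attrs = parse_attributes(attr_field)
        feature_id = attrs.get("ID", "")
        if feature_id.startswith("transcript:") and "version" in attrs:
            versions[feature_id] = attrs["version"]
        elif ftype == "exon":
            for parent in attrs.get("Parent", "").split(","):
                exons[parent].append((seqid, int(start), int(end), strand))
    result = {}
    for parent, coords in exons.items():
        if parent in versions:
            accession = parent.split(":", 1)[1] + "." + versions[parent]
            # Sorted by genomic start, not by transcript order.
            result[accession] = sorted(coords, key=lambda exon: exon[1])
    return result


def load_gff3_gz(path):
    """Parse a gzipped GFF3 file that has already been downloaded."""
    with gzip.open(path, "rt") as handle:
        return versioned_exons(handle)


# Invented sample lines, laid out like the assumed Ensembl GFF3 format.
SAMPLE = "\n".join([
    "##gff-version 3",
    "1\tsrc\tgene\t100\t400\t.\t+\t.\tID=gene:ENSG00000000001;version=5",
    "1\tsrc\tmRNA\t100\t400\t.\t+\t.\tID=transcript:ENST00000000001;Parent=gene:ENSG00000000001;version=3",
    "1\tsrc\texon\t300\t400\t.\t+\t.\tParent=transcript:ENST00000000001;exon_id=ENSE00000000002;rank=2;version=1",
    "1\tsrc\texon\t100\t200\t.\t+\t.\tParent=transcript:ENST00000000001;exon_id=ENSE00000000001;rank=1;version=1",
])

result = versioned_exons(SAMPLE.splitlines())
print(result)
```

Because each release directory on the FTP site has its own file, running something like `load_gff3_gz` over several releases would be one way to collect older transcript versions as well, assuming the attribute layout is the same across those releases.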
Hi, I've made cdot, a data provider that includes Ensembl transcripts (see HGVS issue).
The GTF parsing code etc. is all under the MIT license if you want to reuse it in UTA. An alternative would be to use the JSON and convert that to SQL (or to use the data provider).
This issue was closed because it has been stalled for 7 days with no activity.
When referring to Ensembl transcripts, we cannot look them up by version in the 20210129 data release.