Open holtgrewe opened 10 months ago
We could just build and release everything
To work out the appropriate GTF/GFF to get for a given VEP release, start with this VEP cache table.
E.g. for VEP 110:
Annotation consortium | Genome Build | Location | Version |
---|---|---|---|
Ensembl | GRCh37 | Last 37 Ensembl release - stuck version? | 87 |
Ensembl | GRCh38 | Matches VEP release | 110 |
RefSeq | GRCh37 | Listed under row "RefSeq" | 2020-10-26 (GCF_000001405.25_GRCh37.p13_genomic.gff) |
RefSeq | GRCh38 | Listed under row "RefSeq" | 110 (GCF_000001405.40_GRCh38.p14_genomic.gff) |
Note: The dates don't match what's on the RefSeq FTP site; they may be a few days later.
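For what it's worth, the table above could be kept as a small lookup so a build script can pick the right annotation per VEP release. This is only a sketch: `AnnotationSource`, `VEP_ANNOTATIONS` and `annotation_for` are hypothetical names, and only the VEP 110 rows from the table are filled in.

```python
from typing import NamedTuple, Optional


class AnnotationSource(NamedTuple):
    """One row of the VEP cache table (hypothetical structure)."""
    consortium: str
    genome_build: str
    version: str
    gff_filename: Optional[str]  # only listed for RefSeq in the table


# Values copied from the VEP 110 table above; other releases would need
# to be looked up in the VEP cache table and added by hand.
VEP_ANNOTATIONS = {
    110: [
        AnnotationSource("Ensembl", "GRCh37", "87", None),
        AnnotationSource("Ensembl", "GRCh38", "110", None),
        AnnotationSource("RefSeq", "GRCh37", "2020-10-26",
                         "GCF_000001405.25_GRCh37.p13_genomic.gff"),
        AnnotationSource("RefSeq", "GRCh38", "110",
                         "GCF_000001405.40_GRCh38.p14_genomic.gff"),
    ],
}


def annotation_for(vep_release: int, consortium: str, genome_build: str) -> AnnotationSource:
    """Return the table row for a VEP release, consortium and genome build."""
    for source in VEP_ANNOTATIONS.get(vep_release, []):
        if (source.consortium, source.genome_build) == (consortium, genome_build):
            return source
    raise KeyError(f"No entry for VEP {vep_release} {consortium} {genome_build}")
```

E.g. `annotation_for(110, "Ensembl", "GRCh37").version` gives `"87"`, which is the "stuck" GRCh37 release from the table.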
I think that it would be feasible to run the VEP-compatible releases in CI.
The main limitation will be memory usage. A possible workaround is to use something like orjson to get a potentially more memory-efficient representation of the transcripts in memory.
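One way to read the orjson idea, as a rough sketch only: keep each transcript as a compact serialized blob and decode it on demand, instead of holding every transcript as nested Python dicts. `PackedTranscripts` and the record fields are hypothetical and not taken from the project's code. The sketch falls back to the standard-library `json` module when orjson is not installed, and whether this actually saves memory would need to be measured on real data.

```python
import json

try:
    import orjson  # third-party; optional for this sketch

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)

    loads = orjson.loads
except ImportError:
    def dumps(obj) -> bytes:
        # Compact separators keep the stdlib output as small as possible.
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

    loads = json.loads


class PackedTranscripts:
    """Store transcripts as serialized bytes; decode only when accessed."""

    def __init__(self) -> None:
        self._blobs = {}  # transcript id -> JSON bytes

    def add(self, transcript_id: str, record: dict) -> None:
        self._blobs[transcript_id] = dumps(record)

    def get(self, transcript_id: str) -> dict:
        return loads(self._blobs[transcript_id])

    def __len__(self) -> int:
        return len(self._blobs)


# Hypothetical transcript record, for illustration only.
store = PackedTranscripts()
store.add("tx_example.1", {"gene": "GENE1", "exons": [[100, 200], [300, 400]]})
record = store.get("tx_example.1")
```

The trade-off is extra CPU time per lookup in exchange for fewer live Python objects, so it would only pay off if transcripts are written far more often than they are read during the build.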