biocommons / biocommons.seqrepo

non-redundant, compressed, journalled, file-based storage for biological sequences
Apache License 2.0

Suggestion: Build Ensembl transcript sequences by concatenating genomic exons #103

Open davmlaw opened 1 year ago

davmlaw commented 1 year ago

I have been told by Ensembl that:

Ensembl transcripts completely match the reference genome they are annotated against, so HGVS transcript level variant descriptions will be mapped to the reference genome and annotated accurately. Where the underlying reference sequence changed in the move from GRCh37 to GRCh38, we incremented the transcript version.

This means you could effectively get all Ensembl transcripts by making them from exon coordinates.

I have already implemented this; see the example code, which you are free to take.
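For readers without access to the linked code, the idea can be sketched roughly as follows. This is not the linked implementation. The function names, the 0-based half-open exon coordinates, and the in-memory `genome` dictionary are all assumptions made for illustration. A real version would presumably fetch the contig sequence from SeqRepo or a FASTA index, and take exon coordinates from an official Ensembl annotation.

```python
from typing import Dict, List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_transcript_sequence(
    genome: Dict[str, str],          # hypothetical: contig name -> full sequence
    contig: str,
    exons: List[Tuple[int, int]],    # assumed 0-based, half-open (start, end)
    strand: str,                     # "+" or "-"
) -> str:
    """Concatenate genomic exon sequences into a transcript sequence.

    Exons are joined in ascending genomic order; for minus-strand
    transcripts the joined sequence is then reverse-complemented so the
    result reads 5' to 3' along the transcript.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    chrom = genome[contig]
    joined = "".join(chrom[start:end] for start, end in sorted(exons))
    return reverse_complement(joined) if strand == "-" else joined
```

The sketch only holds under the guarantee quoted above: the exon coordinates must come from an alignment against the same genome build the transcript version was annotated on.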

If you do your own liftover (i.e. produce Ensembl transcript alignments for a build where the sequence doesn't match), this guarantee may be broken. Perhaps only take official alignments?

Example use case: https://github.com/biocommons/hgvs/issues/621

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 10 months ago

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.