Closed akeeeshi closed 5 years ago
This behavior is expected and desired. As the error says, that accession does not point to a unique sequence.
Ensembl accessions without version numbers do not refer to a single sequence. You either need to specify the version number or the ensembl release in which the sequence occurred.
In [8]: sorted( (r["namespace"],r["alias"],r["seq_id"]) for r in sr.aliases.find_aliases(alias="ENST00000390559") )
Out[8]:
[('Ensembl-75', 'ENST00000390559', 'VtZEHJ1ekdQJeXuNmZ2eQ9vgsDFPSuUu'),
('Ensembl-76', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ'),
('Ensembl-77', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ'),
('Ensembl-78', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ'),
('Ensembl-79', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ'),
('Ensembl-80', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ'),
('Ensembl-81', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ'),
('Ensembl-82', 'ENST00000390559', 'MGA27-VbNXqbP2StRGixxhtcyhfynGYJ')]
Thank you for the help. This clears things up.
Hello,
I was playing around with extracting sequences from SeqRepo for ensembl transcripts and noticed an issue that is thrown when there are multiple sequences returned throwing a KeyError.
Successful example with BRAF ensembl transcript:
In [18]: sr['ENST00000288602'] INFO:biocommons.seqrepo.fastadir.fastadir:Opening for reading: /usr/local/share/seqrepo/2017-07-04/sequences/2016/0824/050656/1472015216.5390692.fa.bgz Out[18]: 'CGCCTCCCTTCCCCCTCCCCGCCCGACAGCGGCCGCTCGGGCCCCGGCTCTCGGTTATAAGATGGCGGCGCTGAGCGGTGGCGGTGGTGGCGGCGCGGAGCCGGGCCAGGCTCTGTTCAACGGGGACATGGAGCC .... (shortened for readibility)'
Unsuccessful example with IGHM transcript
In [19]: sr['ENST00000390559']
It appears to not actually be a KeyError but rather that the same alias (e.g. ENST00000390559) exists for namespaces (i.e. ensemble versions). This is important because when trying to annotate using the HGVS package, the namespace is not specified. Is there a specific reason for this behavior?