SACGF / cdot

Transcript versions for HGVS libraries
MIT License

FastaSeqFetcher - handle deletions #45

Closed: davmlaw closed this issue 1 year ago

davmlaw commented 1 year ago

NM_153498.3 has an insertion compared to the reference.

It has a transcript length of 8093, while get_seq returns a string of length 8092. The inserted base should be filled in as an N.

We should also check that we handle deletions as well (I think we do?).
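The length mismatch follows from simple bookkeeping over the alignment: aligned bases count toward both sequences, while an insertion counts only toward the transcript. A minimal sketch, assuming the alignment is given as a GFF3-style gap string such as "M185 I3 M250", and assuming I means bases present only in the transcript and D means bases present only in the genome (this direction convention is an assumption; `aligned_lengths` is a hypothetical helper, not part of cdot):

```python
import re


def aligned_lengths(gap: str) -> tuple[int, int]:
    """Return (transcript_length, genome_length) implied by a gap string.

    Hypothetical helper. Assumed op convention:
      M = aligned bases (count toward transcript and genome)
      I = bases in the transcript but not the genome
      D = bases in the genome but not the transcript
    """
    transcript_len = genome_len = 0
    for op, length in re.findall(r"([MID])(\d+)", gap):
        length = int(length)
        if op in "MI":  # ops that contribute transcript bases
            transcript_len += length
        if op in "MD":  # ops that contribute genomic bases
            genome_len += length
    return transcript_len, genome_len
```

For example, `aligned_lengths("M100 I1 M50")` gives `(151, 150)`: a one-base insertion makes the transcript one base longer than the genomic span it covers, the same off-by-one as 8093 vs 8092. A deletion goes the other way, so `aligned_lengths("M100 D2 M50")` gives `(150, 152)`.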

davmlaw commented 1 year ago

Fixed. I had gotten confused about which way HGVS CIGARs project.

I now also raise an exception that would have caught the previous error:

    ValueError: Error creating NM_153498.3 sequence from genome fasta (NC_000010.10): expected_transcript_length=8093 != len(transcript_sequence)=8092

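A check of this kind can be sketched as a hypothetical helper that compares the sequence built from the genome against the expected transcript length. The f-string `=` specifier (Python 3.8+) produces the `name=value` form seen in the message above. This is an illustration under that assumption, not cdot's actual code:

```python
def check_transcript_length(transcript_id: str, contig: str,
                            transcript_sequence: str,
                            expected_transcript_length: int) -> None:
    """Raise ValueError if the genome-derived sequence has the wrong length.

    Hypothetical helper; the message mirrors the error quoted in the thread.
    """
    if len(transcript_sequence) != expected_transcript_length:
        # f"{expr=}" renders as "expr=value", e.g. "len(transcript_sequence)=8092"
        raise ValueError(
            f"Error creating {transcript_id} sequence from genome fasta ({contig}): "
            f"{expected_transcript_length=} != {len(transcript_sequence)=}"
        )
```

Failing loudly here turns a silent off-by-one in the coordinate bookkeeping into an immediate error for the affected transcript.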
davmlaw commented 1 year ago

The actual bug was due to

                start += length

only being done on matches (not insertions/deletions), so the start coordinate was thrown off. I undid my previous fix.
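The general lesson is that every operation which consumes bases of the sequence `start` indexes must advance `start`, not just matches. Below is a minimal sketch, assuming `start` is a 0-based offset into the genomic sequence and assuming a gap-string convention where M is aligned, I is transcript-only (filled with N) and D is genome-only (skipped). Under that convention D must advance `start`; which ops apply in cdot itself depends on its actual CIGAR direction. `build_transcript` and its `fixed` flag are hypothetical, with `fixed=False` reproducing the match-only bug:

```python
import re


def build_transcript(genome: str, start: int, gap: str, fixed: bool = True) -> str:
    """Project a genomic slice onto a transcript using gap operations.

    Hypothetical sketch. 'start' is a 0-based offset into 'genome'.
    Assumed ops: M = aligned, I = transcript-only, D = genome-only.
    """
    pieces = []
    for op, length in re.findall(r"([MID])(\d+)", gap):
        length = int(length)
        if op == "M":
            pieces.append(genome[start:start + length])
            start += length
        elif op == "I":
            # Transcript has bases the genome lacks: fill with N, genome offset unchanged
            pieces.append("N" * length)
        elif op == "D" and fixed:
            # Genome has bases the transcript lacks: skip them.
            # Omitting this step (fixed=False) is the match-only bug.
            start += length
    return "".join(pieces)
```

With genome `"AAAACCCCGGGG"` and gap `"M4 D4 M4"`, the fixed version returns `"AAAAGGGG"`, while `fixed=False` returns `"AAAACCCC"` because the second match starts four bases too early. With gap `"M4 I1 M4"` over `"AAAACCCC"`, the result is `"AAAANCCCC"`, one base longer than the genomic span.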