tama_flnc_polya_cleanup.py fails for "A"-only sequences

Hi The script tama_flnc_polya_cleanup.py fails in my hand for some IsoSeq derived sequences. Trying to find the source I had sequences which contained only PolyA. E.g.

>transcript/113571 full_length_coverage=9;length=189
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAA

If such a sequence is part of the FASTA file given to the script it fails with the message:

Traceback (most recent call last):
  File "/tmp/tama/tama_go/sequence_cleanup/tama_flnc_polya_cleanup.py", line 153, in <module>
    this_block_obj = a_block_dict[this_block_index]
KeyError: 0

If I do modify the sequence and add e.g. a single C at the position 1 everything is fine. This is obviously pretty rare but made the script fail for some of my IsoSeq data. A more biology related issue is maybe the fact that if we add instead e.g. another nucleotide at any different spot in the sequence it works and takes everything 5' as sequence and everything 3' as tail. I guess I would argue that in this case it is much more likely that we have a wrong base-call instead.

GenomeRIK / tama

tama_flnc_polya_cleanup.py fails for "A"-only sequences #52