Closed ebioman closed 3 years ago
Hello,
Apologies foe the delayed response. I don't seem to be getting the email notifications for issues.
Can you tell me how you processed this data to get to that fasta file? There really shouldn't be any sequences at that point which are pure A's.
I can think of a few ways to deal with this but I just want to understand the problem better.
Thank you, Richard
Hi The script
tama_flnc_polya_cleanup.py
fails in my hand for some IsoSeq derived sequences. Trying to find the source I had sequences which contained only PolyA. E.g.If such a sequence is part of the FASTA file given to the script it fails with the message:
If I do modify the sequence and add e.g. a single
C
at the position 1 everything is fine. This is obviously pretty rare but made the script fail for some of my IsoSeq data. A more biology related issue is maybe the fact that if we add instead e.g. another nucleotide at any different spot in the sequence it works and takes everything 5' as sequence and everything 3' as tail. I guess I would argue that in this case it is much more likely that we have a wrong base-call instead.