drivenbyentropy / aptasuite

A full-featured bioinformatics software collection for the comprehensive analysis of aptamers in HT-SELEX experiments.
https://drivenbyentropy.github.io/
GNU General Public License v3.0

Invalid alphabet #41

Closed: barkait closed this issue 6 years ago

barkait commented 6 years ago

Hey,

When I parse my data (which is somewhat artificial), some entries are classified as "invalid alphabet". I can't share my FASTQ files at the moment, but could you elaborate on what exactly "invalid alphabet" means?

Best,

drivenbyentropy commented 6 years ago

Hi,

Invalid Alphabet means that the read was discarded because it contained a letter other than A, C, G, or T. In FASTQ files this typically corresponds to occurrences of N.
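
To make the rule concrete, here is a minimal Python sketch of such a check. It is not AptaSuite's actual parser: the names `VALID_NUCLEOTIDES`, `has_invalid_alphabet`, and `count_invalid_reads` are hypothetical, and the sketch assumes uppercase sequences in the simple four-line FASTQ layout.

```python
# Hypothetical illustration only; AptaSuite's real parser may differ.
# Assumption: only uppercase A, C, G, and T count as valid.
VALID_NUCLEOTIDES = frozenset("ACGT")


def has_invalid_alphabet(read: str) -> bool:
    """Return True if the read contains any character other than A, C, G, or T."""
    return any(base not in VALID_NUCLEOTIDES for base in read)


def count_invalid_reads(fastq_lines) -> int:
    """Count reads with invalid characters in an iterable of FASTQ lines.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    invalid = 0
    for index, line in enumerate(fastq_lines):
        # The sequence is the second line of every four-line record.
        if index % 4 == 1 and has_invalid_alphabet(line.rstrip("\n")):
            invalid += 1
    return invalid
```

Running `count_invalid_reads` over the lines of a FASTQ file gives a rough count of the reads that would fail this kind of check, which can help confirm whether N's are present in the data.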

If your data is artificial and you know for sure no invalid nucleotides occur, there might be a bug in the parser which is assigning the read to this case by mistake.

If you can provide me with a minimal example where you encounter this issue, I will be more than happy to look into it.

Thank you for reporting this!

PJpb commented 6 years ago

I can't log in to GitHub right now, but I've had the same issue. In my case it was because the files were not in UNIX format and not UTF-8 encoded. Might it be the same thing?
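
A quick way to rule this out is to inspect the raw bytes of the file. The sketch below is only an illustration: `check_text_format` is a hypothetical helper, and it assumes that "UNIX" here refers to LF-only line endings (no carriage returns).

```python
def check_text_format(data: bytes) -> dict:
    """Report whether raw bytes decode as UTF-8 and contain no carriage returns.

    Assumption: "UNIX format" means LF-only line endings.
    """
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    # Windows (CRLF) and classic Mac (CR) line endings both contain b"\r".
    return {"utf8": is_utf8, "unix_line_endings": b"\r" not in data}
```

For example, `check_text_format(open(path, "rb").read())` reads the whole file into memory, so for very large FASTQ files it may be preferable to test a smaller sample of the file instead.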

barkait commented 6 years ago

As you said, I found some N's in my data, so that must be the reason. Thanks!

drivenbyentropy commented 6 years ago

I have added a description of the meaning of the individual parsing statistics to the Wiki.

Thanks again!