More conservative autodetection of sequence type

BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.

https://biojulia.dev

MIT License

More conservative autodetection of sequence type #59

Closed jakobnissen closed 2 years ago

jakobnissen commented 2 years ago

DNA/RNA sequences with just a few ambiguous characters are currently detected as amino acid sequences. Perhaps a better scheme is to check whether all characters are valid first as DNA, then as RNA, then as AA, and to pick the first alphabet that fits.
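
To make the proposed order concrete, here is a minimal sketch in Python (not Julia, and not FASTX.jl's actual implementation). The symbol sets and the name `detect_alphabet` are assumptions made for illustration: IUPAC nucleotide codes plus a gap character for DNA and RNA, and the standard amino acid letters plus common ambiguity codes for AA.

```python
# Hypothetical symbol sets, chosen for illustration only.
DNA_SYMBOLS = frozenset("ACGTMRWSYKVHDBN-")
RNA_SYMBOLS = frozenset("ACGUMRWSYKVHDBN-")
AA_SYMBOLS = frozenset("ACDEFGHIKLMNPQRSTVWYBJZXUO*-")

# Order matters: the first alphabet that covers every symbol wins.
CANDIDATES = (("DNA", DNA_SYMBOLS), ("RNA", RNA_SYMBOLS), ("AA", AA_SYMBOLS))


def detect_alphabet(seq: str) -> str:
    """Return the first alphabet in DNA -> RNA -> AA order that contains all symbols."""
    symbols = set(seq.upper())
    for name, alphabet in CANDIDATES:
        if symbols <= alphabet:
            return name
    raise ValueError(f"no alphabet covers symbols: {sorted(symbols)}")
```

Under these assumptions, `detect_alphabet("ACGTNNRY")` returns `"DNA"` even though the sequence contains ambiguity codes. The guess is still only a guess: a short peptide such as `"GATTACA"` is also reported as DNA, and an empty sequence falls through to the first candidate.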

jakobnissen commented 2 years ago

Actually, based on the Slack conversation, maybe it's better just to remove autodetection altogether. It apparently causes issues with users getting unexpected sequences, it's bad for type stability, and it's one of those API decisions that make it hard to write robust scripts/software that don't fail when they encounter an edge case.
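
As a rough illustration of the alternative, the sketch below (Python, with hypothetical names, not the API that ended up in FASTX.jl) has the caller state the alphabet explicitly. The return type then never depends on the data, and an unexpected symbol raises an error instead of silently changing the sequence type.

```python
# Hypothetical symbol sets, chosen for illustration only.
ALPHABETS = {
    "DNA": frozenset("ACGTMRWSYKVHDBN-"),
    "RNA": frozenset("ACGUMRWSYKVHDBN-"),
    "AA": frozenset("ACDEFGHIKLMNPQRSTVWYBJZXUO*-"),
}


def decode_sequence(seq: str, alphabet: str) -> str:
    """Validate seq against the alphabet the caller asked for; never guess."""
    allowed = ALPHABETS[alphabet]  # KeyError if the alphabet name is unknown
    upper = seq.upper()
    invalid = sorted(set(upper) - allowed)
    if invalid:
        raise ValueError(f"{alphabet} sequence contains invalid symbols: {invalid}")
    return upper
```

With this shape, `decode_sequence("acgtn", "DNA")` succeeds, while `decode_sequence("MKVL", "DNA")` fails loudly at the point where the wrong assumption was made.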

jakobnissen commented 2 years ago

Implemented on release-2 branch, closing.