Describe the bug
When I index the transcriptome, duplicate transcripts that contain non-ACGT characters are not identified as duplicates, which leads to issues during quantification.
To Reproduce
Using salmon v1.10.0:
salmon index -p 12 -t testtranscriptome.fa -i nodecoy_salmon_index --keepDuplicates
Only GeneB appears in the resulting duplicate_clusters.tsv.
This is the transcriptome (the two genes are duplicates of one another; GeneA contains non-ACGT characters):
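To check the input independently of salmon, a minimal sketch like the one below could be used. It groups transcripts by exact (upper-cased) sequence and reports whether each duplicate group contains non-ACGT characters. It does not reproduce salmon's own duplicate detection or its handling of non-ACGT bases. The helper names (`read_fasta`, `duplicate_groups`, `has_non_acgt`) and the example sequences are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

VALID = set("ACGT")

def read_fasta(lines):
    """Yield (name, sequence) pairs; sequences are upper-cased (an assumption, not salmon's rule)."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()

def duplicate_groups(records):
    """Group transcript names by identical sequence; keep only groups with more than one member."""
    by_seq = defaultdict(list)
    for name, seq in records:
        by_seq[seq].append(name)
    return [(seq, names) for seq, names in by_seq.items() if len(names) > 1]

def has_non_acgt(seq):
    """True if the sequence contains any character other than A, C, G, T."""
    return any(base not in VALID for base in seq)

# Hypothetical input: GeneA and GeneB are identical and both contain Ns.
example = """>GeneA
ACGTNNACGT
>GeneB
ACGTNNACGT
>GeneC
ACGTACGT
""".splitlines()

for seq, names in duplicate_groups(read_fasta(example)):
    print(names, "non-ACGT" if has_non_acgt(seq) else "ACGT-only")
# prints: ['GeneA', 'GeneB'] non-ACGT
```

On the real file, `duplicate_groups(read_fasta(open("testtranscriptome.fa")))` lists the exact-sequence duplicate groups in the input. Those groups can then be compared with the contents of duplicate_clusters.tsv.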