OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

low complexity trimming or masking for nanopore reads #431

Open ShannonDaddy opened 1 year ago

ShannonDaddy commented 1 year ago

Hi, is there any option in fastp to trim or mask low-complexity regions in nanopore reads? I regularly see reads with low-complexity stretches like the following:

@147b5220-2abe-4d1f-9507-b7d278e33efa ATGTGCTTCAGTTCAGTTACGTGTGCTGGTGCTGTCACTACTCAACAGGTGGCATGAATTAACTTACTTGCCTGTCGCTCTATCTTCGGCGTCTTGGGTGTTTAACCTACACTACACACACACCACACCACACACACACACACATACACACACCCCACAGCACACGCCCCCCACACACACAGACACCACACACACACCGCACACCACACTACACACACACCACACACCACACCACACACTACACACACACCACACACCACACCACACACACACCATACACACCACACCACACACCACACCACACACACCACACACACACAAACACACACACACACGCACACCGCCACCTGCACACACTACACACACACCACACACCACACCACACACCCACACACACACCACACACACCACACACACCACACCATACACACACACCACACACACACCACACACTACACACACCACACACACACCACACACACACCACACACAGCACACACCACATCCCACACACACACACCACACACCACACCACACACTACACACACACCACACATCGCACACCACACCACACACACCACACACCACACACACACTGCACACCACACACACCGCACACCACACACTACACACACACCACACACCACACACACCACACACACACCATACACCACACACGCACCACACACACACACCACACACCACACGCACACCACACACACCACACACACACCCATACCACACCACACACACCACACACACCACACACCACACACAGGTTAAACACCCAAACGGACATACCGCAATATCAGCACCAACAGAAGGTTAATTCATGCCACCCATATTTGGTCTTTACGTTGTTATGTGCTTCGTTCAGTTACGTATTGCTGGTGCTGCAGAGCTTTGACTAAGGAGCATGTTAACCTTTCTGTTGGTGCTGATATTGCGGCGTCTGCTTGGGTGTTTAACCTCATGAAAACGCAAATATTTTAAAAATGTAGCTTTATGCAAAAGCAAGCTGAAAGGTTTCTTGTTGCATTGTTGTACGTTGAAGCTCAGTCACTTTGCTGACATTGAGTTTCTTTTTCTCCCAGTCACCCTTCTCCACCAATGCTACTATTTATGCGAAGTGTCGGAAATTAACTTCTCATGTGACCACCCAATTCGGTTCCAGTCGCTTGGAAATGTAATCTATTC

@d4fc11d4-ef02-42b2-a908-2adc747e10b5 AGCGCCACTTGAGAGCCTGGACGATAAGAGTGAGACTCCATCTCAACAAAAATAAAAATAAATATATAACTTAGGTTATATTTTTGCTCATTAAAAAAATTCTACATAGACCTACTCCAGATGAAACCGGAGATAATATATATATTATACAAAATATTTCATACTATATCAAAAGATACTTGGCAGAAAAATTACACTGTCTTAAGAATAACAAATAAATAATTCCAGATGTCTATTCACAGATCATGGGTGGACTTATATGTAAGTACTAAACTACGTGTATAAACGTATTCATTCTCACAAAGAAAGACACATGCTGTTAATGCATATTGTTAAGTGAAAAAATAAGGTTTCAAAAAAGGATAGAAAGTCTTATCACATTTTTATTGTGAGTATATAATTGTGAAAAAGATTTAATCATACCAAAACAGAGTTGAACTAAGTGGAATGTATTAGTCTGTTCTCACACTGCTAATGAAGACATACCCAAGACTGGGTAATTTATAAAAGAAAAGAGGTTTAATGGACTCACAGTTCCACATGGCTGGGGAGGCCTCATAATCATGATGGAAGGTGAAGGGGAGTAAAGGCACATATTACATGGTGGCAGGCAAGAGAGCTTGTGCAGGAGTTAAACACCCAAGCAGACGCCGCAATATACAACCAACAGAAAGTTAATTCATACCACCTGTTAAATGACAGCACCAACTTGTGTACATGTACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTGATTCGTTCTTTTTTTTTGATTTTGTTCCCACTTTTTTTTTTTTGGTTTCCTTTTTACGTTGGTTTTGCCTTTTTTTTTTTTTTTGGTTGATTTGGTTACGTTCTCTTTGTGTTCCCTTTTGATTTTGTTTTTGCAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTTTTTGTCTTTTTTTTTTGGATTCTTTTTTTGGATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTGTTTTTTTTTTTTTTTGTTGATTTTGGATTT

I think fastp is by far the most powerful tool for processing sequencing reads. It would be great if fastp had a low-complexity trimming or masking option. Thanks a lot!

semenko commented 1 year ago

This feature is available as --low_complexity_filter — see: https://github.com/OpenGene/fastp#low-complexity-filter
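For context, the fastp README defines complexity as the percentage of bases that differ from the next base (base[i] != base[i+1]), and with the filter enabled, reads scoring below the threshold (30 by default, set with --complexity_threshold) are discarded whole. A minimal Python sketch of that metric follows. The function name is illustrative and this is not fastp's actual code.

```python
def adjacent_diff_pct(seq: str) -> float:
    """Percentage of bases that differ from the base immediately after them.

    Sketch of the complexity metric described in the fastp README;
    the name and the handling of very short reads are assumptions.
    """
    if len(seq) < 2:
        # No adjacent pairs to compare; treat as zero complexity (assumption).
        return 0.0
    diff = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return 100.0 * diff / (len(seq) - 1)


# A homopolymer has no base that differs from its neighbor.
print(adjacent_diff_pct("GGGGGGGGGG"))
# A dinucleotide repeat differs at every position.
print(adjacent_diff_pct("CACACACACA"))
```

Under this definition a homopolymer run scores 0, while a dinucleotide repeat such as CACACA scores 100, so repeats of that kind would likely pass the filter even though they look low-complexity to the eye.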

ShannonDaddy commented 1 year ago

> This feature is available as --low_complexity_filter — see: https://github.com/OpenGene/fastp#low-complexity-filter

Yes, but this option filters out the whole read. What I need is to trim only the low-complexity part of the read.
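To illustrate the difference between filtering and masking, here is a rough Python sketch of per-window masking. This is not a fastp feature: the function names, the 20-base window, and the reuse of the adjacent-base metric with a threshold of 30 are all assumptions made for the example.

```python
def adjacent_diff_pct(seq: str) -> float:
    """Percentage of bases that differ from the next base (hypothetical helper)."""
    if len(seq) < 2:
        return 0.0
    return 100.0 * sum(a != b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def mask_low_complexity(seq: str, window: int = 20,
                        threshold: float = 30.0, mask_char: str = "N") -> str:
    """Replace every base covered by a low-scoring window with mask_char.

    Hypothetical illustration only; window size and threshold are arbitrary.
    """
    masked = [False] * len(seq)
    # Slide a fixed-size window one base at a time; a read shorter than
    # the window is scored as a single chunk.
    for start in range(0, max(len(seq) - window + 1, 1)):
        chunk = seq[start:start + window]
        if len(chunk) >= 2 and adjacent_diff_pct(chunk) < threshold:
            for i in range(start, start + len(chunk)):
                masked[i] = True
    return "".join(mask_char if m else base for base, m in zip(seq, masked))


read = "ACGTACGTAC" + "G" * 30 + "TGCATGCATG"
print(mask_low_complexity(read))
```

Because a window that only partly overlaps the homopolymer can still fall below the threshold, the masked span may extend a few bases past the run itself. Trimming instead of masking would amount to cutting the read at the first or last masked position, which only makes sense when the low-complexity stretch sits at a read end.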

lucyintheskyzzz commented 3 weeks ago

@ShannonDaddy Hi, did you end up using the low-complexity filter option for your samples? I am using nanopore to sequence viruses that tend to have repeat regions. Would you recommend using the low-complexity option in that case?