OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Feature request: FASTQ read splitting/tiling #71

Open rmzelle opened 6 years ago

rmzelle commented 6 years ago

I'm looking for a tool to split reads in a fastq.gz file into shorter fragments, e.g. to split up Oxford Nanopore reads into non-overlapping 500 bp chunks. I've tried http://ngsutils.org/modules/fastqutils/tile/ but it doesn't seem to work well with large read files (I get a "Too many open files" error because it writes too many temp files), and it doesn't look like it can output to STDOUT.

Is this a feature you'd be willing to add to fastp?

sfchen commented 6 years ago

Is this a common feature? I mean, do you think many other people will use it if I implement it?

rmzelle commented 6 years ago

With the exception of NGSUtils, I couldn't find any other tools or scripts to split reads within a FASTQ file into smaller reads (with or without overlap). So it's probably not very commonly needed, but this might change as long-read sequencing becomes more popular.

In my case, I'm trying to accurately determine gene copy number in a genome via relative Nanopore read coverage, but my target gene has multiple repeats on a scaffold that is about the same size as the median read length of my Nanopore reads. I expect to get more accurate results if I can chop my reads up into shorter fragments before aligning them to my reference genome, but so far I haven't found an existing tool to do that.
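
To make the request concrete, here is a minimal Python sketch of the kind of tiling I have in mind. It is not part of fastp or NGSUtils. The function names (`read_fastq`, `tile_read`, `tile_fastq`) and the `name:start-end` naming scheme for the fragments are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines). It streams record by record, so it needs no temp files and can write to STDOUT.

```python
import gzip
import sys
from typing import Iterator, Optional, TextIO, Tuple

# (read name without the leading '@', sequence, quality string)
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def tile_read(name: str, seq: str, qual: str, size: int = 500,
              step: Optional[int] = None, min_len: int = 1) -> Iterator[Record]:
    """Cut one read into windows of `size` bases, advancing by `step`.

    step == size (the default) gives non-overlapping tiles; step < size
    gives overlapping tiles. Windows shorter than `min_len` are dropped.
    """
    step = size if step is None else step
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    read_id = name.split(" ", 1)[0]  # drop any description after the ID
    for start in range(0, len(seq), step):
        chunk = seq[start:start + size]
        if len(chunk) < min_len:
            continue
        # Hypothetical naming scheme: original ID plus 0-based coordinates.
        yield (f"{read_id}:{start}-{start + len(chunk)}",
               chunk, qual[start:start + size])


def tile_fastq(in_handle: TextIO, out_handle: TextIO, size: int = 500,
               step: Optional[int] = None, min_len: int = 1) -> None:
    """Stream tiled reads from in_handle to out_handle, one record at a time."""
    for name, seq, qual in read_fastq(in_handle):
        for t_name, t_seq, t_qual in tile_read(name, seq, qual,
                                               size, step, min_len):
            out_handle.write(f"@{t_name}\n{t_seq}\n+\n{t_qual}\n")
```

For a gzipped input this could be called as `tile_fastq(gzip.open("reads.fastq.gz", "rt"), sys.stdout, size=500)`, where `reads.fastq.gz` is a placeholder file name. Setting `min_len` higher would drop the short leftover tail of each read, which might matter for alignment.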

jlboat commented 5 years ago

@rmzelle This request doesn't sound like something that should be a feature of a trimming tool. It seems to be a specialized request that may require programming something from the ground up.

rmzelle commented 5 years ago

"It seems to be a specialized request that may require programming something from the ground up."

Sure. Feel free to close this ticket if this is considered out of scope and/or too esoteric.