eclarke / komplexity

A method of assessing sequence complexity based on kmer frequencies

Question: Filtering for paired end #5

Open weedcentipede opened 4 years ago

weedcentipede commented 4 years ago

Hello, I was wondering whether there is a recommended pipeline or flag in the program for filtering low-complexity reads in paired-end datasets.

Thanks in advance, Luis Alfonso

kislyuk commented 4 years ago

Hi @eclarke, thanks for contributing this package. I'm seconding @OnlyHigh's question. I appreciate the advantages that komplexity provides, including native handling of fastq files, but we all routinely deal with paired fastq.gz files. In a paired-end context, both reads of a pair must be kept or dropped together. Low-complexity filtering can be done conservatively, dropping a pair only when both reads fail the threshold, or aggressively, dropping it when either read does. Do you have any suggestions for this, or plans to add native handling of these files to komplexity?
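To make the two policies concrete, here is a minimal Python sketch of the pair-level decision. It is not part of komplexity: `kmer_complexity` and `keep_pair` are hypothetical helpers, the score (unique k-mers divided by read length) is only an illustrative stand-in for whatever per-read score is actually used, and the 0.5 threshold is an arbitrary placeholder.

```python
def kmer_complexity(seq: str, k: int = 4) -> float:
    """Illustrative score: number of unique k-mers divided by read length.

    Assumption: this is a stand-in for a per-read complexity score,
    not a reimplementation of komplexity's own calculation.
    """
    if len(seq) < k:
        return 0.0
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmers) / len(seq)


def keep_pair(score1: float, score2: float, threshold: float,
              mode: str = "conservative") -> bool:
    """Decide whether to keep a read pair, given each mate's score.

    conservative: drop the pair only if BOTH mates fall below the threshold.
    aggressive:   drop the pair if EITHER mate falls below the threshold.
    """
    fail1 = score1 < threshold
    fail2 = score2 < threshold
    if mode == "conservative":
        return not (fail1 and fail2)
    if mode == "aggressive":
        return not (fail1 or fail2)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical pair: one homopolymer mate, one ordinary mate.
r1 = "AAAAAAAAAA"
r2 = "ACGTTGCAAG"
threshold = 0.5  # placeholder value, chosen only for this example

s1 = kmer_complexity(r1)
s2 = kmer_complexity(r2)
print(keep_pair(s1, s2, threshold, mode="conservative"))
print(keep_pair(s1, s2, threshold, mode="aggressive"))
```

In practice the same decision could be applied to per-read scores computed separately for the R1 and R2 files, as long as both files list the reads in the same order, and the surviving read IDs then used to subset both files so the mates stay in sync.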