Low number of raw input reads detected

jadedavis5 commented 1 year ago

Hi,

I am trying to process reads from a library protocol that is not included in the tool. When I input the raw reads into sRNAbench and use custom protocol settings (adapter sequence: AAAAAAAA, minimum adapter length: 8, remove 5' barcode: 4), I get the following output (which I would expect for my samples):

However, when I pre-trim my samples with the same parameters (using cutadapt -u 4 -a AAAAAAAA) and input them as already trimmed it comes out with the following output:

There are >18 million reads in the pre-trimmed input file (~2.2 Gb file), but sRNAbench only detects 690. Is there a reason why sRNAbench is not detecting my pre-trimmed reads?

I would like to be able to input pre-trimmed reads because there are a few other processes I need to perform on them first, so unfortunately getting sRNAbench to trim isn't a good option for my reads :(

Thanks so much for your work with sRNAbench!

paulamool commented 1 year ago

I had a similar outcome with trimmed reads using the latest docker image. If I set parameter holdNonAdapter=true the bulk of the reads are processed.

No. raw input reads: 18449059
 ...
 ALIGN (PRE-PROCESSED) READS TO THE GENOME              
             Mapped 10200919 reads and 2817642 unique reads to the genomes(s)

sert23 commented 1 year ago

Thanks for posting this issue. Can you share your pre-trimmed data? If the fastq input is not in the form sRNAbench expects, the run will not work. You should be able to see this (some kind of error) in a command line run (docker). If you are using a collapsed fasta, the standard sRNAbench format applies (check reads_orig.fa). I hope this helps, and let us know if we can help further.
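As a quick way to rule out a malformed file before sharing it, a generic FASTQ sanity check can be sketched in Python. This is only an illustration and not part of sRNAbench: the function names `open_fastq` and `check_fastq` are made up here, and the check assumes plain four-line FASTQ records (header, sequence, `+` line, quality). It does not test whatever additional expectations sRNAbench may have. It also counts zero-length reads, since trimming can leave reads with no sequence.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(handle):
    """Return (n_records, n_empty_reads); raise ValueError on a malformed record.

    Assumes each record spans exactly four lines (no wrapped sequences).
    """
    n_records = 0
    n_empty = 0
    while True:
        header = handle.readline()
        if header == "":
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        record_no = n_records + 1
        if not header.startswith("@"):
            raise ValueError(f"record {record_no}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {record_no}: separator line does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {record_no}: sequence and quality lengths differ")
        n_records += 1
        if len(seq) == 0:
            n_empty += 1  # read was trimmed down to nothing
    return n_records, n_empty
```

For example, calling `check_fastq(open_fastq("trimmed.fastq"))` (the file name is a placeholder) returns the record count and the number of empty reads, which can then be compared with the "No. raw input reads" value that sRNAbench reports.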

Ernesto

bioinfoUGR / sRNAtoolbox

Low number of raw input reads detected #18