MikkelSchubert / adapterremoval

AdapterRemoval v2 - rapid adapter trimming, identification, and read merging
http://adapterremoval.readthedocs.io/
GNU General Public License v3.0

No result after passing in a sample with no adapters #30

Closed: billytaj closed this issue 5 years ago

billytaj commented 5 years ago

Hi, I'm trying to use your tool on a dataset that has no adapters. However, the output of the program is a completely blank FASTQ file. Shouldn't it leave my file alone if there are no adapters?

The data I used is the synthetic human gut RNA sample from http://huttenhower.sph.harvard.edu/humann2. I am using the default settings, and FastQC reports that this sample has no adapters.

MikkelSchubert commented 5 years ago

Hi,

That sounds odd. Can you try to copy/paste the exact command you used to run AdapterRemoval?

While I would not expect the output to be an empty file, running AdapterRemoval on a file without adapter sequences will still change some reads: the algorithm used by AdapterRemoval does not have perfect specificity, so some false positives are to be expected. That said, I only see a few potential false positives when I run AdapterRemoval on those files.
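To see why some false positives are hard to avoid, here is a minimal Python sketch of a deliberately simplified model. It is not AdapterRemoval's actual alignment, which tolerates mismatches and applies its own scoring. If a read's bases are uniformly random, the chance that its last k bases exactly match the first k bases of the adapter is (1/4)^k. The adapter prefix `AGATCGGAAGAGC` is an assumed Illumina-style sequence used only for illustration.

```python
import random

# Assumed adapter prefix for illustration only (a common Illumina-style
# sequence); substitute the adapter your library actually uses.
ADAPTER = "AGATCGGAAGAGC"


def expected_fraction(k: int) -> float:
    """Probability that the last k bases of a uniformly random read
    exactly equal the first k bases of the adapter: (1/4)**k."""
    return 0.25 ** k


def has_exact_overlap(read: str, adapter: str, k: int) -> bool:
    """True if the read's 3' end exactly matches the adapter's first k bases."""
    return read.endswith(adapter[:k])


def simulate(n_reads: int, read_len: int, k: int, seed: int = 1) -> float:
    """Fraction of random, adapter-free reads that show a chance k-base overlap."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reads):
        read = "".join(rng.choices("ACGT", k=read_len))
        hits += has_exact_overlap(read, ADAPTER, k)
    return hits / n_reads


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"k={k}: expected {expected_fraction(k):.4f}, "
              f"simulated {simulate(20_000, 100, k):.4f}")
```

Under this model about a quarter of adapter-free reads end in a 1-base match and about 1 in 256 end in a 4-base match, so a trimmer that acts on very short overlaps will clip some reads that never contained adapter.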

Best regards, Mikkel

(Replying to Billy Taj, Tue, Dec 18, 2018 at 6:22 PM)

billytaj commented 5 years ago

>&2 echo Removing adapters | /pipeline_tools/adapterremoval/AdapterRemoval \
    --file1 /scratch/j/jparkin/billyc59/Humann2_benchmark_run/rna_synth/quality_filter/data/0_sorted_raw_input/pair_1_sorted.fastq \
    --file2 /scratch/j/jparkin/billyc59/Humann2_benchmark_run/rna_synth/quality_filter/data/0_sorted_raw_input/pair_2_sorted.fastq \
    --qualitybase 33 \
    --threads 80 \
    --minlength 30 \
    --basename /scratch/j/jparkin/billyc59/Humann2_benchmark_run/rna_synth/quality_filter/data/1_adapter_removal_AdapterRemoval \
    --trimqualities \
    --output1 /scratch/j/jparkin/billyc59/Humann2_benchmark_run/rna_synth/quality_filter/data/1_adapter_removal/pair_1_adptr_rem.fastq \
    --output2 /scratch/j/jparkin/billyc59/Humann2_benchmark_run/rna_synth/quality_filter/data/1_adapter_removal/pair_2_adptr_rem.fastq \
    --singleton /scratch/j/jparkin/billyc59/Humann2_benchmark_run/rna_synth/quality_filter/data/1_adapter_removal/singletons_adptr_rem.fastq

MikkelSchubert commented 5 years ago

That looks fine, though I don't think you'll gain much from using 80 threads.

You said that the output "is a completely blank Fastq". Does that apply to all of the resulting FASTQ files? That is to say, are pair_1_adptr_rem.fastq, pair_2_adptr_rem.fastq, and singletons_adptr_rem.fastq all empty?
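A quick way to answer the "are they all empty?" question is to count the records in each output file. This is a minimal Python sketch that assumes uncompressed FASTQ with exactly four lines per record (no sequence wrapping); the file names are the ones from the command above, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming the standard unwrapped 4-line layout
    (header, sequence, '+' line, qualities)."""
    return sum(1 for _ in lines) // 4


def count_fastq_records(path: str) -> int:
    """Count records in an uncompressed FASTQ file."""
    with open(path) as handle:
        return count_records(handle)


if __name__ == "__main__":
    # Output names from the command above; run this from the output
    # directory or adjust the paths.
    for name in ("pair_1_adptr_rem.fastq",
                 "pair_2_adptr_rem.fastq",
                 "singletons_adptr_rem.fastq"):
        if Path(name).exists():
            print(f"{name}: {count_fastq_records(name)} records")
        else:
            print(f"{name}: not found")
```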

(Replying to Billy Taj, Tue, Dec 18, 2018 at 7:29 PM)

billytaj commented 5 years ago

Threads: what are the parallelism limits of this program? I am running it on an 80-thread machine.

Outputs: yes, pair_1_adptr_rem.fastq, pair_2_adptr_rem.fastq and singletons_adptr_rem.fastq are all completely blank.

Are they blank for you too?

MikkelSchubert commented 5 years ago

There are no hard-coded limits, but most of those threads will probably end up doing little more than waiting for the next chunk of FASTQ reads to be read and (subsequently) written.
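Why extra threads mostly wait can be made rough with Amdahl's law, S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that runs in parallel. This Python sketch assumes, purely hypothetically, that reading and writing FASTQ is a serial 10% of the run time (p = 0.9); that number is not a measurement of AdapterRemoval.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n), where p is the fraction
    of the work that can run in parallel and n is the number of threads."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    p = 0.9  # hypothetical: assumes 10% of the run time is serial FASTQ I/O
    for n in (1, 4, 8, 16, 80):
        print(f"{n:>3} threads: {amdahl_speedup(p, n):.2f}x")
```

With p = 0.9 this gives 6.4x at 16 threads and about 9x at 80 threads, against a ceiling of 10x no matter how many threads are added.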

Output looks fine for me using the same options as you, with the output files only slightly smaller than the input (due to the aforementioned false positives).

I should have asked this earlier, but can you copy/paste or attach the STDERR output from AdapterRemoval? Also, what version are you using? See 'AdapterRemoval --version'.
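Since the diagnostic in this thread only showed up on STDERR, a pipeline that calls AdapterRemoval from Python can capture that stream and fail loudly. This is a minimal sketch using the standard-library `subprocess.run`; `run_and_report` is a hypothetical helper name. I have not verified whether AdapterRemoval 2.1.7 exits with a non-zero status on the FASTQ parsing error, so it is worth inspecting `result.stderr` even when no exception is raised.

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run cmd, capturing stdout and stderr as text.

    Raises RuntimeError (including the captured stderr) if the exit status
    is non-zero; otherwise returns the CompletedProcess so the caller can
    still inspect result.stderr for warnings or errors."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}:\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Executable path taken from the command above; adjust to your install.
    exe = "/pipeline_tools/adapterremoval/AdapterRemoval"
    try:
        info = run_and_report([exe, "--version"])
        # Print both streams, since the version text may go to either one.
        print(info.stdout + info.stderr)
    except (OSError, RuntimeError) as err:
        print(err, file=sys.stderr)
```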

(Replying to Billy Taj, Tue, Dec 18, 2018 at 7:56 PM)

billytaj commented 5 years ago

Version: 2.1.7. Is there a newer one?

Oh, I see. It's a malformed header:

    Trimming paired end reads ...
    Error reading FASTQ record at line 1; aborting: Malformed or empty FASTQ header

billytaj commented 5 years ago

I don't really understand this error. Does this program need "/1" and "/2" at the end of each FASTQ read ID in paired-end mode?

MikkelSchubert commented 5 years ago

No, the /1 and /2 suffixes are not required, but if they are present they have to make sense (i.e. /1 for the first mate and /2 for the second). However, this particular error message means that the header line is either empty or does not start with '@'.
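The two rules described above can be checked before trimming. This is a minimal Python sketch, not AdapterRemoval's own parser: `check_pair` is a hypothetical helper that rejects an empty header or one without a leading '@', and rejects mate suffixes that are present but are not /1 on the first mate and /2 on the second. Treating a suffix on only one mate as inconsistent is an assumption of this sketch.

```python
import re

# Matches a trailing /1 or /2 mate suffix on the read name.
MATE_SUFFIX = re.compile(r"/([12])$")


def read_name(header: str) -> str:
    """Text after '@' up to the first whitespace (empty if there is none)."""
    fields = header[1:].split(maxsplit=1)
    return fields[0] if fields else ""


def check_pair(header1: str, header2: str) -> None:
    """Raise ValueError if either header is malformed, or if /1 and /2
    mate suffixes are present but inconsistent."""
    for header in (header1, header2):
        if not header.strip():
            raise ValueError("empty FASTQ header")
        if not header.startswith("@"):
            raise ValueError(f"FASTQ header does not start with '@': {header[:40]!r}")
    m1 = MATE_SUFFIX.search(read_name(header1))
    m2 = MATE_SUFFIX.search(read_name(header2))
    # Assumption: a suffix on only one mate counts as inconsistent.
    if (m1 or m2) and not (m1 and m2 and m1.group(1) == "1" and m2.group(1) == "2"):
        raise ValueError("mate suffixes present but not /1 and /2")
```

Calling `check_pair` on the first line of each input file (the line `head` prints first) would flag a mangled input before AdapterRemoval runs.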

Could you run 'head' on your input files and let me know what the result is?

(Replying to Billy Taj, Tue, Dec 18, 2018 at 8:30 PM)

billytaj commented 5 years ago

The fault is in my own code, not yours: a Pandas import error mangled my input file. Thank you again for your help.