Closed TCLamnidis closed 3 years ago
A bit of extra context: I am trying to remove adapters in one step and trim and collapse in another because I want to use the demultiplexing functionality of AR to deal with internal barcodes in the dataset. It makes sense to me to do that BEFORE collapsing the reads, but it cannot be done before removing the adapters.
The problem is that you are using the same --basename in both your commands, which means that the second command reads from CS01.pe.pair*.truncated while also writing the read-pairs that were not merged to those same files.
Files are opened for writing in a lazy manner (part of the support for file handle limits needed while demultiplexing many samples), so AdapterRemoval manages to read a bit of the files before producing output that is then written back to the same files, truncating them in the process.
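To make the failure mode concrete, here is a minimal shell sketch of a reader and a writer sharing one path. It is only an illustration of the general open-for-writing truncation, not AdapterRemoval's actual I/O code; the temporary file and the variable names (`demo`, `first`, `size_after`, `status`) are made up for the example.

```shell
#!/bin/sh
# Illustration only: what happens when a path that is being read
# is later opened for writing. All names here are hypothetical.
demo=$(mktemp)
printf 'read1\nread2\nread3\n' > "$demo"

# Open the file for reading on fd 3 and consume the first line,
# much like a tool that has started processing its input.
exec 3< "$demo"
IFS= read -r first <&3

# Opening the same path for writing truncates it to zero bytes,
# even though the reader still holds it open.
: > "$demo"
size_after=$(wc -c < "$demo" | tr -d ' ')

# The reader's next attempt finds nothing left to read.
if IFS= read -r second <&3; then
    status="read: $second"
else
    status="EOF"
fi

exec 3<&-
rm -f "$demo"
echo "first=$first size_after=$size_after status=$status"
```

The reader gets its first line, but everything after the write-open is gone, which is the same shape as the truncated input files described in this issue.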
You could modify your commands as follows to avoid this problem:
$ AdapterRemoval --file1 ../ERR3003613_1.fastq.gz --file2 ../ERR3003613_2.fastq.gz --basename step1.CS01.pe \
--adapter1 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC' \
--adapter2 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' --minadapteroverlap 1
$ AdapterRemoval --file1 step1.CS01.pe.pair1.truncated --file2 step1.CS01.pe.pair2.truncated --basename step2.CS01.pe --qualitymax 41 --trimns --trimqualities --minlength 30 --minquality 20 --collapse
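If you script these steps, a small guard can catch the clash before the second command runs. The function below is a hypothetical helper, not part of AdapterRemoval; it only assumes that output files are named by appending a suffix to the --basename value, as with the .pair1.truncated files above.

```shell
#!/bin/sh
# Hypothetical helper (not an AdapterRemoval feature): succeeds (exit 0)
# if any input path starts with "<basename>.", i.e. if the run could
# overwrite its own input files.
inputs_clash_with_basename() {
    base=$1
    shift
    for f in "$@"; do
        case "$f" in
            "$base".*) return 0 ;;
        esac
    done
    return 1
}

# Example: the original second command reused the basename of its inputs.
if inputs_clash_with_basename CS01.pe \
        CS01.pe.pair1.truncated CS01.pe.pair2.truncated; then
    echo "refusing to run: inputs share the output basename" >&2
fi
```

The check is a plain prefix match, so it does not resolve relative paths or symlinks; it is meant as a cheap sanity check rather than a complete safeguard.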
With that out of the way, I am not clear on your motivation for doing this. Why can you not demultiplex the reads before removing the adapters? If the barcodes are located at the 3' end of reads, then you cannot use AdapterRemoval to demultiplex the reads like you say you want to, and if the barcodes are located at the 5' end, then AdapterRemoval already handles demultiplexing, adapter (and complementary barcode) trimming, and merging in what is, to my knowledge, the correct order.
Thank you for the clarification! I will retry with a different basename.
The motivation for doing this is linked to https://github.com/MikkelSchubert/adapterremoval/issues/50, dealing with sample-specific barcodes that come AFTER the adapter sequence.
Hi @MikkelSchubert !
I am looking for a sensible way to separate the adapter clipping functionality of AR from the collapsing functionality, and have run into an odd behaviour.
I am using some public data from the ENA, downloadable here: https://www.ebi.ac.uk/ena/browser/view/PRJEB30331 The md5sums match those of the ENA. I am using version 2.3.2 off bioconda.
I started out by removing the adapters from the fastqs without any filtering or trimming.
The resulting files look fine.
I then try to collapse, trim and filter the adapter clipped files:
I then checked the input files again:
After multiple tries, it seems that the line at which the error is thrown changes, but it is always 600 lines that remain in the input files.