NorwegianVeterinaryInstitute / Talos

A shotgun metagenomic analysis pipeline using nextflow
BSD 3-Clause "New" or "Revised" License

Check out why nonpareil has a problem with symlinks #34

Closed: Thomieh73 closed this issue 4 years ago

Thomieh73 commented 4 years ago

Storing the complete dataset in the work folders eats up considerable disk space, which should not be needed. I will try to test this out.

Thomieh73 commented 4 years ago

In contrast to the fastq process, the nonpareil process creates new files in the work folder because of the gunzip command.

Nonpareil can only read uncompressed data files, so it needs a .fastq file instead of the .fastq.gz files. Unpacking the compressed file creates a new, uncompressed file with the group label of the original file, and that really ate up a lot of space on our allocation.

The solution was to do the following:

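    echo "only processing file: ${reads[0]}"

    # nonpareil cannot read .fastq.gz, so decompress to a file in the task's work folder
    gunzip -c ${reads[0]} > forward_reads.fastq

    nonpareil -s forward_reads.fastq -T kmer -f fastq -b ${sample_id}_R1 \
        -X ${params.query} -n ${params.subsample} -t ${task.cpus}

    # cleanup area: delete the uncompressed copy once nonpareil is done
    rm -f forward_reads.fastq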

This unpacks the archive into a new file, which is deleted again after processing.
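The `rm` at the end only runs if nonpareil succeeds: Nextflow runs task scripts with `bash -ue` by default, so a failing command stops the script before the cleanup line and the uncompressed copy stays in the work folder. A minimal sketch of one way around this, assuming Bash, is to register the cleanup with `trap ... EXIT` before decompressing. `with_uncompressed` is a hypothetical helper written for illustration only; it is not part of Talos or nonpareil.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of Talos or nonpareil):
#   with_uncompressed <reads.fastq.gz> <tmp.fastq> <command> [args...]
# Decompresses the reads into <tmp.fastq>, runs the command, and deletes
# <tmp.fastq> afterwards, also when the command exits with an error.
with_uncompressed() (
    # The ( ) body runs in a subshell: gz and tmp do not leak into the
    # caller, and the EXIT trap fires when this function returns.
    gz=$1
    tmp=$2
    shift 2
    # Register the cleanup before the file exists, so it runs on any exit.
    trap 'rm -f "$tmp"' EXIT
    # -c writes to stdout and leaves the original .fastq.gz untouched.
    gunzip -c "$gz" > "$tmp"
    # Run the tool; its exit status becomes the function's exit status.
    "$@"
)
```

In the process script it could be called as `with_uncompressed ${reads[0]} forward_reads.fastq nonpareil -s forward_reads.fastq -T kmer -f fastq -b ${sample_id}_R1 -X ${params.query} -n ${params.subsample} -t ${task.cpus}`. If the helper is defined inline in a Nextflow `script:` block, its Bash variables (`$1`, `$2`, `$tmp`, `"$@"`) must be escaped as `\$` so that Nextflow does not try to interpolate them. A single `trap 'rm -f forward_reads.fastq' EXIT` line placed before the `gunzip` command achieves the same effect without a helper and needs no escaping.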