Thomieh73 closed this 4 years ago
In contrast to the fastq process, the Nonpareil process creates new files because of the gunzip command.
Nonpareil can only use uncompressed data files, so it needs a fastq file instead of the fastq.gz files. Unpacking the compressed file creates a new file with the group label of the original file. That really ate up a lot of space on our allocation.
The solution was to do the following:
echo "only processing file: ${reads[0]}"
gunzip -c ${reads[0]} > forward_reads.fastq
nonpareil -s forward_reads.fastq -T kmer -f fastq -b ${sample_id}_R1 \
    -X ${params.query} -n ${params.subsample} -t ${task.cpus}
# clean up the uncompressed copy
rm forward_reads.fastq
That unpacks the archive into a new file, which is deleted again after processing.
Storing the complete dataset in the work folders eats up considerable disk space, which should not be needed. I will try to test this out.
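The unpack, process, delete pattern above could also be wrapped in a small shell helper so that the uncompressed copy is removed even when the tool exits with an error. This is only a sketch, not what the pipeline currently does: `run_on_unpacked` and the `unpacked_XXXXXX` file name are hypothetical names chosen for illustration, and the helper assumes the command accepts the input path as its last argument.

```shell
# Hypothetical helper: decompress a .gz file into a temporary file in the
# current directory, run the given command with that file appended as the
# last argument, then delete the temporary file whether or not the command
# succeeded.
run_on_unpacked() {
    local gz_file tmp_file status
    gz_file="$1"
    shift

    # Create a uniquely named temporary file next to the other work files.
    tmp_file="$(mktemp ./unpacked_XXXXXX)" || return 1

    if gunzip -c "$gz_file" > "$tmp_file"; then
        # Run the remaining arguments as a command on the unpacked file.
        "$@" "$tmp_file"
        status=$?
    else
        status=1
    fi

    # Always remove the uncompressed copy to free the disk space again.
    rm -f "$tmp_file"
    return "$status"
}
```

With this helper, a call might look like `run_on_unpacked reads_R1.fastq.gz nonpareil -T kmer -f fastq -b sample_R1 -s`, where the file names are placeholders and the temporary path ends up as the argument to `-s`. Inside a Nextflow `script:` block, the shell variables would need to be escaped as `\$` so that Nextflow does not try to interpolate them.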