mmokrejs opened this issue 6 years ago
This part is normally I/O bound, so multiple threads would make the situation even worse.
We have a parallel filesystem (LustreFS) served by, I think, 54 working slave machines, with InfiniBand in between. How the data are laid out over the many hosts and drives is user-configurable per directory or even per file. The stripe size is currently 1MB, I think.
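For illustration, striping can be inspected and changed per directory with the lfs tool (just a sketch; the stripe count of 4 below is an arbitrary example, not our actual setting):

# show the current striping of a directory
lfs getstripe mygenome__SPAdes3.11.1_noecc/
# make new files in this directory stripe over 4 OSTs with a 1MB stripe size
lfs setstripe -c 4 -S 1M mygenome__SPAdes3.11.1_noecc/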
And if I could be sure the data fit into memory, I would use a ramdisk for the actual processing and then move the resulting files to the storage filesystem (see the sketch below). Oh yes, it does fit:
$ du -sh mygenome__SPAdes3.11.1_noecc/.bin_reads/
56G mygenome__SPAdes3.11.1_noecc/.bin_reads/
$
The input uncompressed FASTQ files occupied 435.86GB.
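So the converted binary reads (56G) would comfortably fit into a node-local ramdisk; a rough sketch of the workflow I have in mind (paths are made up, SPAdes input options are passed through "$@"):

# hypothetical wrapper: run the heavy processing against a node-local ramdisk
# and only move the final results back onto LustreFS at the end
TMP=/ramdisk/$PBS_JOBID
mkdir -p "$TMP"
spades.py --tmp-dir "$TMP" -o "$TMP"/assembly "$@" || exit 255
mv "$TMP"/assembly /lustre/projects/mygenome/    # final storage path is made up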
Here you can see that the "disc" traffic is 102MB/s on average, with more reading than writing.
112 x86_64 CPU cores (Intel(R) Xeon(R) CPU E5-4627 v2 @ 3.30GHz) are available, with 3.2TB of physical, local RAM.
Here you can see that the "disc" traffic is 104MB/s on average, with more reading than writing.
This is how it should be. We're reading FASTQ (a text format) and converting it to the internal binary format. The 9:1 read:write ratio is very close to the text FASTQ : SPAdes binary format file size ratio (435.86GB vs. 56GB, i.e. roughly 7.8:1).
Here is what the filesystem can handle when applications are properly written to read/write in large chunks. A very efficient alternative: bamsort, which comes from https://github.com/gt1/biobambam2
# samtools sort of a 149GB BAM file takes 1.2TB RAM and uses only a single thread despite '-@ 15' argument
# samtools sort -@ $xthreads -m "$gb_mem_per_thread"G -O bam -T "$1" -o "$2".sorted.bam "$2".bam || exit 255
#
# bamsort comes from https://github.com/gt1/biobambam2
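# presumably this overrides libmaus2's POSIX fd input block size to 1MB (matching the 1MB Lustre stripe size)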
LIBMAUS2_POSIXFDINPUT_BLOCKSIZE_OVERRIDE=1m
export LIBMAUS2_POSIXFDINPUT_BLOCKSIZE_OVERRIDE
bamsort SO=coordinate blockmb="$take_memory" inputthreads="$input_threads" outputthreads="$output_threads" level=9 index=1 I="$2".bam O="$2".sorted.bam
The currently running SPAdes process (executing read_converter.hpp / binary_converter.hpp) supposedly overloaded the LustreFS metadata servers, and after 40 minutes of attempts to flush buffers the kernel gave up (see the high system CPU load in red in the figures below). I see similar issues when applications append many too-small chunks to existing files. Running truss, strace, or a similar profiling tool should reveal the actual write sizes of the SPAdes binaries.
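Something along these lines should show it (just a sketch; $SPADES_PID is a placeholder for the converter's PID):

# attach to the running process and log every write; strace prints to stderr
strace -f -tt -e trace=write,pwrite64 -p "$SPADES_PID" 2> spades_writes.log
# summarize: number of writes and their average size in bytes
# (the syscall return value, i.e. bytes written, is the last field on each line)
awk '$NF ~ /^[0-9]+$/ { sum += $NF; cnt++ } END { if (cnt) print cnt, "writes, avg", sum/cnt, "bytes" }' spades_writes.log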
I cannot log in to the cluster node to verify this, but although I am running spades.py --tmp-dir /ramdisk/$PBS_JOBID
it seems it is still reading and writing to LustreFS at the same pace (~100 kB/s). Also, I do not see any improvement in how quickly spades.py moves on to processing the many input FASTA files.
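If I could get a shell on the node, a quick check would be to look at the process's open file descriptors and at the growth of the ramdisk tmp dir (again a sketch; the PID is a placeholder):

# which files does the converter actually have open -- ramdisk or LustreFS paths?
ls -l /proc/"$SPADES_PID"/fd
# does the ramdisk tmp dir actually grow while library #6 is being converted?
watch -n 10 du -sh /ramdisk/"$PBS_JOBID"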
And, while the log now says:
0:46:19.694 12M / 700M INFO General (read_converter.hpp : 84) Converting reads to binary format for library #6 (takes a while)
I should not see the paired_6_*.seq files on the networked filesystem until this step is complete, right? They should still be in --tmp-dir.
These files will be in the output dir since they are reused across iterations (i.e., they are long-lived). Everything else will be on scratch.
I don't understand. The paired_6_*.seq files have the same modification timestamp because they were continually updated for a while during processing of library #6 of the input files. This should have happened in --tmp-dir, and only afterwards should the paired_6_*.seq files have been moved to tt_16D1C3L12__SPAdes3.11.1_noecc_ramdisk/.bin_reads/. But these files should not have existed in tt_16D1C3L12__SPAdes3.11.1_noecc_ramdisk/.bin_reads/ until library #7 processing started, so what am I missing?
This is not how it is done currently. We may consider doing this in some future SPAdes version. Patches are always welcome, though.
Hi, although I provided 19 input files, the code ran in a single thread. To scale further, could it also do the conversion in multiple chunks per file?
This probably won't happen soon, but let me open a feature request for it. The current version is SPAdes 3.11.1. Thank you.