Open matthdsm opened 3 years ago
Hi Matthias,
How are you observing that more threads are used by elprep?
(I double checked the source code, which seems fine, and I cannot reproduce the behaviour).
Thanks! Charlotte
I was monitoring CPU usage through htop
and saw the entire node was being used (full CPU, all cores), despite only having asked for 35 threads (instead of 80).
I'll rerun the analysis and see if this occurs again. Matthias
If possible, can you run your command with /usr/bin/time -v (e.g. /usr/bin/time -v elprep sfm input.bam output.bam) and send me the output?
Thanks! Charlotte
Good thing I'm benchmarking right now!
Command being timed: "elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 35 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed"
User time (seconds): 74572.83
System time (seconds): 2917.64
Percent of CPU this job got: 2323%
Elapsed (wall clock) time (h:mm:ss or m:ss): 55:35.31
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 40641948   <- RAM usage, ~40 GB
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 352416596
Voluntary context switches: 51149677
Involuntary context switches: 2166519
Swaps: 0
File system inputs: 0
File system outputs: 55015096
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Hmm that seems ok. Any chance the extra threads you see are not from the elprep command?
Elprep was the only thing running on the node at that moment. I'm currently running the same sample on 10 threads. At this point the spike hasn't occurred. I'll get back to you in an hour with the results.
The spike occurred again, starting with "Phase 2" of the process.
Elprep log
elprep version 5.1.1 compiled with go1.16.7 - see http://github.com/exascience/elprep for more information.
2021/10/21 12:07:49 Created log file at /projects/demultiplex/development/elprep/logs/elprep/elprep-2021-10-21-12-07-49-573422852-UTC.log
2021/10/21 12:07:49 Command line: [elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 10 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed]
2021/10/21 12:07:49 Executing command:
elprep sfm NA12878.bam NA12878.elprep.bam --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --quantize-levels 0 --max-cycle 500 --known-sites dbsnp-151.elsites --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --haplotypecaller NA12878.elprep.g.vcf.gz --sorting-order coordinate --nr-of-threads 10 --timed --log-path /projects/demultiplex/development/elprep --intermediate-files-output-prefix NA12878 --intermediate-files-output-type bam
2021/10/21 12:07:49 Splitting...
2021/10/21 12:12:55 Filtering (phase 1)...
2021/10/21 12:55:34 Filtering (phase 2) and variant calling...
2021/10/21 13:14:29 Merging sam/bam...
2021/10/21 13:18:42 Merging metrics...
2021/10/21 13:18:42 Merge vcf...
Time log:
Command being timed: "elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 10 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed"
User time (seconds): 67783.27
System time (seconds): 2824.52
Percent of CPU this job got: 1659%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10:53
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 45212524
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 391376706
Voluntary context switches: 41522383
Involuntary context switches: 1930008
Swaps: 0
File system inputs: 0
File system outputs: 55015496
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I think the problem might be that too many system calls are blocked in this specific case. The Go runtime keeps launching OS threads when goroutines block in syscalls, even though only 10 goroutines should ever actually run in parallel. So if I understand it correctly, it should not be an issue that so many threads are spawned, and it is why we don't see higher CPU usage than expected. I am not sure whether I can prevent the Go runtime from doing this; I have to look into it a bit deeper.
I'm currently re-running with the tmp path set to a faster FS; that should help with the blocked calls, no?
No dice, same result, though to a lesser extent. A slow(ish) FS seems to be the culprit in this case.
That makes sense. Are you restricting the number of threads so you can run multiple jobs on the server?
Currently, yes. I'm running the same sample with different parameters to find the sweet spot between performance and speed specific to our hardware. At the moment I've noticed that throwing more resources at elprep does not scale linearly: doubling the resources gains only about 2 minutes.
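The diminishing returns you describe are what Amdahl's law predicts once the serial fraction of a pipeline dominates. A minimal sketch (the 95% parallel fraction is an assumed figure for illustration, not measured from elprep):

```go
package main

import "fmt"

// speedup applies Amdahl's law: speedup(n) = 1 / ((1-p) + p/n),
// where p is the parallelizable fraction and n the thread count.
func speedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// Doubling threads from 35 to 70 barely helps once the serial
	// 5% dominates the runtime.
	fmt.Printf("35 threads: %.1fx, 70 threads: %.1fx\n",
		speedup(0.95, 35), speedup(0.95, 70))
	// -> 35 threads: 13.0x, 70 threads: 15.7x
}
```

So a small absolute gain from doubling resources is expected whenever a few minutes of the wall-clock time (e.g. merging, I/O) cannot be parallelized.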
Hi,
I noticed that when using 'sfm' mode, elprep uses more resources than allowed (e.g. all available threads instead of the given thread count). When using 'filter', this does not occur. Cheers, Matthias