ExaScience / elprep

elPrep: a high-performance tool for analyzing sequence alignment/map files in sequencing pipelines.

Variant calling using `sfm` does not adhere to thread limit #56

Open · matthdsm opened this issue 3 years ago

matthdsm commented 3 years ago

Hi,

I noticed that when using `sfm` mode, elprep uses more resources than allowed (e.g. all available threads instead of the given thread count). When using `filter`, this does not occur.

Cheers Matthias

caherzee commented 3 years ago

Hi Matthias,

How are you observing that more threads are used by elprep?

(I double checked the source code, which seems fine, and I cannot reproduce the behaviour).

Thanks! Charlotte

matthdsm commented 3 years ago

I was monitoring CPU usage through htop and saw the entire node being used (full CPU, all cores), despite having asked for only 35 of the 80 available threads.

I'll rerun the analysis and see if this occurs again. Matthias

caherzee commented 3 years ago

If possible, can you run your command with /usr/bin/time -v (e.g. /usr/bin/time -v elprep sfm input.bam output.bam) and send me the output?

Thanks! Charlotte

matthdsm commented 3 years ago

Good thing I'm benchmarking right now!

Command being timed: "elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 35 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed"
User time (seconds): 74572.83
System time (seconds): 2917.64
Percent of CPU this job got: 2323%
Elapsed (wall clock) time (h:mm:ss or m:ss): 55:35.31
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 40641948  (RAM usage, ≈ 40 GB)
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 352416596
Voluntary context switches: 51149677
Involuntary context switches: 2166519
Swaps: 0
File system inputs: 0
File system outputs: 55015096
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
caherzee commented 3 years ago

Hmm that seems ok. Any chance the extra threads you see are not from the elprep command?

matthdsm commented 3 years ago

Elprep was the only thing running on the node at that moment. I'm currently running the same sample on 10 threads. So far the spike hasn't occurred. I'll get back to you in an hour with the results.

matthdsm commented 3 years ago

The spike occurred again, starting with "Phase 2" of the process. [Screenshot 2021-10-21 at 14:56:35]

matthdsm commented 3 years ago

Elprep log

elprep version 5.1.1 compiled with go1.16.7 - see http://github.com/exascience/elprep for more information.
2021/10/21 12:07:49 Created log file at /projects/demultiplex/development/elprep/logs/elprep/elprep-2021-10-21-12-07-49-573422852-UTC.log
2021/10/21 12:07:49 Command line: [elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 10 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed]
2021/10/21 12:07:49 Executing command:
 elprep sfm NA12878.bam NA12878.elprep.bam --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --quantize-levels 0 --max-cycle 500 --known-sites dbsnp-151.elsites --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --haplotypecaller NA12878.elprep.g.vcf.gz --sorting-order coordinate --nr-of-threads 10 --timed --log-path /projects/demultiplex/development/elprep --intermediate-files-output-prefix NA12878 --intermediate-files-output-type bam
2021/10/21 12:07:49 Splitting...
2021/10/21 12:12:55 Filtering (phase 1)...
2021/10/21 12:55:34 Filtering (phase 2) and variant calling...
2021/10/21 13:14:29 Merging sam/bam...
2021/10/21 13:18:42 Merging metrics...
2021/10/21 13:18:42 Merge vcf...

time log

Command being timed: "elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 10 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed"
        User time (seconds): 67783.27
        System time (seconds): 2824.52
        Percent of CPU this job got: 1659%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10:53
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 45212524
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 391376706
        Voluntary context switches: 41522383
        Involuntary context switches: 1930008
        Swaps: 0
        File system inputs: 0
        File system outputs: 55015496
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
caherzee commented 3 years ago

I think the problem might be that too many system calls get blocked in this specific case. The Go runtime then keeps launching additional OS threads, even though only 10 goroutines should ever actually be running Go code in parallel. So if I understand it correctly, it should not be an issue that so many threads are spawned, which is also why we don't see higher CPU usage than expected. I am not sure whether I can prevent the Go runtime from doing this; I have to look into it a bit deeper.
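
Note: a minimal sketch of the behaviour described above, assuming many goroutines blocking in file-write syscalls (illustrative code, not taken from elPrep; the goroutine count, write sizes and temp-file name are arbitrary). With GOMAXPROCS set to 10, at most 10 goroutines execute Go code in parallel, yet the runtime may still create far more than 10 OS threads to park the blocked syscalls on.

```go
// Illustrative sketch, not elPrep code: with GOMAXPROCS=10, goroutines
// blocking in write syscalls can still drive the OS thread count well above 10.
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(10) // analogous to --nr-of-threads 10

	var wg sync.WaitGroup
	buf := make([]byte, 1<<20) // 1 MiB per write, arbitrary
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// While a goroutine sits in a blocking write syscall, the
			// scheduler can hand its P to a fresh OS thread so that other
			// goroutines keep running.
			f, err := os.CreateTemp("", "threaddemo")
			if err != nil {
				return
			}
			defer os.Remove(f.Name())
			defer f.Close()
			for j := 0; j < 64; j++ {
				if _, err := f.Write(buf); err != nil {
					return
				}
			}
		}()
	}
	wg.Wait()

	// The "threadcreate" profile counts OS threads created by the runtime;
	// on a slow filesystem this is typically much higher than GOMAXPROCS.
	fmt.Println("GOMAXPROCS:        ", runtime.GOMAXPROCS(0))
	fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())
}
```

For what it's worth, runtime/debug.SetMaxThreads can cap the number of OS threads, but exceeding the cap crashes the program rather than throttling it, so it is probably not a usable workaround here.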

matthdsm commented 3 years ago

I'm currently re-running with the tmp path set to a faster FS; that should help with the blocked calls, no?

matthdsm commented 3 years ago

No dice: the spike occurred again, though to a lesser extent. A slow(ish) FS seems to be the culprit in this case.
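
Note: for anyone trying to reproduce this, a rough way to tell "many threads exist but sleep in I/O" apart from "many threads are on-CPU" is to count the threads of the running elprep process by state. A small sketch along those lines (not part of elPrep; the 5-second poll interval and the program name are arbitrary):

```go
// Rough sketch, not part of elPrep: poll /proc/<pid>/task/*/stat and count
// OS threads per state, to distinguish threads that merely exist (sleeping
// in syscalls) from threads that are actually runnable/running.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// threadStates returns a map from thread state letter (R, S, D, ...) to count.
func threadStates(pid string) (map[byte]int, error) {
	tasks, err := filepath.Glob("/proc/" + pid + "/task/*/stat")
	if err != nil {
		return nil, err
	}
	states := make(map[byte]int)
	for _, t := range tasks {
		data, err := os.ReadFile(t)
		if err != nil {
			continue // the thread may have exited in the meantime
		}
		// Format: "tid (comm) S ..."; the state letter follows the ')'.
		if i := strings.LastIndexByte(string(data), ')'); i >= 0 && i+2 < len(data) {
			states[data[i+2]]++
		}
	}
	return states, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: threadwatch <pid>")
		os.Exit(1)
	}
	for {
		states, err := threadStates(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		total := 0
		for _, c := range states {
			total += c
		}
		// R = running/runnable, S = sleeping, D = uninterruptible (I/O) sleep.
		fmt.Printf("%s R=%d S=%d D=%d total=%d\n",
			time.Now().Format("15:04:05"),
			states['R'], states['S'], states['D'], total)
		time.Sleep(5 * time.Second)
	}
}
```

If most threads show up as S or D while R stays near the requested thread count, that would match the "blocked in syscalls on a slow FS" explanation above.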

caherzee commented 3 years ago

That makes sense. Are you restricting the number of threads so you can run multiple jobs on the server?

matthdsm commented 3 years ago

Currently, yes. I'm running the same sample with different parameters to find the sweet spot between resource usage and speed for our hardware. So far I've noticed that throwing more resources at elprep does not scale linearly: doubling the resources only gained about 2 minutes.