Out of memory in qtip-parse

johanneskoester commented 6 years ago

I managed to run qtip with BWA (see my PR). But now I get this error:

11/17/17-12:30:57:INFO:  running "/export/scratch2/koster/qtip/qtip-parse ifs --  -- mapped-qtip/tmp5m0q3174/input_alignments/tmp -- /export/scratch2/koster/data/ref/hg18.fasta -- mapped-qtip/tmp5m0q3174/input_intermediates/tmp -- mapped-qtip/tmp5m0q3174/tandem_intermediates/tmpinp"
Parsing SAM file "mapped-qtip/tmp5m0q3174/input_alignments/tmp" (seed=0)
Traceback (most recent call last):
  File "/export/scratch1/home/koster/.local/opt/miniconda3/envs/qtip/bin/qtip", line 1091, in <module>
    go_profile(vars(_args), _aligner_args, _aligner_unpaired_args, _aligner_paired_args)
  File "/export/scratch1/home/koster/.local/opt/miniconda3/envs/qtip/bin/qtip", line 1042, in go_profile
    go(args, aligner_args, aligner_unpaired_args, aligner_paired_args)
  File "/export/scratch1/home/koster/.local/opt/miniconda3/envs/qtip/bin/qtip", line 514, in go
    _do_parse_input_sam()
  File "/export/scratch1/home/koster/.local/opt/miniconda3/envs/qtip/bin/qtip", line 480, in _do_parse_input_sam
    raise RuntimeError("qtip-parse returned %d" % ret)
RuntimeError: qtip-parse returned 9

The BAMs are looking ok (in terms of ZT:Z). However, exit code 9 usually means killing due to out of memory problems. Do you have numbers how much memory qtip-parse would need? Or might it have a memory leak?

johanneskoester commented 6 years ago

I'm quite sure it is a memory leak. Memory usage is growing monotonically.

johanneskoester commented 6 years ago

I take that back. I have tested qtip-parse with valgrind. There is no memory leak:

==11741== 
==11741== HEAP SUMMARY:
==11741==     in use at exit: 0 bytes in 0 blocks
==11741==   total heap usage: 252 allocs, 252 frees, 185,753 bytes allocated
==11741== 
==11741== All heap blocks were freed -- no leaks are possible
==11741== 
==11741== For counts of detected and suppressed errors, rerun with: -v
==11741== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Maybe it has just been something temporal on my server. I will close this for now.

johanneskoester commented 6 years ago

Ok, I had a closer look. According to valgrind, there are no memleaks. However, qtip-parse is clearly killed because it uses the entire system memory (250GB!!):

[62697.863753] Out of memory: Kill process 27262 (qtip-parse) score 981 or sacrifice child
[62697.863798] Killed process 27262 (qtip-parse) total-vm:267307312kB, anon-rss:258564188kB, file-rss:0kB, shmem-rss:0kB

This is quite urgent for me, and I am happy to fix the code if you could give me a pointer to where that much memory is used in qtip-parse.

johanneskoester commented 6 years ago

I think the template vectors are the problem. With huge bam files, these are a lot. @BenLangmead I see two solutions:

add a sampling factor
do not store them. If I am not mistaken, they are never used except for estimating fragment length in the model constructors?

johanneskoester commented 6 years ago

Seems like this can be already controlled with --input-model-size? I will try this out.

BenLangmead commented 6 years ago

This is very odd. qtip-parse was designed to avoid having any large in-memory data structures. While it's true that the number of tandem reads has to grow with the input size, that should only result in increased disk footprint, not memory footprint (and even that increase is sublinear in # of input reads). All the containers in qtip-parse, the input-model reservoir samplers being the most significant, have hard ceilings set to low values by default. I've done many many tests and not seen this behavior. I'll let you know if I think of anything, but any details about which data structures are responsible would be helpful. E.g. using a debugger to catch memory allocation failures is one quick thing to try.

johanneskoester commented 6 years ago

I think I have discovered the problem. The problem is the parameter input_model_size.

in qtip-parse, you initialize it with the maximum integer value in line 798.
in qtip, you have a command line argument to set it, which has a default value of 30000. However, this command line argument is only passed to qtip-parse if the user makes explicit use of it, because otherwise it won't occur in the args variable here. Hence, input_model_size remains at the maximum (far too large) size if the argument is not set at the command line.

BenLangmead commented 6 years ago

qtip should always set input-model-size when invoking qtip-parse. If you're running qtip-parse directly, then that's the problem. It's only designed to be run directly by qtip.

johanneskoester commented 6 years ago

No, I did not run it directly. As I have outlined above, it seems like the argument is not passed through to qtip-parse.

BenLangmead commented 6 years ago

Could you send the full output also using --verbose when calling qtip?

BenLangmead commented 6 years ago

It looks like qtip is failing to pass the "passthrough args" to the utility programs, which might speak to a failure in _get_passthrough_args

johanneskoester commented 6 years ago

Here comes the entire log. No --verbose though, because I cannot computationally afford to rerun this again. simulated.tumor.hg18.log

BenLangmead commented 6 years ago

Ah. The message(s) I was hoping to see (or not see) are printed only when DEBUG level logging is enabled via --verbose: https://github.com/BenLangmead/qtip/blob/master/qtip#L363. I will make a note to try to reproduce this locally, but might not be till semester/teaching finishes in a couple weeks. If you ever have a chance to do a less expensive run (maybe on subset of data), then please try with --verbose and send.

BenLangmead commented 6 years ago

As an example, here's what the INFO message about running qtip-parse looks like when I run it locally on the test data:

11/28/17-08:07:00:INFO:  running "/Users/langmead/git/qtip/qtip-parse ifs -- wiggle 30 input-model-size 30000 max-allowed-fraglen 100000 sim-factor 45.0 sim-function sqrt sim-unp-min 30000 sim-conc-min 30000 sim-disc-min 10000 sim-bad-end-min 10000 seed 1712975108 -- full_e2e/input.sam -- lambda_virus.fa -- full_e2e/input_intermediates -- full_e2e/tandem_intermediates"

Everything between the first and second -- is missing for your run. Those are the "pass-through" arguments, where qtip is passing along some of its args.

johanneskoester commented 6 years ago

Ok, you have convinced me :-). Here comes the output for a very small BAM: test.normal.hg18.log

The corresponding command was (with --verbose):

qtip --bwa-exe 'resources/bwa mem -Y -R "@RG\\tID:normal\\tSM:normal" -t 1' --output-directory mapped-qtip/test.normal.hg18 --temp-directory mapped-qtip --verbose --aligner bwa-mem --m1 reads/test.normal.1.fastq --m2 reads/test.normal.2.fastq --index index/hg18/genome --ref /export/scratch2/koster/data/ref/hg18.fasta

BenLangmead commented 6 years ago

Thanks for the fix!

BenLangmead / qtip

Out of memory in qtip-parse #6