bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Potential memory issue (createDB) #41

Closed by moorembioinfo 5 years ago

moorembioinfo commented 5 years ago

Hi!

This might be a problem on my end, but I seem to be having a memory issue when --easy-run or --create-db gets to the "Calculating core and accessory distances" stage. I've successfully run --easy-run on ~70 cdiff genomes, but increasing to many more (200+) gives an error (at the end of this message).

When I'm running --easy-run I'm using:

    poppunk --easy-run --r-files reference_list.txt --output lm_example --full-db --min-k 15

I've also tried this command for --create-db:

    poppunk --create-db --r-files reference_list.txt --output poppunk_db --k-step 2 --min-k 20 --plot-fit 5

I've tried it threaded as well, and both commands with --no-stream.

I've also provided as much as 300 GB of memory for the run.

Thanks in advance for any help with this!

Calculating core and accessory distances
Traceback (most recent call last):
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 415, in get
    return Q.get(timeout=1)
  File "/well/bag/moorem/anaconda/lib/python3.6/multiprocessing/queues.py", line 105, in get
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 423, in get
    return Q.get(timeout=0)
  File "/well/bag/moorem/anaconda/lib/python3.6/multiprocessing/queues.py", line 105, in get
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 757, in map
    capsule = pg.get(R)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 425, in get
    raise StopProcessGroup
sharedmem.sharedmem.StopProcessGroup: StopProcessGroup

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/well/bag/moorem/anaconda/bin/poppunk", line 11, in <module>
    sys.exit(main())
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/__main__.py", line 210, in main
    args.plot_fit, args.no_stream, args.mash, args.threads)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/mash.py", line 520, in queryDatabase
    pool.map(partial(fitKmerBlock, distMat=distMat, raw = raw, klist=klist, jacobian=jacobian), mat_chunks)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 761, in map
    raise pg.get_exception()
sharedmem.sharedmem.SlaveException: Residuals are not finite in the initial point.
Traceback (most recent call last):
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 294, in _slaveMain
    self.main(self, *self.args)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 628, in _main
    r = realfunc(work)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 703, in realfunc
    else: return func(i)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/mash.py", line 543, in fitKmerBlock
    distMat[start:end, :] = np.apply_along_axis(fitKmerCurve, 1, raw[start:end, :], klist, jacobian)
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 380, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/mash.py", line 570, in fitKmerCurve
    bounds=([-np.inf, -np.inf], [0, 0]))
  File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/scipy/optimize/_lsq/least_squares.py", line 804, in least_squares
    raise ValueError("Residuals are not finite in the initial point.")
ValueError: Residuals are not finite in the initial point.

johnlees commented 5 years ago

I don't think there should be a memory issue, especially with such a small dataset. Although the errors look like they are coming from the sharedmem package, I think one of the regressions (which sharedmem runs in parallel) may actually be failing, which causes the ValueError at the end.

Could you try running on the larger dataset with --min-k 11? When you ran on the smaller dataset did the plots of the regressions look ok?
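
For illustration, here is one way a regression could fail with this message. The sketch assumes the fit is a straight line through log(Jaccard distance) against k-mer length, which is an assumption about fitKmerCurve rather than something shown in the traceback. Under that assumption, a pair of samples sharing no k-mers has a Jaccard value of 0, its log is -inf, and the residuals are non-finite before the optimiser takes a single step. The function names and starting values below are hypothetical:

```python
import math


def log_linear_residuals(klist, jaccard, intercept=0.0, slope=-0.01):
    """Residuals of log(jaccard) against a straight line in k.

    math.log(0) raises an error, so a zero Jaccard value is mapped to -inf
    here to mimic what an array library such as NumPy would return.
    """
    residuals = []
    for k, j in zip(klist, jaccard):
        log_j = math.log(j) if j > 0 else float("-inf")
        residuals.append(log_j - (intercept + slope * k))
    return residuals


def all_finite(values):
    """True if every value is a finite number."""
    return all(math.isfinite(v) for v in values)


klist = [15, 17, 19]
good_pair = [0.9, 0.8, 0.7]   # two similar genomes
bad_pair = [0.5, 0.0, 0.0]    # e.g. one "genome" is only a few contigs

print(all_finite(log_linear_residuals(klist, good_pair)))
print(all_finite(log_linear_residuals(klist, bad_pair)))
```

Checking the inputs for finiteness before fitting would point at the offending pair directly, instead of surfacing as an error inside the parallel workers.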

moorembioinfo commented 5 years ago

Thanks for the quick response! On the much smaller dataset (~70 genomes) the regressions looked fine.

When I run with --min-k smaller than 15 I get an error.

I've rerun with --min-k 15 and without --plot-fit on 319 genomes, and that gives the same error.

Is there anything else I could do that would be informative? I saw in the documentation that you've successfully tested many more genomes.

moorembioinfo commented 5 years ago

I've just checked my genomes and one was extremely low quality (not a genome at all but a few contigs) so it seems to have been falling over on that! It's run without issue now!

Thanks again

johnlees commented 5 years ago

Ok, great! Your issue has been really helpful though, as I now realise that I should add proper error handling for that case, so you can identify the troublesome sample without a nasty wall of text about memory appearing.

nickjcroucher commented 5 years ago

If it's difficult to catch at the curve fitting stage, would it be helpful to add a warning if a sequence is an outlier in terms of length? e.g. below half, or above twice, the mean? That would help the user identify the problems with the curve fitting stage perhaps?

johnlees commented 5 years ago

Some sort of basic QC does seem like a good idea - I'll raise a new issue

moorembioinfo commented 5 years ago

Glad it was helpful! This particular case was egregious in that the offending genome was a few kbp rather than ~4 Mbp.

johnlees commented 5 years ago

This still doesn't give a useful error message, because I didn't test this properly! In Python 3, ValueError has no .message attribute to look for:

Traceback (most recent call last):
  File "/root/miniconda3/bin/poppunk", line 10, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/__main__.py", line 399, in main
    args.no_stream, args.mash, args.threads)
  File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 549, in queryDatabase
    pool.map(partial(fitKmerBlock, distMat=distMat, raw = raw, klist=klist, jacobian=jacobian), mat_chunks)
  File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 761, in map
    raise pg.get_exception()
sharedmem.sharedmem.SlaveException: 'ValueError' object has no attribute 'message'
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 600, in fitKmerCurve
    bounds=([-np.inf, -np.inf], [0, 0]))
  File "/root/miniconda3/lib/python3.7/site-packages/scipy/optimize/_lsq/least_squares.py", line 804, in least_squares
    raise ValueError("Residuals are not finite in the initial point.")
ValueError: Residuals are not finite in the initial point.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 294, in _slaveMain
    self.main(self, *self.args)
  File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 628, in _main
    r = realfunc(work)
  File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 703, in realfunc
    else: return func(i)
  File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 572, in fitKmerBlock
    distMat[start:end, :] = np.apply_along_axis(fitKmerCurve, 1, raw[start:end, :], klist, jacobian)
  File "/root/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 380, in apply_along_axis
    res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
  File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 605, in fitKmerCurve
    np.array2string(pairwise, precision=4, separator=',',suppress_small=True) +
AttributeError: 'ValueError' object has no attribute 'message'
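
The AttributeError above arises because Python 3 exceptions do not carry a .message attribute; str(exc) or exc.args gives the text instead. A minimal sketch of catching the fit failure and re-raising it with the sample names attached follows. The wrapper, the stand-in fit function, and the sample names are all hypothetical and are not the fix that was committed:

```python
def fit_with_context(fit, pairwise, names):
    """Run fit(pairwise); on ValueError, re-raise naming the sample pair."""
    try:
        return fit(pairwise)
    except ValueError as exc:
        # Python 3 exceptions have no .message attribute; str(exc) gives the text.
        raise RuntimeError(
            f"Curve fit failed for {names[0]} vs {names[1]} "
            f"(distances: {pairwise}): {exc}"
        ) from exc


def always_fails(pairwise):
    # Stand-in for the real fit, raising the same error SciPy reported above.
    raise ValueError("Residuals are not finite in the initial point.")


try:
    fit_with_context(always_fails, [0.0, 0.0], ("sampleA", "sampleB"))
except RuntimeError as err:
    message = str(err)
    print(message)
```

Using "raise ... from exc" keeps the original ValueError in the traceback, so the underlying SciPy message is not lost.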

johnlees commented 5 years ago

Fixed in 6e8f04e1f1d7bdc8974f17eb09a1ac72fb11efc7