moorembioinfo closed this issue 5 years ago
I don't think there should be a memory issue, especially with a dataset that small. Although the errors look like they are coming from the sharedmem package, I think one of the regressions (which sharedmem runs in parallel) may actually be failing, causing the ValueError at the end.
Could you try running on the larger dataset with --min-k 11? When you ran on the smaller dataset, did the plots of the regressions look OK?
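As a rough illustration of how a failure inside one parallel task can surface as an error from the pool itself, here is a minimal sketch. It uses the standard library's concurrent.futures rather than sharedmem, and fit_block and run_blocks are hypothetical stand-ins, not PopPUNK functions.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_block(block):
    """Hypothetical stand-in for one parallel regression task."""
    if any(value <= 0 for value in block):
        # Mimics a fit that cannot start because its input is unusable.
        raise ValueError("Residuals are not finite in the initial point.")
    return sum(block) / len(block)


def run_blocks(blocks, threads=2):
    """Run fit_block over all blocks; a worker's exception is re-raised here."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces every result, so a failed task raises in the caller.
        return list(pool.map(fit_block, blocks))


if __name__ == "__main__":
    try:
        run_blocks([[0.9, 0.8], [0.7, 0.0], [0.6, 0.5]])
    except ValueError as err:
        print(f"A worker failed: {err}")
```

The traceback then passes through the pool's own code before it reaches the real cause, which is why the last exception in the chain is usually the one worth reading.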
Thanks for the quick response! On the much smaller dataset (~70 genomes) the regressions looked fine.
When I run with --min-k smaller than 15 I get an error.
I've rerun with --min-k 15 and without --plot-fit on 319 genomes, and that gives the same error.
Is there anything else I could do that would be informative? I saw in the documentation that you've successfully tested many more genomes.
I've just checked my genomes and one was extremely low quality (not a genome at all but a few contigs) so it seems to have been falling over on that! It's run without issue now!
Thanks again
OK, great! Your issue has been really helpful though, as I now realise that I should add proper error handling for that case, so you can identify the troublesome sample without a nasty wall of text about memory appearing.
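One way such error handling might look is sketched below. It assumes the fit is a straight line through log(Jaccard) against k, and it uses a plain least-squares line instead of the bounded scipy fit seen in the traceback. The names fit_log_line and fit_pair are hypothetical.

```python
import math


def fit_log_line(klist, jaccard):
    """Least-squares line through (k, log(jaccard)); returns (intercept, slope)."""
    # A Jaccard value of 0 has no finite logarithm, so map it to -inf.
    logs = [math.log(j) if j > 0 else -math.inf for j in jaccard]
    if not all(math.isfinite(v) for v in logs):
        raise ValueError("non-finite log(Jaccard) values")
    n = len(klist)
    mean_k = sum(klist) / n
    mean_y = sum(logs) / n
    sxx = sum((k - mean_k) ** 2 for k in klist)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(klist, logs))
    slope = sxy / sxx
    return mean_y - slope * mean_k, slope


def fit_pair(name_a, name_b, klist, jaccard):
    """Fit one pair, re-raising any failure with the sample names attached."""
    try:
        return fit_log_line(klist, jaccard)
    except ValueError as err:
        raise RuntimeError(
            f"Fit failed for {name_a} vs {name_b} (Jaccard values: {jaccard}): {err}"
        ) from err
```

Because the sample names travel with the exception, the user can see which pair failed even if the pool wraps the error again.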
If it's difficult to catch at the curve fitting stage, would it be helpful to add a warning if a sequence is an outlier in terms of length? e.g. below half, or above twice, the mean? That would help the user identify the problems with the curve fitting stage perhaps?
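The length check suggested here could be sketched roughly as follows. The function name, the dictionary input and the default thresholds (half and twice the mean) are assumptions for illustration, not an existing PopPUNK option.

```python
import sys


def flag_length_outliers(lengths, low=0.5, high=2.0):
    """Return names whose length is below low*mean or above high*mean."""
    if not lengths:
        return []
    mean_len = sum(lengths.values()) / len(lengths)
    return sorted(
        name
        for name, length in lengths.items()
        if length < low * mean_len or length > high * mean_len
    )


if __name__ == "__main__":
    # Made-up sequence lengths in bp; "s4" mimics a few contigs, not a genome.
    example = {"s1": 4_100_000, "s2": 4_200_000, "s3": 3_900_000, "s4": 5_000}
    for name in flag_length_outliers(example):
        print(f"WARNING: {name} has an unusual sequence length", file=sys.stderr)
```

The mean is itself pulled around by extreme samples, so the median might be a more robust baseline if several inputs are bad.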
Some sort of basic QC does seem like a good idea - I'll raise a new issue
Glad it was helpful! This particular case was egregious in that the offending genome was a few kbp rather than ~4 Mbp.
This still doesn't give a useful error message, because I didn't test this properly! ValueError has no message to look for:
Traceback (most recent call last):
File "/root/miniconda3/bin/poppunk", line 10, in <module>
sys.exit(main())
File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/__main__.py", line 399, in main
args.no_stream, args.mash, args.threads)
File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 549, in queryDatabase
pool.map(partial(fitKmerBlock, distMat=distMat, raw = raw, klist=klist, jacobian=jacobian), mat_chunks)
File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 761, in map
raise pg.get_exception()
sharedmem.sharedmem.SlaveException: 'ValueError' object has no attribute 'message'
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 600, in fitKmerCurve
bounds=([-np.inf, -np.inf], [0, 0]))
File "/root/miniconda3/lib/python3.7/site-packages/scipy/optimize/_lsq/least_squares.py", line 804, in least_squares
raise ValueError("Residuals are not finite in the initial point.")
ValueError: Residuals are not finite in the initial point.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 294, in _slaveMain
self.main(self, *self.args)
File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 628, in _main
r = realfunc(work)
File "/root/miniconda3/lib/python3.7/site-packages/sharedmem/sharedmem.py", line 703, in realfunc
else: return func(i)
File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 572, in fitKmerBlock
distMat[start:end, :] = np.apply_along_axis(fitKmerCurve, 1, raw[start:end, :], klist, jacobian)
File "/root/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 380, in apply_along_axis
res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
File "/root/miniconda3/lib/python3.7/site-packages/PopPUNK/mash.py", line 605, in fitKmerCurve
np.array2string(pairwise, precision=4, separator=',',suppress_small=True) +
AttributeError: 'ValueError' object has no attribute 'message'
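The final AttributeError comes from reading .message on the caught exception: Python 3 exceptions do not have that attribute. A minimal sketch of the usual alternative, str(err), is below. The helper name describe_error is hypothetical, and this is not necessarily what the fix commit does.

```python
def describe_error(err):
    """Return a printable message for any exception on Python 3."""
    # Python 3 exceptions have no .message attribute; str(err) gives the text.
    return str(err)


try:
    raise ValueError("Residuals are not finite in the initial point.")
except ValueError as err:
    has_message = hasattr(err, "message")  # False on Python 3
    text = describe_error(err)
```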
Fixed in 6e8f04e1f1d7bdc8974f17eb09a1ac72fb11efc7
Hi!
This might be a problem on my end, but I seem to be having a memory issue when --easy-run or --create-db gets to the "Calculating core and accessory distances" stage. I've successfully run --easy-run on ~70 cdiff genomes, but increasing to many more (200+) gives an error (see the end of this message).
For --easy-run I'm using: poppunk --easy-run --r-files reference_list.txt --output lm_example --full-db --min-k 15
And I've tried this command for --create-db: poppunk --create-db --r-files reference_list.txt --output poppunk_db --k-step 2 --min-k 20 --plot-fit 5
I've also tried running threaded, and both commands with --no-stream.
I've also provided as much as 300 GB of memory for the run.
Thanks in advance for any help with this!
Calculating core and accessory distances
Traceback (most recent call last):
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 415, in get
return Q.get(timeout=1)
File "/well/bag/moorem/anaconda/lib/python3.6/multiprocessing/queues.py", line 105, in get
raise Empty
queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 423, in get
return Q.get(timeout=0)
File "/well/bag/moorem/anaconda/lib/python3.6/multiprocessing/queues.py", line 105, in get
raise Empty
queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 757, in map
capsule = pg.get(R)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 425, in get
raise StopProcessGroup
sharedmem.sharedmem.StopProcessGroup: StopProcessGroup
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/well/bag/moorem/anaconda/bin/poppunk", line 11, in <module>
sys.exit(main())
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/__main__.py", line 210, in main
args.plot_fit, args.no_stream, args.mash, args.threads)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/mash.py", line 520, in queryDatabase
pool.map(partial(fitKmerBlock, distMat=distMat, raw = raw, klist=klist, jacobian=jacobian), mat_chunks)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 761, in map
raise pg.get_exception()
sharedmem.sharedmem.SlaveException: Residuals are not finite in the initial point.
Traceback (most recent call last):
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 294, in _slaveMain
self.main(self, *self.args)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 628, in _main
r = realfunc(work)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/sharedmem/sharedmem.py", line 703, in realfunc
else: return func(i)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/mash.py", line 543, in fitKmerBlock
distMat[start:end, :] = np.apply_along_axis(fitKmerCurve, 1, raw[start:end, :], klist, jacobian)
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 380, in apply_along_axis
buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/PopPUNK/mash.py", line 570, in fitKmerCurve
bounds=([-np.inf, -np.inf], [0, 0]))
File "/well/bag/moorem/anaconda/lib/python3.6/site-packages/scipy/optimize/_lsq/least_squares.py", line 804, in least_squares
raise ValueError("Residuals are not finite in the initial point.")
ValueError: Residuals are not finite in the initial point.