YaqiangCao / cLoops

Accurate and flexible loops calling tool for 3D genomic data.
https://yaqiangcao.github.io/cLoops/
MIT License

joblib error #18

Closed jessakay closed 4 years ago

jessakay commented 4 years ago

Everything seems to work nicely with typically sized Hi-C datasets, but when attempting to run on something larger (e.g., ~4e9 contacts genome-wide) with -eps 5000,10000 -minPts 50,100 -hic, the following sort of issue pops up:

Clustering chr8 and chr8 finished. Estimated 43365022 self-ligation reads and 5506751 inter-ligation reads
Traceback (most recent call last):
  File "/local/anaconda3/envs/cloops/bin/cLoops", line 8, in <module>
    sys.exit(main())
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 352, in main
    hic, op.washU, op.juice, op.cut, op.plot, op.max_cut)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 250, in pipe
    dataI_2, dataS_2, dis_2, dss_2 = runDBSCAN(cfs, ep, m, cut, cpu)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/cLoops/pipe.py", line 118, in runDBSCAN
    for f in fs)
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/joblib/parallel.py", line 789, in __call__
    self.retrieve()
  File "/local/anaconda3/envs/cloops/lib/python2.7/site-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/local/anaconda3/envs/cloops/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(('chr8', 'chr8'), 'hic/chr8-chr8.jd', ...

Based on https://github.com/scikit-learn/scikit-learn/issues/8920, I wrapped all the Parallel() calls in pipe.py in with-blocks using the "threading" back-end, and that seems to have gotten around the error.

My question is whether this is the right way to go about this problem, given the "parallel computing bugs" mentioned in the README.
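
Roughly, the change looked like the sketch below. This is a simplified, hypothetical version of the runDBSCAN call rather than the exact pipe.py code (function and variable names are placeholders); the only relevant parts are the with-block and backend="threading". With the threading back-end the per-chromosome results are not pickled and sent back between processes, which appears to be where the MaybeEncodingError was raised.

```python
# Simplified sketch of the workaround; names are placeholders, not cLoops' code.
from joblib import Parallel, delayed

def clusterOneFile(jd_file, eps, minPts, cut):
    # stand-in for cLoops' per-chromosome clustering worker
    return jd_file, eps, minPts, cut

def runDBSCAN(fs, eps, minPts, cut, cpu):
    # The with-block reuses one pool of workers; backend="threading" keeps the
    # (large) per-chromosome results in the same process instead of pickling
    # them back from child processes.
    with Parallel(n_jobs=cpu, backend="threading") as parallel:
        results = parallel(delayed(clusterOneFile)(f, eps, minPts, cut) for f in fs)
    return results

print(runDBSCAN(["hic/chr8-chr8.jd"], 5000, 50, 0, 4))
```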

YaqiangCao commented 4 years ago

Dear User, I am glad to know cLoops works nicely with typically sized Hi-C datasets for you. Sorry for the potential problem. Here are some points I want to mention:

  1. cLoops implements a variant of DBSCAN, so the scikit-learn issue you mentioned may not be the solution to this problem.
  2. For the parallel computing bugs due to joblib, I found they are fixed by pinning a specific joblib version; joblib==0.11 works very well for all the data and Linux systems I have used (a quick way to check the installed version is sketched below). So there may be no need to wrap all the Parallel() calls in pipe.py.
  3. According to the error log, the clustering process did finish. May I ask how many CPUs were used and how much memory your machine has? An out-of-memory problem on another chromosome could also cause this parallel error.

Hope these suggestions fix the problem. Please let me know how it goes. Thank you. Best, Yaqiang
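
P.S. A quick, illustrative way to confirm which joblib the cloops environment actually picks up (the comment shows the usual pip command for pinning it):

```python
# Print the joblib version seen by the cloops environment; joblib==0.11 is the
# version reported above to work reliably. If something else is installed,
# `pip install joblib==0.11` inside the cloops conda env will pin it.
import joblib
print(joblib.__version__)
```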

jessakay commented 4 years ago

Thank you for the suggestions. I went back to check and indeed there was an issue with memory usage: without changing joblib's back-end, the run used >700 GB of memory (far in excess of the system limit), but only 125 GB after the change.

I've been processing each chromosome individually (i.e., splitting the genome-wide bedpe by chromosome), but this shouldn't affect the results, right?
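
For reference, the per-chromosome split is roughly like the sketch below. It is illustrative only, assuming a standard plain-text BEDPE with chrom1 in column 1 and chrom2 in column 4, and it keeps only intra-chromosomal PETs (file names are arbitrary):

```python
# Illustrative split of a genome-wide BEDPE into one file per chromosome.
# Assumes at least six tab-separated columns; inter-chromosomal (trans) PETs
# and malformed lines are skipped.
import sys

def splitBedpeByChrom(bedpe_path, prefix):
    handles = {}
    with open(bedpe_path) as f:
        for line in f:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 6 or cols[0] != cols[3]:
                continue  # skip malformed lines and trans PETs
            chrom = cols[0]
            if chrom not in handles:
                handles[chrom] = open("%s_%s.bedpe" % (prefix, chrom), "w")
            handles[chrom].write(line)
    for h in handles.values():
        h.close()

if __name__ == "__main__":
    splitBedpeByChrom(sys.argv[1], sys.argv[2])
```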

YaqiangCao commented 4 years ago

Processing each chromosome individually will not affect the results; only the estimation of the self-ligation and inter-ligation distance cutoff will differ, and if you always set the distance cutoff to 0, the results will be exactly the same. If there is still a memory issue, maybe -cut 10000 can be used to remove some close PETs before calling. Since you mentioned that changing joblib's back-end reduces memory a lot, could you please show me a few lines of example code? Maybe I can implement your solution. Thank you. Best, Yaqiang
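
Conceptually, -cut 10000 drops PETs whose two anchors lie closer than 10 kb before clustering. A rough stand-alone illustration is below; it is not cLoops' internal code, the distance here is taken between anchor midpoints and may differ slightly from cLoops' definition, and the file names are arbitrary:

```python
# Illustrative pre-filter mimicking -cut: remove intra-chromosomal PETs whose
# anchor midpoints are closer than `cut` base pairs.
def filterClosePETs(bedpe_in, bedpe_out, cut=10000):
    kept = 0
    with open(bedpe_in) as fin, open(bedpe_out, "w") as fout:
        for line in fin:
            c = line.rstrip("\n").split("\t")
            if len(c) < 6:
                continue
            midA = (int(c[1]) + int(c[2])) // 2
            midB = (int(c[4]) + int(c[5])) // 2
            if abs(midB - midA) < cut:
                continue  # too close, likely self-ligation or very short-range PET
            fout.write(line)
            kept += 1
    return kept

print(filterClosePETs("chr8.bedpe", "chr8.cut10k.bedpe"))
```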

jessakay commented 4 years ago

It involved just changing Parallel(n_jobs=cpu) to Parallel(n_jobs=cpu, backend='threading'), but the runtime seems to be a bit longer, though I haven't done any extensive testing.

YaqiangCao commented 4 years ago

I will try it. Thank you. Yaqiang
