Closed · CallowBrainProject closed this issue 2 years ago
Hi Daniel,
we recently introduced the option to parallelize the fit, and it looks like the joblib library we use does not work on your system. Can you try changing the backend used by joblib (either loky or threading) with the following code before the fit, and see if anything changes?
ae.set_config('parallel_backend', 'loky')
ae.set_config('parallel_jobs', -1)
ae.fit()
Also, did you try this suggestion? https://github.com/cosanlab/nltools/issues/281#issuecomment-607766951
Your initial suggestion seems to have done the trick, thank you very much for the suggested edits.
Good to hear that! Which is the option that works for you? Loky or threading? And do you still get an improvement in fitting time with this parallel code?
It seems like either one works for the fix. I'm not sure if I am still getting the improvement, though, as I haven't run AMICO on this system previously.
To test if you have any improvements, try repeating the fit first with:
ae.set_config('parallel_jobs', 1)
ae.fit()
and then with:
ae.set_config('parallel_jobs', -1)
ae.fit()
If the joblib library works on your system, you should see a speedup in the second case. Thanks for testing!
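If you want to time the two runs without a stopwatch, a small helper like this does the job (a generic sketch; `timed` is not part of AMICO, and in your case the function to time would be `ae.fit`):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# Stand-in workload; with AMICO you would do, e.g.:
#   ae.set_config('parallel_jobs', 1);  _, t1 = timed(ae.fit)
#   ae.set_config('parallel_jobs', -1); _, tn = timed(ae.fit)
result, secs = timed(sum, range(1_000_000))
print(f"result={result}, elapsed={secs:.3f} s")
```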
Hi @CallowBrainProject ,
we are trying to debug the problem you experienced, as some other users seem to have run into it as well. However, on all our machines (Linux and OSX, with different OS versions as well as Python setups) we cannot reproduce it. Can you please let us know what your system is (OS, version, Python distribution and version, ...)? Thanks!
Hello,
System Software Overview:
System Version: macOS 12.3.1 (21E258)
Kernel Version: Darwin 21.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: kneswk245036
User Name: Amos (amos)
Secure Virtual Memory: Enabled
System Integrity Protection: Disabled
Time since boot: 1 day 13:08
Python version
[kneswk245036:/Volumes/MRI_BACKUP/test] amos% python
Python 3.7.1 (default, Dec 14 2018, 13:28:58) [Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Conda version
[kneswk245036:/Volumes/MRI_BACKUP/test] amos% conda --version
conda 4.12.0
Hope this helps!
Thanks for sharing the info, but unfortunately we cannot reproduce the error yet.
I may have found a solution to the problem (I couldn't reproduce it, but I think I figured out the probable cause). Could you give the new version, i.e. 1.4.3, a shot? Also, parallel fitting should now work out of the box. In case of problems, you could roll back, or avoid using parallel fitting by issuing:
ae.set_config('parallel_jobs', 1)
ae.fit()
Do you see any improvement in the computation time with this latest version? I suspect that on your system the parallel fit never actually worked.
PS: I also removed the parallel_backend option, as it turned out that the current implementation of the parallel fit only works with the loky backend.
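For anyone wondering why the backend matters: loky runs the fit in separate worker processes, so everything handed to a worker must be picklable, whereas a thread backend shares the process memory and skips serialization entirely. A minimal stdlib sketch of the difference (not AMICO code; the temp file stands in for any object holding an open file handle, like the IndexedGzipFile in the traceback in the original post):

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor

# An open file handle is a classic unpicklable object.
tmp = tempfile.TemporaryFile()
tmp.write(b"some bytes")
tmp.seek(0)

# Threads share process memory: the handle is passed by reference.
with ThreadPoolExecutor(max_workers=1) as pool:
    data = pool.submit(lambda f: f.read(), tmp).result()
print("thread backend read", len(data), "bytes")

# A process backend (like loky) must first pickle the task and its arguments,
# which fails for objects wrapping OS-level file handles.
try:
    pickle.dumps(tmp)
except TypeError as exc:
    print("process backend would fail:", exc)

tmp.close()
```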
Hello, I am running AMICO and trying to fit the NODDI model to my diffusion data. I am able to work through the following commands but run into an error when finally getting to the ae.fit() section.
import amico
import spams
amico.core.setup()
ae = amico.Evaluation("/Volumes/MRI_BACKUP/Riggins_data/MST028/", "Diffusion")
amico.util.fsl2scheme("/Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/bvals", "/Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/bvecs", bStep=(0,1500,3000))
ae.load_data(dwi_filename="/Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/dwi_corrected.nii.gz", scheme_filename="bvals.scheme", mask_filename="dwi_mask_3D.nii.gz", b0_thr=5)
ae.set_model("NODDI")
ae.generate_kernels()
ae.load_kernels()
ae.fit()
ae.save_results()
-> Precomputing rotation matrices: |██████████████████████████████████████████████████████████| 100.0% [ DONE ]
-> Setting b-values to the closest shell in [ 0. 1500. 3000.]
-> Writing scheme file to [ /Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/bvals.scheme ]
-> Loading data:
-> Preprocessing:
-> Creating LUT for "NODDI" model: |██████████████████████████████████████████████████████████| 100.0% [ 234.7 seconds ]
-> Resampling LUT for subject "Diffusion": |██████████████████████████████████████████████████████████| 100.0% [ 66.2 seconds ]
-> Fitting "NODDI" model to 249128 voxels: |          | 0.0%
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/queues.py", line 153, in feed
    obj = dumps(obj, reducers=reducers)
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
  File "/Users/amos/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/Users/amos/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/Users/amos/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  [... many recursive save / save_reduce / save_dict / save_tuple / save_list frames in pickle.py ...]
  File "/Users/amos/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: cannot serialize 'IndexedGzipFile' object
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 11, in
File "/Users/amos/anaconda3/lib/python3.7/site-packages/amico/core.py", line 463, in fit
for i in tqdm(range(totVoxels), ncols=70, bar_format=' |{bar}| {percentage:4.1f}%', disable=(get_verbose()<3))
File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/Users/amos/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 432, in result
return self.get_result()
File "/Users/amos/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in get_result
raise self._exception
_pickle.PicklingError: Could not pickle the task to send it to the workers.
| | 0.1%
Any ideas on what might be leading to the thrown error?
Thank you, Daniel