daducci / AMICO

Accelerated Microstructure Imaging via Convex Optimization (AMICO) from diffusion MRI data

Issue with model fit command #136

Closed CallowBrainProject closed 2 years ago

CallowBrainProject commented 2 years ago

Hello, I am running AMICO and trying to fit the NODDI model to my diffusion data. I am able to work through the following commands, but I run into an error when I finally reach the ae.fit() call.

import amico
import spams

amico.core.setup()

ae = amico.Evaluation("/Volumes/MRI_BACKUP/Riggins_data/MST028/", "Diffusion")
amico.util.fsl2scheme("/Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/bvals", "/Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/bvecs", bStep=(0,1500,3000))
ae.load_data(dwi_filename="/Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/dwi_corrected.nii.gz", scheme_filename="bvals.scheme", mask_filename="dwi_mask_3D.nii.gz", b0_thr=5)
ae.set_model("NODDI")
ae.generate_kernels()
ae.load_kernels()
ae.fit()
ae.save_results()

-> Precomputing rotation matrices: |██████████████████████████████████████████████████████████| 100.0% [ DONE ]
-> Setting b-values to the closest shell in [ 0. 1500. 3000. ]
-> Writing scheme file to [ /Volumes/MRI_BACKUP/Riggins_data/MST028/Diffusion/bvals.scheme ]

-> Loading data:

-> Preprocessing:

-> Creating LUT for "NODDI" model: |██████████████████████████████████████████████████████████| 100.0% [ 234.7 seconds ]

-> Resampling LUT for subject "Diffusion": |██████████████████████████████████████████████████████████| 100.0% [ 66.2 seconds ]

-> Fitting "NODDI" model to 249128 voxels: | | 0.0%
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/queues.py", line 153, in feed
    obj = dumps(obj, reducers=reducers)
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
  File "/Users/amos/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  [... many recursive pickle.py save / save_reduce / save_dict / save_tuple frames omitted ...]
  File "/Users/amos/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: cannot serialize 'IndexedGzipFile' object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/amico/core.py", line 463, in fit
    for i in tqdm(range(totVoxels), ncols=70, bar_format=' |{bar}| {percentage:4.1f}%', disable=(get_verbose()<3))
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/amos/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Users/amos/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/Users/amos/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
_pickle.PicklingError: Could not pickle the task to send it to the workers. | | 0.1%

Any ideas on what might be leading to the thrown error?
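For context, the root failure in the traceback is the final `TypeError: cannot serialize 'IndexedGzipFile' object`: joblib's loky workers tried to pickle a task that held an open compressed-file handle, and such handles cannot be pickled. A minimal stdlib sketch of the same class of failure (hypothetical example, not AMICO code):

```python
import gzip
import pickle
import tempfile

# An open gzip file wraps an OS-level file handle, which pickle refuses
# to serialize -- the same class of failure as the 'IndexedGzipFile'
# error in the traceback above.
with tempfile.NamedTemporaryFile(suffix=".gz") as tmp:
    f = gzip.open(tmp.name, "wb")
    try:
        pickle.dumps(f)  # raises TypeError: cannot pickle the file object
        picklable = True
    except TypeError:
        picklable = False
    finally:
        f.close()

print(picklable)  # False
```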

Thank you, Daniel

daducci commented 2 years ago

Hi Daniel,

we recently introduced the option to parallelize the fit, and it looks like the joblib library we use for this does not work on your system. Could you try changing the backend used by joblib (either loky or threading) by running the following before the fit?

ae.set_config('parallel_backend', 'loky')
ae.set_config('parallel_jobs', -1)
ae.fit()

and see whether anything changes?
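The reply mentions two possible backends but only shows loky; presumably the threading variant just swaps the first argument (a sketch based on the snippet above, not verified against the AMICO API):

```python
# Sketch: same configuration calls as above, with the threading backend.
ae.set_config('parallel_backend', 'threading')  # instead of 'loky'
ae.set_config('parallel_jobs', -1)              # -1 = use all available cores
ae.fit()
```

The threading backend avoids sending tasks to separate processes, so it sidesteps pickling entirely, at the cost of being limited by Python's GIL.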

daducci commented 2 years ago

Also, did you try this suggestion? https://github.com/cosanlab/nltools/issues/281#issuecomment-607766951

CallowBrainProject commented 2 years ago

Your initial suggestion seems to have done the trick, thank you very much for the suggested edits.

daducci commented 2 years ago

Good to hear! Which option works for you, loky or threading? And do you still see an improvement in fitting time with this parallel code?

CallowBrainProject commented 2 years ago

It seems like either one works as a fix. I'm not sure whether I'm still getting the speedup, though, as I haven't run AMICO on this system before.

daducci commented 2 years ago

To test if you have any improvements, try repeating the fit first with:

ae.set_config('parallel_jobs', 1)
ae.fit()

and then with:

ae.set_config('parallel_jobs', -1)
ae.fit()

If the joblib library works on your system, you should see a speedup in the second case. Thanks for testing!
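The comparison above can be wrapped in a small timing helper. This is a hypothetical sketch: `timed_fit` is not part of AMICO, and the `FakeEvaluation` stand-in is only there so the snippet runs without AMICO installed (with AMICO, `ae` would be the `amico.Evaluation` instance from the snippets above):

```python
import time

def timed_fit(ae, n_jobs):
    """Set the job count, run the fit, and return the elapsed wall-clock seconds."""
    ae.set_config('parallel_jobs', n_jobs)
    t0 = time.perf_counter()
    ae.fit()
    return time.perf_counter() - t0

# Stand-in object so the sketch runs without AMICO installed.
class FakeEvaluation:
    def set_config(self, key, value):
        self.n_jobs = value
    def fit(self):
        time.sleep(0.01)  # pretend to do work

ae = FakeEvaluation()
serial = timed_fit(ae, 1)     # single-process baseline
parallel = timed_fit(ae, -1)  # all cores
print(serial > 0 and parallel > 0)  # True
```

With a real `Evaluation`, comparing `serial` to `parallel` shows whether the parallel backend is actually giving a speedup on the system.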

daducci commented 2 years ago

Hi @CallowBrainProject ,

we are trying to debug the problem you experienced, as it seems that some other users have run into it as well. However, we cannot reproduce it on any of our machines (Linux and OSX, with different OS versions as well as Python setups). Could you please let us know what your system is (OS and version, Python distribution and version, ...)? Thanks!

CallowBrainProject commented 2 years ago

Hello,

System Software Overview:

System Version: macOS 12.3.1 (21E258)
Kernel Version: Darwin 21.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: kneswk245036
User Name: Amos (amos)
Secure Virtual Memory: Enabled
System Integrity Protection: Disabled
Time since boot: 1 day 13:08

Python version

[kneswk245036:/Volumes/MRI_BACKUP/test] amos% python
Python 3.7.1 (default, Dec 14 2018, 13:28:58)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin

Conda version

[kneswk245036:/Volumes/MRI_BACKUP/test] amos% conda --version
conda 4.12.0

Hope this helps!

daducci commented 2 years ago

Thanks for sharing the info, but unfortunately we still cannot reproduce the error.

daducci commented 2 years ago

I may have found a fix for the problem (I couldn't reproduce it, but I think I figured out the probable cause). Could you give the new version, i.e. 1.4.3, a shot? Parallel fitting should now work out of the box. In case of problems, you can roll back to serial fitting by issuing:

ae.set_config('parallel_jobs', 1)
ae.fit()

Do you see any improvement in computation time with this latest version? I suspect the parallel fit never actually worked on your system.

PS: I also removed the parallel_backend option, as it turned out that the current implementation of the parallel fit only works with the loky backend.