cvnlab / GLMsingle

A toolbox for accurate single-trial estimates in fMRI time-series data
BSD 3-Clause "New" or "Revised" License

GLMsingle saving error #148

Open 27-apizzuti opened 1 week ago

27-apizzuti commented 1 week ago

Hi,

I was running GLMsingle on my fMRI data, which include experimental conditions with different durations. Following the advice in the GLMsingle wiki, I coded each long event as a sequence of successive shorter events. However, I encountered the following error while the results were being saved.

*** FITTING TYPE-B MODEL (FITHRF) ***

chunks: 100%|████████████████████████████████████████████████████████████████████████████| 24/24 [42:02<00:00, 105.11s/it]

*** Saving results to /home/ale/WB-MotionQuartet/sub-06/GLMsingle-v02/TYPEB_FITHRF.npy. ***

Traceback (most recent call last):
  File "/home/ale/WB-MotionQuartet/02_run_GLMsingle_server_type2.py", line 155, in <module>
    results_glmsingle = glmsingle_obj.fit(
  File "/home/ale/miniconda3/envs/glmsingle/lib/python3.10/site-packages/glmsingle/glmsingle.py", line 1185, in fit
    np.save(file0, results_out)
  File "/home/ale/miniconda3/envs/glmsingle/lib/python3.10/site-packages/numpy/lib/npyio.py", line 546, in save
    format.write_array(fid, arr, allow_pickle=allow_pickle,
  File "/home/ale/miniconda3/envs/glmsingle/lib/python3.10/site-packages/numpy/lib/format.py", line 719, in write_array
    pickle.dump(array, fp, protocol=3, **pickle_kwargs)
OverflowError: serializing a bytes object larger than 4 GiB requires pickle protocol 4 or higher

Do you have any suggestions on how to resolve this issue? Thank you in advance for your help! Best, Alessandra

kendrickkay commented 1 week ago

We note in the README that there is currently a 4 GB limit for pickles... You could work around this by saving the outputs yourself in HDF5... But I wonder if there is a better fix (@jacob-prince @iancharest)?
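
In the meantime, a minimal sketch of that HDF5 workaround might look like the following (assuming h5py is installed and that what you want to save is a dict of numpy arrays; the function name, file name, and the exact nesting of the results returned by fit are placeholders to adapt):

import h5py
import numpy as np

def save_results_hdf5(results, filename):
    # Write a dict of numpy arrays to an HDF5 file, one dataset per key
    with h5py.File(filename, 'w') as f:
        for key, value in results.items():
            f.create_dataset(key, data=np.asarray(value), compression='gzip')

# e.g. save_results_hdf5(results_dict, 'TYPEB_FITHRF.h5')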

Blink621 commented 1 week ago

Hi @27-apizzuti, I ran into the same issue before. In my case, the Python version is 3.8.5 and the numpy version is 1.24.4; there, numpy's write_array uses pickle protocol 3, which cannot serialize a bytes object larger than 4 GiB. So one way to solve this may be to upgrade numpy, if your Python version supports a newer release. In my case 1.24.4 is the latest numpy available for Python 3.8.5, so I couldn't test this. There is also a related issue open in numpy (https://github.com/numpy/numpy/issues/26224).
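
For anyone unsure whether their installed numpy is affected, one quick, unofficial check is to see whether numpy.lib.format.write_array still hard-codes protocol 3, as in the traceback above:

# Quick check: does the installed numpy still hard-code pickle protocol 3
# in write_array (the call that fails in the traceback above)?
import inspect
import numpy as np
from numpy.lib import format as npy_format

print("NumPy version:", np.__version__)
print("write_array hard-codes protocol 3:",
      "protocol=3" in inspect.getsource(npy_format.write_array))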

I solved it another way, by adding a few lines at the top of the main GLMsingle Python script under site-packages (/xxxxx/python3.8/site-packages/glmsingle/glmsingle.py). These lines replace numpy's write_array function with a version that uses pickle protocol 4. It's not elegant, but it did solve my problem.

import numpy as np
import pickle
from numpy.lib.format import write_array

# Back up the original write_array function
original_write_array = write_array

# Custom write_array that bumps the pickle protocol to 4
def custom_write_array(fp, array, allow_pickle=False, pickle_kwargs=None):
    if pickle_kwargs is None:
        pickle_kwargs = {}
    if array.dtype.hasobject:
        if not allow_pickle:
            raise ValueError("Object arrays cannot be saved when allow_pickle=False")
        # Use pickle protocol 4 so objects larger than 4 GiB can be serialized
        pickle.dump(array, fp, protocol=4, **pickle_kwargs)
    else:
        original_write_array(fp, array, allow_pickle=allow_pickle, pickle_kwargs=pickle_kwargs)

# Replace numpy's write_array with the patched version
np.lib.format.write_array = custom_write_array

jacob-prince commented 1 week ago

When I need to save large Python dictionaries (> 4 GB), I typically follow @Blink621's solution and just use pickle.dump with the protocol set to 4. This has worked pretty seamlessly for me in the past, and @iancharest, we should work on incorporating this into GLMsingle by default.
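
For reference, the manual version of that workaround is just a direct pickle.dump / pickle.load round trip (a sketch; the .pkl file name is a placeholder, and results_glmsingle is the dict returned by glmsingle_obj.fit in the script above):

import pickle

# Save the full results dict with protocol 4, which supports objects > 4 GiB
with open('results_glmsingle.pkl', 'wb') as f:
    pickle.dump(results_glmsingle, f, protocol=4)

# ...and load it back later
with open('results_glmsingle.pkl', 'rb') as f:
    results_glmsingle = pickle.load(f)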

27-apizzuti commented 1 week ago

Thanks @Blink621 for the solution, it worked for me as well!

jacob-prince commented 1 week ago

Thanks @Blink621 for the help, and glad to hear it worked for you @27-apizzuti.

@iancharest, should we incorporate the above code snippet directly into glmsingle.py? Or would some other solution be preferable?