27-apizzuti opened this issue 1 week ago
We noted in the README that there is currently a 4 GB limit for pickles... You could work around this by saving the outputs yourself in HDF5, but I wonder if there is a better fix (@jacob-prince @iancharest)?
Hi @27-apizzuti, I ran into the same issue before. In my case, the Python version is 3.8.5 and the numpy version is 1.24.4. In this setup, arrays are pickled with protocol 3, which cannot save a bytes object larger than 4 GiB. So one way to solve this may be to upgrade numpy, if your Python version supports a newer release. In my case, 1.24.4 is the latest numpy available under Python 3.8.5, so I couldn't test this. There is also a related issue open in numpy (https://github.com/numpy/numpy/issues/26224).
I solved this in another way, by adding several lines at the top of the main glmsingle Python script. It lives under the site-packages directory of your Python installation (/xxxxx/python3.8/site-packages/glmsingle/glmsingle.py). This code replaces numpy's write_array function so that it pickles with protocol 4. It's not elegant, but it did solve my problem.
```python
import pickle

import numpy as np
from numpy.lib import format as npy_format
from numpy.lib.format import write_array

# Back up the original write_array function
original_write_array = write_array

# Custom write_array function that bumps the pickle protocol to 4
def custom_write_array(fp, array, version=None, allow_pickle=False,
                       pickle_kwargs=None):
    if array.dtype.hasobject:
        if not allow_pickle:
            raise ValueError(
                "Object arrays cannot be saved when allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        # Write the standard .npy header first so np.load can still read
        # the file, then pickle the payload with protocol 4
        npy_format.write_array_header_1_0(
            fp, npy_format.header_data_from_array_1_0(array))
        pickle.dump(array, fp, protocol=4, **pickle_kwargs)
    else:
        original_write_array(fp, array, version=version,
                             allow_pickle=allow_pickle,
                             pickle_kwargs=pickle_kwargs)

# Replace the original write_array function
np.lib.format.write_array = custom_write_array
```
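As a standalone sanity check of the idea (independent of the patch itself), the sketch below writes an object array as a standard `.npy` header followed by a protocol-4 pickle, and confirms `np.load` can still read it back. The array contents are placeholders, not real GLMsingle outputs.

```python
import io
import pickle

import numpy as np
from numpy.lib import format as npy_format

# A small object array standing in for a large results structure
arr = np.array([{"roi": "V1"}, {"roi": "V2"}], dtype=object)

# Write the standard .npy header, then the array pickled with protocol 4
buf = io.BytesIO()
npy_format.write_array_header_1_0(
    buf, npy_format.header_data_from_array_1_0(arr))
pickle.dump(arr, buf, protocol=4)

# np.load detects the .npy header and unpickles the payload as usual
buf.seek(0)
loaded = np.load(buf, allow_pickle=True)
assert list(loaded) == list(arr)
```

The pickle protocol only matters at write time; `pickle.load` auto-detects it, which is why files written this way remain readable with an unmodified numpy.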
When I need to save large Python dictionaries (> 4 GB), I typically follow @Blink621's solution and just use pickle.dump with the protocol set to 4. This has worked pretty seamlessly for me in the past, and @iancharest, we should work on incorporating this into glmsingle by default.
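For reference, that approach is just a direct call to `pickle.dump` with an explicit protocol. A minimal sketch, where the dictionary keys and values are illustrative stand-ins for real GLMsingle outputs:

```python
import os
import pickle
import tempfile

# Stand-in for a large results dictionary returned by GLMsingle
results = {"betasmd": [0.1, 0.2], "R2": [55.0, 60.1]}

# Protocol 4 (available since Python 3.4) supports objects > 4 GiB
path = os.path.join(tempfile.mkdtemp(), "results.pkl")
with open(path, "wb") as f:
    pickle.dump(results, f, protocol=4)

# Loading needs no protocol argument; pickle detects it automatically
with open(path, "rb") as f:
    restored = pickle.load(f)
assert restored == results
```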
Thanks @Blink621 for the solution, it worked also for me!
Thanks @Blink621 for the help, and glad to hear it worked for you @27-apizzuti.
@iancharest, should we incorporate the above code snippet directly into glmsingle.py? Or would some other solution be preferable?
Hi,
I was running GLMsingle on my fMRI data, which includes experimental conditions with different durations. Based on the advice in the GLMsingle wiki, I coded each long event as a sequence of successive shorter events. However, I encountered the following error while saving.
Do you have any suggestions on how to resolve this issue? Thank you in advance for your help! Best, Alessandra