cvnlab / GLMsingle

A toolbox for accurate single-trial estimates in fMRI time-series data
BSD 3-Clause "New" or "Revised" License
101 stars 44 forks source link

Memory error - python3 #106

Closed chkmp closed 5 months ago

chkmp commented 1 year ago

Hello - hope you're well and thank you for creating this tool.

I'm having an issue with running glmsingle in python3. This is the error I get on my terminal: MemoryError: Unable to allocate 8.52 TiB for an array with shape (1082035, 1082035) and data type float64

I read in your repository that numpy has a 4GB limit for the pickle files and followed your advice to use the wanthf5 flag opt['wanthdf5'] = 1. I've also tried turning off "disk saving" by setting opt['wantfileoutputs'] = [0,0,0,0]. Just to confirm, is this how you turn off "disk saving"?

I've tried using as input both the 4d and 2d version of my data, but I'm still getting the same error. Here's what my data from 1 fMRI run looks like: data_2d.shape = (1082035, 742), data_4d.shape = (97, 115, 97, 742), design.shape = (742,2).

Could you please advise on how to proceed?

Many thanks in advance.

Please find below the full version of my test code for 1 run:

import glmsingle        
from glmsingle.glmsingle import GLM_single                                                                                                                                                        
import matplotlib.pyplot as plt 
import numpy as np                                                                                                                                                               
import nibabel as nib
import os
from os.path import join, exists, split
import sys
import time
import urllib.request
import copy
import warnings
from tqdm import tqdm
from pprint import pprint
warnings.filterwarnings('ignore')
import glob

design = np.genfromtxt('/path/to/design/design.csv', delimiter=',', skip_header = 1) 
nifti_file_path = '/path/to/nifti/run_01.nii.gz'                                                                                                                                                   
nifti_image = nib.load(nifti_file_path) 

#Get the image data as a 4D array
data_4d_original = nifti_image.get_fdata() 

# Convert 4D array to 2D by reshaping
data = data_4d_original.reshape(-1, data_4d_original.shape[-1])

stimdur = 6
tr = 1

# create a directory for saving GLMsingle outputs
outputdir_glmsingle = '/path/to/output/glmsingle/'
opt = dict()

# keep them in memory, not in disk

opt['wantfileoutputs'] = [0,0,0,0]
opt['wantmemoryoutputs'] = [1,1,1,1]

#flag to avoid memory issue
opt['wanthdf5'] = 1

# running the procedure using the .fit() routine
glmsingle_obj = GLM_single(opt)

# visualize all the hyperparameters
pprint(glmsingle_obj.params)

results_glmsingle = glmsingle_obj.fit(design, data, stimdur, tr, outputdir=outputdir_glmsingle)

Returned output:

{'R2thresh': 0,
 'brainR2': [],
 'brainexclude': False,
 'brainthresh': [99.0, 0.1],
 'chunklen': 50000,
 'extra_regressors': False,
 'fracs': array([1.  , 0.95, 0.9 , 0.85, 0.8 , 0.75, 0.7 , 0.65, 0.6 , 0.55, 0.5 ,
       0.45, 0.4 , 0.35, 0.3 , 0.25, 0.2 , 0.15, 0.1 , 0.05]),
 'hrffitmask': 1,
 'hrfmodel': 'optimise',
 'hrfthresh': 0.5,
 'lambda': 0,
 'n_boots': 100,
 'n_jobs': 1,
 'n_pcs': 10,
 'numforhrf': 50,
 'pcR2cutoff': [],
 'pcR2cutoffmask': 1,
 'pcstop': 1.05,
 'seed': 1692378162.9356127,
 'suppressoutput': 0,
 'wantautoscale': 1,
 'wantfileoutputs': [0, 0, 0, 0],
 'wantfracridge': 1,
 'wantglmdenoise': 1,
 'wanthdf5': 1,
 'wantlibrary': 1,
 'wantlss': 0,
 'wantmemoryoutputs': [1, 1, 1, 1],
 'wantparametric': 0,
 'wantpercentbold': 1}
*** FITTING TYPE-A MODEL (ONOFF) ***

fitting model...
done.

preparing output...
done.

computing model fits...
done.

computing R^2...
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-45-0eb692306f31> in <module>
     27 pprint(glmsingle_obj.params)
     28 
---> 29 results_glmsingle = glmsingle_obj.fit(design, data, stimdur, tr, outputdir=outputdir_glmsingle)

~/anaconda3/lib/python3.8/site-packages/glmsingle/glmsingle.py in fit(self, design, data, stimdur, tr, outputdir, figuredir)
    606             'suppressoutput': 0
    607         }
--> 608         results0 = glm_estimatemodel(
    609             design0,
    610             self.data,

~/anaconda3/lib/python3.8/site-packages/glmsingle/ols/glm_estimatemodel.py in glm_estimatemodel(design, data, stimdur, tr, hrfmodel, hrfknobs, resampling, opt, cache, mode)
    782 
    783         # calculate overall R^2 [beware: MEMORY] XYZ x time
--> 784         results['R2'] = calc_cod_stack(modelfit, dataw)
    785 
    786         # [XYZ by time] by n_runs

~/anaconda3/lib/python3.8/site-packages/glmsingle/cod/calc_cod.py in calc_cod_stack(yhat, y)
    114     # calculate global R2
    115 
--> 116     r2s = 100*(1 - zerodiv(
    117         numer,
    118         denom,

~/anaconda3/lib/python3.8/site-packages/glmsingle/utils/zerodiv.py in zerodiv(x, y, val, wantcaution)
     52             tmp = y
     53             tmp[bad] = 1
---> 54             f = x / tmp[:, np.newaxis]
     55             f[bad] = val
     56         return f

MemoryError: Unable to allocate 8.52 TiB for an array with shape (1082035, 1082035) and data type float64

Shape of my data:

In [46]: data.shape                                                                                                                                                                      
Out[46]: (1082035, 742)

In [47]: design.shape                                                                                                                                                                    
Out[47]: (742, 2)

In [48]: data_4d_original.shape                                                                                                                                                          
Out[48]: (97, 115, 97, 742)
iancharest commented 1 year ago

Hey I have this on my radar to fix. I won't be able before Wednesday however.

On Mon, Aug 21, 2023, 12:47 PM Kendrick Kay @.***> wrote:

Assigned #106 https://github.com/cvnlab/GLMsingle/issues/106 to @iancharest https://github.com/iancharest.

— Reply to this email directly, view it on GitHub https://github.com/cvnlab/GLMsingle/issues/106#event-10144458932, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEXT7EMDKF7YGOFCTQAD5DXWNDGZANCNFSM6AAAAAA3YHY3VE . You are receiving this because you were assigned.Message ID: @.***>

kendrickkay commented 1 year ago

Sorry for the delay. This should now be fixed with #107 . Please let us know if you encounter any other bugs.

kendrickkay commented 1 year ago

Also, thank you for the very informative diagnostic information with your original post.

chkmp commented 1 year ago

Hello again - thank you so much for the quick reply, really appreciate it! Unfortunately, I'm still experiencing an issue (both with the 4d and 2d version of my data).

Please see the error message below.

Thank you very much in advance for your time.

MemoryError                               Traceback (most recent call last)
<ipython-input-7-f24d48447ae1> in <module>
     24 pprint(glmsingle_obj.params)
     25 
---> 26 results_glmsingle = glmsingle_obj.fit(design, data, stimdur, tr, outputdir=outputdir_glmsingle)

~/anaconda3/lib/python3.8/site-packages/glmsingle/glmsingle.py in fit(self, design, data, stimdur, tr, outputdir, figuredir)
    606             'suppressoutput': 0
    607         }
--> 608         results0 = glm_estimatemodel(
    609             design0,
    610             self.data,

~/anaconda3/lib/python3.8/site-packages/glmsingle/ols/glm_estimatemodel.py in glm_estimatemodel(design, data, stimdur, tr, hrfmodel, hrfknobs, resampling, opt, cache, mode)
    779 
    780         # remove polynomials from the model fits (or predictions)
--> 781         modelfit = [polymatrix[x] @ modelfit[x] for x in range(numruns)]
    782 
    783         # calculate overall R^2 [beware: MEMORY] XYZ x time

~/anaconda3/lib/python3.8/site-packages/glmsingle/ols/glm_estimatemodel.py in <listcomp>(.0)
    779 
    780         # remove polynomials from the model fits (or predictions)
--> 781         modelfit = [polymatrix[x] @ modelfit[x] for x in range(numruns)]
    782 
    783         # calculate overall R^2 [beware: MEMORY] XYZ x time

MemoryError: Unable to allocate 5.98 GiB for an array with shape (742, 1082035) and data type float64

Just to remind you - this is how my data look like!

In [8]: data.shape                                                                                                                                                                       
Out[8]: (1082035, 742)

In [9]: design.shape                                                                                                                                                                     
Out[9]: (742, 2)
kendrickkay commented 1 year ago

Hi - it appears that you may perhaps simply not have enough RAM. To check, perhaps you could try running the code on a subset of the voxels? Like instead of 1082035, just try the first 10000? If it is a RAM issue, easiest solution is to run on a computer with more RAM!

chkmp commented 1 year ago

Hi! I was running the code in ipython3 and I think that's why I got that last memory error. It didn't make sense to me as the machine I'm running it from has enough RAM. Apologies for this!

(base) bash-4.2$ free
              total        used        free      shared  buff/cache   available
Mem:       65792424    48022572    12322520      141772     5447332    17136116
Swap:      24772604    24484824      287780

I switched to running it properly from the command line, and now I get this error consistently. I am posting it just in case it's an easy fix.

Many thanks for your help with all this.

*** FITTING TYPE-A MODEL (ONOFF) ***

fitting model...
done.

preparing output...
done.

computing model fits...
done.

computing R^2...
done.

computing SNR...
done.

*** Setting brain R2 threshold to 3.033874205311465 ***

*** FITTING TYPE-B MODEL (FITHRF) ***

chunks: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [26:30<00:00, 72.30s/it]
*** DETERMINING GLMDENOISE REGRESSORS ***

*** CROSS-VALIDATING DIFFERENT NUMBERS OF REGRESSORS ***

chunks:   0%|                                                                                                                                                     | 0/22 [00:41<?, ?it/s]
Traceback (most recent call last):
  File "singletrialglm.py", line 66, in <module>
    results_glmsingle = glmsingle_obj.fit(design, data, stimdur, tr, outputdir=outputdir_glmsingle)
  File "/its/home/ck422/anaconda3/lib/python3.8/site-packages/glmsingle/glmsingle.py", line 1100, in fit
    glmbadness[relix, :] = calcbadness(
  File "/its/home/ck422/anaconda3/lib/python3.8/site-packages/glmsingle/ssq/calcbadness.py", line 115, in calcbadness
    traincols = np.concatenate([validcolumns[x] for x in trainix])
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
kendrickkay commented 1 year ago

Hi - can you confirm that you are using the latest version of the github repo? Also, can you tell me what your design matrix is like? (E.g., how many conditions, do you have repeats across runs, etc.?)

chkmp commented 1 year ago

Hello - yes I can confirm that I'm using the latest version (including the latest commit #107 for zerodiv.py and commit #108 for check_inputs.py and gethrf.py).

My design matrix has two columns for my two conditions : "face" and "scene" and 742 rows, that represent the time points (TR = 1s). I set my stimdur to 6, and there are a total of 60 blocks (30 for each conditions) and each block lasts 6 sec.

I am trying to run 1 subject with 1 run, loading 1 nifti and 1 design.

kendrickkay commented 1 year ago

I see. That is helpful. In order to use the GLMdenoise and RR components of GLMsingle, you need to pass in at least 2 runs (with repeats across runs). Perhaps try that?

chkmp commented 1 year ago

okay that is really helpful indeed. Sorry I missed that! Thank you so much for the help. I'll try that later today!

iancharest commented 5 months ago

we will close this issue now, feel free to reopen if you encounter more memory issues.