LIMO-EEG-Toolbox / limo_tools

Hierarchical Linear Modelling for MEEG data
https://limo-eeg-toolbox.github.io/limo_meeg/
Other
58 stars 28 forks source link

parpool error during 2nd level [BUG] #155

Closed amisepa closed 1 year ago

amisepa commented 1 year ago

Latest Matlab, EEGLAB, and both master and HotFixes branch give the following error. 1st level completed successfully. Trying to do paired t-test at 2nd level, I get this error when computing bootstrap:

Error using parpool
Too many workers requested. The cluster "Processes" has the NumWorkers property set to a maximum of 8 workers but 15 workers were requested. Either request a number of workers less
than NumWorkers, or increase the value of the NumWorkers property for the cluster (up to a maximum of 512 for the Local cluster).

Error in limo_check_ppool (line 36)
            parpool(N-1);

Error in limo_random_robust (line 515)
            limo_check_ppool

Error in limo_random_select (line 817)
    tmpname = limo_random_robust(3,tmp_data1,tmp_data2,parameters,LIMO);

Error in limo_random_effect>Paired_t_test_Callback (line 219)
    limo_random_select('paired t-test',handles.chan_file,'nboot',handles.b,'tfce',handles.tfce,'type','Channels');

Error in gui_mainfcn (line 95)
        feval(varargin{:});

Error in limo_random_effect (line 29)
    gui_mainfcn(gui_State, varargin{:});

Error in matlab.graphics.internal.figfile.FigFile/read>@(hObject,eventdata)limo_random_effect('Paired_t_test_Callback',hObject,eventdata,guidata(hObject))

Error while evaluating UIControl Callback.

Turning off parpool in MAtlab preferences or in EEGLAB preferences does not solve the issue.

Thanks in advance

amisepa commented 1 year ago

FYI:

c = 

 Local Cluster

    Properties: 

                   Profile: Processes
                  Modified: false
                      Host: laptop-i01si850
                NumWorkers: 8
                NumThreads: 1

   RequiresOnlineLicensing: false

    Associated Jobs: 

            Number Pending: 0
             Number Queued: 0
            Number Running: 0
           Number Finished: 0

commenting limo_check_ppool fixes the error.

cmmrandau commented 1 year ago

You can comment the call to limo_check_ppool in limo_random_robust and it will call your parpool with default settings, but I'm sure Cyril has a better answer. :)

On Wed, 2023-05-17 at 20:24 -0700, Cedric Cannard wrote:

Latest Matlab, EEGLAB, and both master and HotFixes branch give the following error. 1st level completed successfully. Trying to do paired t-test at 2nd level, I get this error when computing bootstrap: Error using parpool Too many workers requested. The cluster "Processes" has the NumWorkers property set to a maximum of 8 workers but 15 workers were requested. Either request a number of workers less than NumWorkers, or increase the value of the NumWorkers property for the cluster (up to a maximum of 512 for the Local cluster).

Error in limo_check_ppool (line 36)             parpool(N-1);

Error in limo_random_robust (line 515)             limo_check_ppool

Error in limo_random_select (line 817)     tmpname = limo_random_robust(3,tmp_data1,tmp_data2,parameters,LIMO);

Error in limo_random_effect>Paired_t_test_Callback (line 219)     limo_random_select('paired t- test',handles.chan_file,'nboot',handles.b,'tfce',handles.tfce,'type', 'Channels');

Error in gui_mainfcn (line 95)         feval(varargin{:});

Error in limo_random_effect (line 29)     gui_mainfcn(gui_State, varargin{:});

Error in matlab.graphics.internal.figfile.FigFile/read>@(hObject,eventdata)lim o_random_effect('Paired_t_test_Callback',hObject,eventdata,guidata(hO bject))   Error while evaluating UIControl Callback. Turning off parpool in MAtlab preferences or in EEGLAB preferences does not solve the issue. Thanks in advance — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

amisepa commented 1 year ago

Thanks, yes that's what I'm doing currently but wanted to report as I'm guessing this is for providing faster computation and should take into account the right number of workers and threads automatically.

CPernet commented 1 year ago

something is wrong in assessing your currently running processes (1) I pushed a fix which is to closes the ppool after 1st level see this commit (2) but I'd like to have the true solution (unfortunately I cannot reproduce your error, it doesn't bug on my windows laptop or linux server)

can you try the following in the command window

limo_check_ppool % this starts the ppool the limo way
p = gcp('nocreate');

now p should not be empty? if not empty I don't understand why it enters the if statement in limo_check_ppool. If p is empty, it enters the if statement and launches the pool which leads to your error since it is already in use

what do you have for N? (your error message says 15)

N = getenv('NUMBER_OF_PROCESSORS'); % logical number of cores (i.e. count hyperthreading)
if isempty(N)
    N = feature('numcores');          % physical number of cores
elseif ischar(N)
    N = str2double(N);
end

I don't understand how calling the function in limo_batch doesn't fail but somehow fails calling it a second time .. N in the limo_check_ppool should be the same, 15 workers because N should be 16 (ie you have 8 processors with 2 threads)

amisepa commented 1 year ago

I think I had commented it at 1st level and forgot about it. NowI get the error with limo_batch too, so that makes more sense.

Found a solution.

To take into account the total number of processors (16 in my case), you have to use your variable 'c' like this at line 36: p = c.parpool(N-1); If I understand correctly, this increases the cluster profile's number of workers to 16, otherwise, it is limited to the physical cores without taking into account the threads (8 in my case).

https://www.mathworks.com/matlabcentral/answers/791324-matlab-only-uses-half-of-the-number-of-logical-cores-how-can-i-use-all-of-them

Why using N-1 and not N by the way?

CPernet commented 1 year ago

yes thx I know how this works, but you haven't answered my questions above about p or N, so writing a code that works for everybody while fixing your particular problem is not that easy ...

I'm assuming (1) p is empty and therefore it enters the if statement (2) N is correct (16 from getenv) and (3) it is calling parpool(N-1) that fails and your proposed solution is instead to do p = c.parpool(N-1) although i really don't see why setting up p at this point just doing c.parpool(N-1) starts the pool (changed here)

if all the above is true, it still doesn't solve the initial error you reported The cluster "Processes" has the NumWorkers property set to a maximum of 8 workers but 15 workers were requested that N-1 is too high.

CPernet commented 1 year ago

following up -- @amisepa ? have you figured how you got the error The cluster "Processes" has the NumWorkers property set to a maximum of 8 workers but 15 workers were requested if indeed you have 16 workers

amisepa commented 1 year ago

Yes, p was empty initially because limo_check_ppool was simply crashing because of the error.

I believe the problem was that you were opening with parpool(N-1), which sets by default the maximum number of workers in the cluster profile to the number of physical cores, excluding threads to avoid potential hyperthreading issues maybe?). So the parpool had a limit of 8 workers max by default. Now, the cluste profile is updated to 16 by using the variable c, which takes into account the total number of processors (cores + threads). Before it could only work for N = feature('numcores'); but not for N = getenv('NUMBER_OF_PROCESSORS');

Setting up p with p = c.parpool(N-1); is not necessary indeed. It's c that's the solution. I think this solution should account for most people.

You probably no longer need to shut down parpool at the end of LIMO 1st level now, since it takes longer to open twice, and should work fine now.

CPernet commented 1 year ago

doesn't make sense because if p is empty it goes to line https://github.com/LIMO-EEG-Toolbox/limo_tools/blob/HotFixes/limo_check_ppool.m#L36 and doesn't crash at it uses try - catch ; so was not empty to enter the loop causing the crash -- I guess c.parppol solves it

amisepa commented 1 year ago

You didn;t have the try catch at that point when I submitted the error, it was running like this: https://github.com/LIMO-EEG-Toolbox/limo_tools/blob/master/limo_check_ppool.m Again, it was crashing line 36 because p was empty and N was above max limit.

By the way, your new catch has an error for me:

image