JinghaoLu / MIN1PIPE

A MINiscope 1-photon-based Calcium Imaging Signal Extraction PIPEline.
GNU General Public License v3.0

Parfor error: "Unexpected failure to indicate all intervals added." #42

Open ckemere opened 3 years ago

ckemere commented 3 years ago

Has anyone ever seen the following error? It's not particularly descriptive, but I think it is internal to MATLAB somehow.

Analyzing and transferring files to the workers ...done.
Error using distcomp.remoteparfor/rebuildParforController (line 217)
Unexpected failure to indicate all intervals added.

Error in distcomp.remoteparfor/handleIntervalErrorResult (line 253)
                obj.rebuildParforController();

Error in distcomp.remoteparfor/getCompleteIntervals (line 387)
                            [r, err] = obj.handleIntervalErrorResult(r);

Error in dirt_clean (line 29)
        parfor i = 1: nframes

Error in neural_enhance (line 45)
            Ydcln = dirt_clean(tmp, szad, isparaad);

Error in min1pipe (line 87)
            [m, imaxy, overwrite_flag] = neural_enhance(m, filename_reg, Params);

JinghaoLu commented 3 years ago

Yes, it is a MATLAB-internal issue. My guess is that some other function on your path is conflicting with the current execution. You can set a breakpoint at line 29 of dirt_clean, then run the loop with for instead of parfor and see whether an error pops up.
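
A minimal sketch of that debugging step, assuming a stand-in loop body: `process_frame`, `frames`, and `out` are hypothetical names for illustration and are not the actual code in dirt_clean. Running the loop serially with a try/catch should surface the underlying error and the failing iteration, which the parfor machinery tends to hide behind the interval message.

```matlab
% Hypothetical stand-in data and loop body; NOT the MIN1PIPE implementation.
nframes = 8;
frames = rand(16, 16, nframes);
out = zeros(size(frames));
process_frame = @(f) f - min(f(:));   % placeholder per-frame operation

% Serial version of the loop: plain "for" instead of "parfor", so that
% breakpoints work and the original error message is reported directly.
for i = 1:nframes
    try
        out(:, :, i) = process_frame(frames(:, :, i));
    catch err
        % Report which iteration failed before passing the error on.
        fprintf('Iteration %d failed: %s\n', i, err.message);
        rethrow(err);
    end
end
```

If the serial loop completes without error, the problem is more likely on the worker side (for example a worker running out of memory or losing its connection) than in the loop body itself.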

ximion commented 3 years ago

I get the same error but at a different location with MATLAB 2019b on our HPC cluster:

Done loop #5/5
Warning: A worker aborted during execution of the parfor loop. The parfor loop
will now run again on the remaining workers. 
> In distcomp/remoteparfor/handleIntervalErrorResult (line 240)
  In distcomp/remoteparfor/getCompleteIntervals (line 387)
  In parallel_function>distributed_execution (line 745)
  In parallel_function (line 577)
  In inter_section (line 181)
  In frame_reg (line 107)
  In min1pipe_bwHPC (line 137) 
Error using distcomp.remoteparfor/rebuildParforController (line 217)
Unexpected failure to indicate all intervals added.

Error in distcomp.remoteparfor/handleIntervalErrorResult (line 253)
                obj.rebuildParforController();

Error in distcomp.remoteparfor/getCompleteIntervals (line 387)
                            [r, err] = obj.handleIntervalErrorResult(r);

Error in inter_section (line 181)
        parfor ip = 1: nY

Error in frame_reg (line 107)
        m = inter_section(m, sttn, se, pixs, scl, sigma_x, sigma_f, sigma_d,
        maskc);

Error in min1pipe_bwHPC (line 137)
                [m, corr_score, raw_score, scl, imaxy] = frame_reg(m, imaxy,
                se, Fsi_new, pixs, scl, sigma_x, sigma_f, sigma_d);

Not sure what is going on here. The parfor loop looks okay to me, and MATLAB is incredibly unhelpful with this particular error. The MATLAB workspace is created from scratch, so there is no chance that something else pollutes the path (just to be sure, I cleaned up everything a bunch of times).

Adith033 commented 1 year ago

Error using distcomp.remoteparfor/rebuildParforController
All workers aborted during execution of the parfor loop.

Error in distcomp.remoteparfor/handleIntervalErrorResult (line 259)
                obj.rebuildParforController();

Error in distcomp.remoteparfor/getCompleteIntervals (line 396)
                            [r, err] = obj.handleIntervalErrorResult(r);

Error in MeTD_FinalinvALL (line 228)
        parfor i=16:pt

The client lost connection to worker 1. This might be due to network problems, or the interactive communicating job might have errored.

I also get the same error and have tried a lot of ways to solve it. @ckemere @ximion, have you solved the issue? Please reply.