SIESTA-eu / wp15

work package 15, use case 2
0 stars 0 forks source link

use case 2.4 - automated EEG processing using MATLAB #5

Open robertoostenveld opened 2 months ago

robertoostenveld commented 2 months ago

ERPcore

The data set includes a small tabular entry characterising human participants and EEG data.

The results to be produced are of the vector and matrices reflecting the different type of types of analyses, based on user requirements.

The brain data analyses are fully automated, but different outputs are produced based on user requirements—the testing principle of minimisation applied to medical data analysis.

Relies on matlab/EEGLAB - the dataset includes 40 subjects (24,1GB)

CPernet commented 2 months ago

@marcelzwiers could you check if this runs for you? I still need to work on the output folder description, but at least I'd know it works. Download, install, run the command as described --> fail or pass execution wise -- thx

marcelzwiers commented 2 months ago

I think something went wrong, as there are now 2.3 files under the 2.4 tree?

Op vr 19 apr 2024 om 17:37 schreef Cyril Pernet @.***>:

@marcelzwiers https://github.com/marcelzwiers could you check if this runs for you? I still need to work on the output folder description, but at least I'd know it works. Download, install, run the command as described --> fail or pass execution wise -- thx

— Reply to this email directly, view it on GitHub https://github.com/SIESTA-eu/wp15/issues/5#issuecomment-2066826559, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTUGL5TJ4476CMRY624WVTY6E24VAVCNFSM6AAAAABGDZRLDKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRWHAZDMNJVHE . You are receiving this because you were mentioned.Message ID: @.***>

robertoostenveld commented 2 months ago

Hi Marcel, I think it is fine if you fix these issues directly on the main branch by renaming the subdirectory and the script. Personally I would not put the script in a subdirectory to start with, but would rather avoid nested directories.

CPernet commented 2 months ago

same dataset, 2 sub- use cases .. anyway fixed the 'name'

marcelzwiers commented 2 months ago

I'm downloading the data (which takes forever) and will run the analysis. What is missing right now (@CPernet ) is the Output data section, i.e. some clear statistical endpoints that we can use to assess whether the pseudo data is serving its purpose or not

robertoostenveld commented 2 months ago

For the downloading from OSF the osfclient might be interesting and/or convenient. I don't think it will be faster, but it may make it more robust (possibly with a warm restart if it fails halfway).

Note that the annoyance of the download is also a relevant aspect of the whole process, so our ICT-oriented SIESTA team members should be made aware of that. Should we document the data size and download time in the README files?

CPernet commented 2 months ago

@marcelzwiers Yes, I have been thinking about that. Right now, we have stat maps but also ERP as .mat. Those are relatively subject-specific (a few years back, we saw this is pretty idiosyncratic, with a discriminability of about half of iris recognition).

-- we just need to see if my pipeline runs and then if it runs on scrambled data (not sure how if you switch channels around vs trials or other, this affects the pipeline as several things rely on having a structure to detect something, hopefully, it will work and just garbage out) -- depending on the scrambling, using ERPs would do the trick to see if it's anonymized IMO -- I can export those as CSV so it becomes easy to test using whatever language/package

does this makes sense?

CPernet commented 2 months ago

@robertoostenveld i can try that

marcelzwiers commented 2 months ago

-- we just need to see if my pipeline runs and then if it runs on scrambled data (not sure how if you switch channels around vs trials or other, this affects the pipeline as several things rely on having a structure to detect something, hopefully, it will work and just garbage out) -- depending on the scrambling, using ERPs would do the trick to see if it's anonymized IMO -- I can export those as CSV so it becomes easy to test using whatever language/package

does this makes sense?

I am an MRI person, so I'm relying on you to make sense of it :-). So yes, please provide outcome measures that I can use to test against the pseudo data

marcelzwiers commented 2 months ago

Based on the first subject (out of 40) the data is about 300GB. Do we really need all that data or can we make a selection (e.g. just one task and/or less subjects)?

marcelzwiers commented 2 months ago

FYI, the osf client is a very very poorly documented tool (e.g. it uses project = 9f5w7 for our project, which I only found by guessing/trial-and-error), but it seems to be an order of magnitude faster for downloading the data compared to using the browser

marcelzwiers commented 2 months ago

@CPernet I debugged your code a bit, but am now running into this EEGLAB issue, do you happen to know about this?

marcelzwiers commented 2 months ago

Ok, apparently you can't run eeglab without Fieldtrip on your path (which was very poorly documented). @CPernet I now get the following error:

[...]
Warning: Data set is short. Adjusted window size for whole data set spectrum calculation to be 1/8 of the length! 
Computing initial spectrum...
"noisefreqs" parameter was set to 'line', found line noise candidate at 59.9976 Hz!
Searching for first noise frequency between 56.9976 and 62.9976Hz...
Removing noise at 59.9976Hz... 
Using adaptive chunk length!
Matrix index is out of range for deletion.

Error in ERP_Core_WB (line 142)
            EEG(s) = [];
marcelzwiers commented 2 months ago

I think what happens is that you are deleting elements of the EEG array while looping over the EEG array. This corrupts your looping index

marcelzwiers commented 2 months ago

Ok, I applied a fix, do you agree with this @CPernet?

marcelzwiers commented 2 months ago

Ok, another error:

Warning: Data set is short. Adjusted window size for whole data set spectrum calculation to be 1/8 of the length! 
Computing initial spectrum...
"noisefreqs" parameter was set to 'line', found line noise candidate at 59.9976 Hz!
Searching for first noise frequency between 56.9976 and 62.9976Hz...
Removing noise at 59.9976Hz... 
Using adaptive chunk length!
39 dataset meet the criteria for removal and have been removed
Index exceeds the number of array elements. Index must not exceed 0.

Error in std_checkset (line 198)
    elseif ALLEEG(1).trials > 1

Error in std_rmdat (line 200)
STUDY = std_checkset(STUDY, ALLEEG);

Error in ERP_Core_WB (line 148)
        STUDY = std_rmdat(STUDY, EEG, 'datinds', find(mask));

@CPernet I stop now, could you please run the pipeline yourself?

marcelzwiers commented 2 months ago

@CPernet This fails because once error_report is initialized it always exists as a var

line 146:  if exist('error_report','var')
schoffelen commented 2 months ago

As an addition to @marcelzwiers above comment: it appears that the pipeline chokes on executing the zapline cleaning on all data arguments, with a very deep low-level SWITCH expression related error in matlab's signal toolbox code that is used to define the filter.

schoffelen commented 2 months ago

The error that the DCCN team gets on our file system is the same error as the one reported here: https://github.com/MariusKlug/zapline-plus/issues/24. in other words, there may be something wrong with zapline_plus.

Specifically, the call to matlab's bandpass function just fails with an undebuggable error (referenced in the previous comment). It seems that this error can be bypassed by providing the zapline code with a non-zero value of chunklength. This bypasses the call to bandpass.m In order to improve the probabilities that we will ever get this pipeline to run on our end, I will change this in the code, and continue debugging...

schoffelen commented 2 months ago

Currently the pipeline chokes on the execution of some limo code: https://github.com/LIMO-EEG-Toolbox/limo_tools/issues/195 Flagged it as an issue in the LIMO-EEG github, trying to see whether I can robustify the pipeline behavior

schoffelen commented 2 months ago

Currently, with the latest fix in LIMO in place, the pipeline chokes on the fact that

parfor attempts to start a parcluster anyhow (if it has not been started before), and does so at the DCCN cluster in the 'Threaded' mode. This causes an error downstream, due to the fact that it's forbidden to run mex-files in a parallel loop in threaded mode...

schoffelen commented 2 months ago

It looks as if the automatica parallel pool creation can be switched off in the preferences, but mathworks' documentation suggests that this needs to be done in the GUI...

schoffelen commented 2 months ago

OK, just to give the readership a feel of what I am working on:

previously, the ERP_Core_WB function crashed in line 276, trying to execute: limo_random_select, for now, I have managed to proceed, by explicitly switching off the automatic parpool creation in the MATLAB preferences GIO.

now, the ERP_Core_WB makes it past line 276, but crashes in line 281: limo_central_tendency_and_ci

Given that the total number of lines in this function is 613, and given that it already took some serious debugging to get this far, I am taking a 'voorschot' on the fact that it will still take some time to get this code functional in a naively started MATLAB session on the DCCN-cluster...

CPernet commented 2 months ago

fix paths, field trip dependence, added task as option for even shorter testing

@schoffelen I commented the parpool test since I updated limo master with our PR
@marcelzwiers need to pull LIMO master again just to be sure and remove FT from your path (EEGLAB should download the plugin 'ft lite' version)

for usage example, I put in the comment worked for me -- running the full thing now to generate outputs to validate against as discussed - thx again for testing and debugging

schoffelen commented 1 month ago

For the umpteenth time I tried the following, now without any task/subject contstraints:

ERP_Core_WB(fullfile(pwd, 'ERP_CORE_BIDS_Raw_Files'), fullfile(pwd, 'ERP_CORE_usecase_2.4.A'));

I get the following error in MATLAB:

bootstrapping channel 10/12 parameter 12 bootstrapping channel 11/12 parameter 12 bootstrapping channel 12/12 parameter 12 Thresholding paired_samples_ttest_parameter_1_2.mat using TFCE Applying TFCE to null data (this may take a while)... Paired t-test done

ans =

'/project/3015999.02/siesta/marzwi/ERP_CORE_usecase_2.4.A/N2pc/Ipsi-Contra/paired_samples_ttest_parameter_1_2_Cohensd.mat'

Unrecognized field name "timevect".

Error in ERP_Core_WB (line 496) subplot(1,2,1); vect = LIMO.data.timevect; hold on

So the good news is: I made it to line 496 (out of 630)

The not so good news is: I don't know what the expected behavior is, so can't fix it. Should LIMO.data have a timevect field in this case, or should it look for something else?

CPernet commented 1 month ago

on it -- I'll post an update later today -- thx

CPernet commented 1 month ago

pushed some changes to make some function more 'robust' in https://github.com/LIMO-EEG-Toolbox/limo_tools/tree/master and consequently updated the script + fixed the figure issue for NP2c ERP task.

just one more 'task' to test and it executes to the end right?

schoffelen commented 1 month ago

Thanks @CPernet , unfortunately, now a new error pops up:

Index in position 2 exceeds array bounds. Index must not exceed 150. Error in limo_match_elec (line 56) out(expected,:,:) = data(current,a_beg:a_end,:); Error in limo_ipsi_contra (line 84) data1(:,:,subject) = limo_match_elec(subj_chanlocs(subject).chanlocs,... Error in ERP_Core_WB (line 484) limo_ipsi_contra(fullfile(STUDY.filepath,'con1_files.txt'),... 56 out(expected,:,:) = data(current,a_beg:a_end,:);

As far as I can see, there's something wrong with the indexing a_beg:a_end, which in the current case is 101:250, whereas the data variable already seems to have been pruned down to 150 samples in the second dimension

schoffelen commented 1 month ago

I am a bit confused that we are now moving 'back' in time, and that I get an error at an earlier line in the code than before.

CPernet commented 1 month ago

have you pulled from limo master again? (I hade that error and fix it)

schoffelen commented 1 month ago

OK, I pulled again, started N2pc, this runs through.

Now I got an error for the N400 task:

std_limo: Invalid time upper limit, using default value insteadError: File: limo_batch.m Line: 512 Column: 1 At least one END is missing. The statement beginning here does not have a matching end.

Error in std_limo (line 570) [LIMO_files, procstatus] = limo_batch('model specification',model,[],STUDY);

Error in pop_limo (line 183) [STUDY,limofiles] = std_limo(STUDY, ALLEEG, options{:});

Error in ERP_Core_WB (line 260) [STUDY, ~, files] = pop_limo(STUDY, EEG, 'method','WLS','measure','daterp',...

I created a PR for this, please merge @CPernet https://github.com/LIMO-EEG-Toolbox/limo_tools/pull/199

robertoostenveld commented 1 month ago

OK, I pulled again, started N2pc, this runs through.

🎉🎉 Let's also celebrate the successes (and realize that we are making progress)

CPernet commented 1 month ago

yeah but it is really my fault .. poor @schoffelen (my excuse is the stupid server updated on my side did not help) looking to build the output and test of all the 2nd æevel files now (likely in python so Marcel can test anonymization before/after)

schoffelen commented 1 month ago

OK, I now made it until line 569. Before getting there, I got 2 error dialogs (which I could click away) suggesting that something not right has happened.

Error using load Unable to find file or directory 'ERPs_unrelated_single_subjects_Weighted mean.mat'.

Error in limo_plot_difference (line 119) data1 = load(varargin{1});

Error in ERP_Core_WB (line 569) Diff = limo_plot_difference('ERPs_unrelated_single_subjects_Weighted mean.mat',...

CPernet commented 1 month ago

code updated, limo_master added change which should help with your server (infamous undocumented matlab function), added output as well but we need to discuss this (opening another issue)

robertoostenveld commented 1 month ago

@CPernet given the not-so-smooth progress here, I suggest that we scale down the ambitions. Right now this use case is scheduled to have a part A (the "simple" part) and a part B (the "more challenging" part), but part A is already taking considerable amount of time. Therefore I propose that we flatten this, only finalize the part A, and call that 2.4. So part B gets scrapped for now. We can pick that up later, but for now make a clean and 100% working use case out of it. Agreed?

CPernet commented 1 month ago

right now, I beleive it is running -- of course I did not have the same server issues, but @schoffelen helped but I think now server/parallel processing settings seems to work? -- ie making sure limo_master is pulled again and all should run (it runs for me - repeatable now we need replicable)

robertoostenveld commented 1 month ago

I followed instructions to install and run it, but get

Unrecognized function or variable 'pop_importbids'.
Error in ERP_Core_WB (line 132)
        [STUDY, ALLEEG] = pop_importbids(source, 'bidsevent','on','bidschanloc','on', ... 

I can find it and will add the directory to my path

mac036-wifi> find . -name pop_importbids.m
./eeglab2024.0/plugins/bids-matlab-tools8.0/bids-matlab-tools/pop_importbids.m
robertoostenveld commented 1 month ago

... it runs further to stop at the next error:

ft_prepare_neighbours is not found in the current folder or on the MATLAB path, but exists in:
    /Users/roboos/matlab/fieldtrip

Change the MATLAB current folder or add its folder to the MATLAB path.
Error in std_prepare_neighbors (line 123)
            cfg.neighbors = ft_prepare_neighbours(tmpcfg, tmpcfg2);
Error in ERP_Core_WB (line 137)
    [~,~,AvgChanlocs] = std_prepare_neighbors(STUDY, ALLEEG, 'force', 'on'); 

also there I can find the file but setting up the complete path has failed:

mac036-wifi> find . -name ft_prepare_neighbours.m
./eeglab2024.0/plugins/Fieldtrip-lite20240111/fieldtrip-20240111/ft_prepare_neighbours.m

I will also add this to the path and try once more.

robertoostenveld commented 1 month ago

After about 30 minutes or so I get

...
resampling data 250.0000 Hz
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
resampling event latencies...
Event resorted by increasing latencies.
resampling finished
resampling data 250.0000 Hz
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
resampling event latencies...
Event resorted by increasing latencies.
resampling finished
Error using ERP_Core_WB
there has been a processing issue with all included datasets, cannot proceed 

At this moment I can only conclude that the pipeline is not sufficiently well documented for it to be reproducible. I suggest that the README specifies explicit versions (tags or commits) of the packages to use.

robertoostenveld commented 1 month ago

Can I repeat the question @CPernet ?

given the not-so-smooth progress here, I suggest that we scale down the ambitions. Right now this use case is scheduled to have a part A (the "simple" part) and a part B (the "more challenging" part), but part A is already taking considerable amount of time. Therefore I propose that we flatten this, only finalize the part A, and call that 2.4. So part B gets scrapped for now. We can pick that up later, but for now make a clean and 100% working use case out of it. Agreed?

marcelzwiers commented 1 month ago

also there I can find the file but setting up the complete path has failed:

mac036-wifi> find . -name ft_prepare_neighbours.m
./eeglab2024.0/plugins/Fieldtrip-lite20240111/fieldtrip-20240111/ft_prepare_neighbours.m

I will also add this to the path and try once more.

https://github.com/SIESTA-eu/wp15/commit/89040279df34ac43b64532f2ff03e70fb7adcfa3#commitcomment-142229564

robertoostenveld commented 1 month ago

@marcelzwiers perhaps you should leave this to Cyril, so that you can give 2.2 (mriQC on 139 anatomicals) a try.

robertoostenveld commented 1 month ago

Regarding the overall flowchart, the failure for us to run the pipeline means that non-experts will also not be able to run it and that hence step 8 in https://github.com/SIESTA-eu/wp15#data-flow cannot be completed.

CPernet commented 1 month ago

@robertoostenveld ok I will describe which toolbox to have installed -- for me on windows and on my (linux open suze) server, functions get installed automatically

Unrecognized function or variable 'pop_importbids'.
Error in ERP_Core_WB (line 132)
        [STUDY, ALLEEG] = pop_importbids(source, 'bidsevent','on','bidschanloc','on', ... 

is in theory dealt by the BIDS plug-in installed automatically if missing in the code here -- I cannot explain why it did not work

for

ft_prepare_neighbours is not found in the current folder or on the MATLAB path, but exists in:
    /Users/roboos/matlab/fieldtrip

in theory dealt by the fieldtrip plug-in installed automatically if missing in the code here -- but here conflicted somehow with your FieldTrip function, although seems also not to install the full path

I would prefer that it installs automatically rather than specifying but if I remove all my toolboxes -- for me (both windows and Linux) EEGLAB downloads and set the paths :-( .. so I will keep that part of code and add the second layer of tests if not exist issue error right away

CPernet commented 1 month ago

@robertoostenveld Can I repeat the question @CPernet ?

sure let's revise this once the other use cases work and if we have time, focusing of variety of cases and outputs

CPernet commented 1 month ago

in the readme I propose now to do in command line at install - what do you tink?

matlab -r -nodesktop -nojvm 'plugin_askinstall('bids-matlab-tools',[],1)'
matlab -r -nodesktop -nojvm 'plugin_askinstall('zapline-plus',[],1)'
matlab -r -nodesktop -nojvm 'plugin_askinstall('picard',[],1)'
matlab -r -nodesktop -nojvm 'plugin_askinstall('bids-matlab-tools',[],1)'
robertoostenveld commented 1 month ago

If that solves it, it is fine with me (although bids-matlab-tools only needs to be done once). However, it won't work since EEGLAB is not yet on the path and the plugin_askinstall function won't be found. I have made an edit to the README that seems to solve it

I think it is good to start your MATLAB session with restoredefaultpath to ensure that there are no unintentional local toolboxes on the path.

robertoostenveld commented 1 month ago

I still get the error "there has been a processing issue with all included datasets, cannot proceed". Using the MATLAB debugger it turns out that it is because MATLAB cannot find the pop_zapline_plus function. So I also added that to the path manually, which brings me to this

addpath ([pwd '/eeglab2024.0/plugins/zapline-plus1.2.1/zapline-plus-1.2.1'])
addpath ([pwd '/eeglab2024.0/plugins/bids-matlab-tools8.0/bids-matlab-tools'])
addpath ([pwd '/eeglab2024.0/plugins/Fieldtrip-lite20240111/fieldtrip-20240111'])

Then I again get the error "there has been a processing issue with all included datasets, cannot proceed". This time it is because pop_runica error: wrong value for argument 'icatype'. In the code I see it is set to picard.

@CPernet is there another undocumented dependency on https://github.com/pierreablin/picard or so? Could you run it yourself after doing restoredefaultpath so that you can see for yourself which dependencies need to be installed and on the path?

CPernet commented 1 month ago

@robertoostenveld using the updated matlab code? because it should return right away zapline and picard are not in path - I added a check for those

and it is not undocumented since it's in the install -- but yes you are right via command as indicated not good, will revise that