apjanke / fieldtrip-parallel-support

Support files for FieldTrip parallelization work

Decide on collaboration process with VORTech for parallel testing #1

Open apjanke opened 2 years ago

apjanke commented 2 years ago

Vijay and Robert have brought in Aljen Uitbeijerse from VORTech in the Netherlands to provide a Matlab Parallel Server test bed for this project. This will be a "reproducibility" scenario: instead of giving Andrew direct access to their systems, the idea is that Andrew will set up some reproducible tests in this repo and on branches in https://github.com/apjanke/fieldtrip, which the VORTech folks will be able to download and run independently.

This issue is a place for us to discuss the details of this interaction.

Questions to decide:

robertoostenveld commented 2 years ago

FieldTrip currently has Donders-specific network drives ...

Let me sketch what that looks like. We have the following main directories on our network storage, as seen from the linux servers and compute cluster:

/home/common/matlab/fieldtrip/data
/home/common/matlab/fieldtrip/data/ftp
/home/common/matlab/fieldtrip/data/test

which map onto the following network drives for Windows desktops/laptops:

H:/common/matlab/fieldtrip/data
H:/common/matlab/fieldtrip/data/ftp
H:/common/matlab/fieldtrip/data/test

Everything under /home/common/matlab/fieldtrip/data/ftp is synchronized with our FTP server and hence public; on the FTP server it corresponds to ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip.

The data under test can in general not be shared; we have explicit permission neither from the researchers who shared the data with us, nor from the EEG or MEG research participants to whom the data pertains.

The test functions under https://github.com/fieldtrip/fieldtrip/blob/master/test are used for nightly regression testing. The helper function https://github.com/fieldtrip/fieldtrip/blob/master/utilities/dccnpath.m is used in those functions to allow them to run on our linux servers, on our windows desktops, and on my laptop (with an external SSD). If you have another preferred location for the data in your own environment, feel free to add it to https://github.com/fieldtrip/fieldtrip/blob/master/utilities/dccnpath.m.
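(Editorial note: a minimal sketch of the kind of per-environment remapping dccnpath does; this is illustrative only, not the actual implementation, and the external-SSD path is made up.)

```matlab
% Illustrative sketch of dccnpath-style remapping, not the actual implementation.
% A test refers to the canonical Donders location, and the helper substitutes
% whichever alternative location exists in the current environment.
function filename = localpath_sketch(filename)
  canonical  = '/home/common/matlab/fieldtrip/data';
  candidates = {canonical, ...
                'H:/common/matlab/fieldtrip/data', ...       % Windows network drive
                '/Volumes/ExternalSSD/fieldtrip/data'};      % made-up laptop location
  for i = 1:numel(candidates)
    alternative = strrep(filename, canonical, candidates{i});
    if exist(alternative, 'file') || exist(alternative, 'dir')
      filename = alternative;
      return
    end
  end
  error('could not find %s in any known data location', filename);
end
```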

AljenU commented 2 years ago

Intro

On the "Questions to decide", some of them appear to me to be closely connected. Also, I think this effort has the most value if it is done such that resulting code changes are valuable also to other people that want to start using the Fieldtrip toolbox. Since I cannot see what has been done so far, I will first give some insight in my expectations / questions.

On parallelization

On the one hand, there is https://github.com/fieldtrip/fieldtrip/issues/1853, which states that the goal for this project is to improve performance by parallelizing FieldTrip's own functions. On the other hand, there is already support for parallelization of FieldTrip analyses, per https://www.fieldtriptoolbox.org/tutorial/distributedcomputing/, which also describes how to do that using the Matlab batch function instead of qsubcellfun. If the user follows those instructions, will that lead to nested parallelization once lower-level parallelization is also added?

Since qsubcellfun already supports various cluster types, which require different settings etc., would not the most user-friendly way of adding Matlab Parallel Server support be to add it as one of the cluster types in qsubcellfun? Also, can something be said about the general workload balance of such high-level parallelization? If the workload is fairly even between the jobs, I'd expect high-level parallelization to also give one of the best performance increases compared to lower-level parallelization.
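(Editorial note: for context, a sketch of the existing high-level per-subject parallelization from the distributedcomputing tutorial; the subject codes, dataset names, and resource numbers below are made up.)

```matlab
% Rough sketch of the existing high-level parallelization with qsubcellfun:
% one cluster job per subject, roughly following the distributedcomputing tutorial.
subjects = {'subj01', 'subj02', 'subj03'};           % made-up subject codes
cfglist  = cell(size(subjects));
for i = 1:numel(subjects)
  cfg            = [];
  cfg.dataset    = [subjects{i} '.ds'];              % made-up dataset names
  cfg.outputfile = [subjects{i} '_preproc.mat'];     % write each result to disk
  cfglist{i}     = cfg;
end
% memreq is in bytes, timreq in seconds
qsubcellfun(@ft_preprocessing, cfglist, 'memreq', 4*1024^3, 'timreq', 60*60);
```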

On testing the parallelization work

Parallel computing in Matlab is set up such that it should be quite easy to switch from using a local pool to a Parallel Server pool, provided the code is already set up to use a cluster definition. In that case, just changing cluster = parcluster('local') to cluster = parcluster('profileThatUsesMPS') should switch execution from a local cluster from the Parallel Computing Toolbox to the Parallel Server instance defined by 'profileThatUsesMPS'. (This is only the happy flow, of course; if the code errors, there are differences.)

Thus, I expect our contribution to be testing that this switch does indeed work correctly. Depending on how well that goes, we can also test some different variations of 'profileThatUsesMPS' and of where it is called from. The 'profileThatUsesMPS' profile will know where the cluster is located, as per https://nl.mathworks.com/help/parallel-computing/discover-clusters-and-use-cluster-profiles.html
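(Editorial note: a minimal sketch of that switch; 'profileThatUsesMPS' is a placeholder profile name and some_fieldtrip_step is a made-up per-iteration function.)

```matlab
% Minimal sketch of the local-to-MPS switch described above.
clust = parcluster('local');                 % Parallel Computing Toolbox, local workers
% clust = parcluster('profileThatUsesMPS');  % same code, but an MPS-backed cluster profile
pool = parpool(clust);
parfor i = 1:10
  results(i) = some_fieldtrip_step(i);       % placeholder for the real per-iteration work
end
delete(pool);
```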

This also means that the only things that I expect to need are:

If there is any instrumentation to watch the runtime / memory usage, I expect it to be Matlab code, inside either the testscript or the modified Fieldtrip software, that generates logs to be analyzed, and can be zipped-and-emailed (or submitted to a repo).
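(Editorial note: a sketch of the kind of instrumentation meant here, plain Matlab timing and logging inside the test script, bundled into a zip for sharing; the file names and layout are illustrative, not an agreed interface.)

```matlab
% Sketch: time a step under test, append to a log file, and zip the logs for transfer.
logdir = fullfile(tempdir, 'ft_parallel_testlogs');
if ~exist(logdir, 'dir'), mkdir(logdir); end

t0 = tic;
% ... run the FieldTrip step under test here ...
elapsed = toc(t0);

fid = fopen(fullfile(logdir, 'timing.txt'), 'a');
fprintf(fid, '%s\t%s\t%.3f s\n', datestr(now), 'step_under_test', elapsed);
fclose(fid);

zipfile = fullfile(tempdir, 'ft_parallel_testlogs.zip');
zip(zipfile, logdir);                % this archive can be emailed or attached to the repo
fprintf('results archive: %s\n', zipfile);
```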

robertoostenveld commented 2 years ago

Hi @AljenU I think you made a good assessment.

On parallelization ...

First some background that was discussed in person (or assumed to be implicitly known) but not written down. We typically record MEG or EEG data from 20-30 research subjects while they are carrying out psychological tasks. As each task and experiment is (slightly) different, we often don't have ready-made analysis pipelines. Implementing the experiment-specific analysis pipeline is a lot of manual work; subsequently running it for all participants is a lot of CPU work.

If you look at the drawing from https://www.fieldtriptoolbox.org/development/project/parallel/

[figure: "pipelines" diagram from the parallel project page]

then each column corresponds to a subject. Once all subjects have been processed we continue with group analysis, which is where it converges at the bottom. Group analysis usually takes less time (since the raw data has been "condensed" considerably). When running the analysis for all subjects we can split over subjects: running things in parallel can be done either by running each completed single-subject pipeline (green) in parallel, or by running each completed pipeline step (blue) in parallel. However, this is only possible once the pipeline (or each pipeline step) has been fully implemented. It can indeed use qsubcellfun, or (when available, which it is not at the Donders) the Matlab batch functions.

The goal in this project is primarily to speed up the development of the pipeline, not the final execution (although that would also benefit). The manual development often takes months; the final execution often takes only a week or so when run serially, or only a day when run in parallel. So we focus on making each step (small black box) in the pipeline faster, since the researcher will, during development, iterate very often over the same step in the same (pilot) subject.

Setting up qsubcellfun or learning how to use the Matlab batch functions is challenging for the researchers (who are usually neuroscientists, not programmers). Hence we accept that the final execution might be inefficient (i.e. take a week, which they would otherwise spend learning parallel computing skills). For researchers further along in their career (older PhD students and postdocs) we do expect them to pick up the parallel computing skills at some point. However, we always have an influx of young and not-yet-trained researchers/students who have to learn the content of the analysis prior to learning how to do it efficiently (which will only be done by those who continue in this line of work). As many students move on to other careers, the idiosyncratic skills of qsubcellfun or MATLAB are not so valuable, but understanding the maths/stats/analysis/ideas of how to analyze brain data is. That is what they mostly learn in this first manual process of developing/writing their own analysis pipeline. Speeding up each individual FT function by parallelization-under-the-hood hopefully speeds up this process.

If we wanted to optimize the scientific process, we could indeed focus on the high-level parallelization that is already available and that you correctly identified. But then we would also not be working at a teaching-funded university and hiring PhD students all the time to do most of the work (while they learn along the way). We sometimes do have externally funded large research projects that work like that, where we prefer to work with already-trained professionals (and teams with a better mix of skills) who use more standardized pipelines. Optimizing those is also happening, but more in the direction of BIDS, AA, cloud computing, and other ongoing efforts.

On testing the parallelization work

I agree, nothing to add here.

apjanke commented 2 years ago

I agree with everything Robert says here.

Also, I think this effort has the most value if it is done such that the resulting code changes are also valuable to other people who want to start using the FieldTrip toolbox.

I agree. The intention here is to make modifications to FieldTrip that provide a low-barrier-to-entry, easy-to-use way for any FieldTrip user to get parallelism if they have access to the standard Matlab Parallel Computing Toolbox ("PCT"), with or without a Matlab Parallel Server ("MPS") available, and to remain compatible with, and not hurt the performance of, advanced FieldTrip users who are using a different or custom parallelization solution.

I think (and based on our prior conversations, I believe Robert and Vijay agree) that PCT support is a nice "easy on-ramp" to parallelism for individual researchers and smaller workgroups, especially during the more interactive, exploratory phase of work, corresponding to the "development of the pipeline" phase that Robert mentions in his comment here.

Since I cannot see what has been done so far

My work is focused on introducing support for PCT-based parallelism in some of FieldTrip's functions by converting the "top level" loops of some of the main FieldTrip library functions from for to parfor. This will parallelize them when a PCT worker pool is available and configured, and fall back to regular for behavior in other situations (such as when a different parallelism solution is being used).
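(Editorial note: a hypothetical before/after of the kind of change described; the data structure, variable names, and expensive_step are made up, not an actual FieldTrip function.)

```matlab
ntrial = numel(data.trial);
out    = cell(1, ntrial);

% before: serial loop over trials
for i = 1:ntrial
  out{i} = expensive_step(data.trial{i});   % placeholder for the real per-trial work
end

% after: same loop as parfor; it uses the PCT pool when one is open, and runs
% serially on the client when no pool is available (and pool auto-creation is off)
parfor i = 1:ntrial
  out{i} = expensive_step(data.trial{i});
end
```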

The work so far has mostly been working out the requirements & design, and analyzing the code to find candidates for changes. I'm still mostly at the "setting up testing" phase of implementation, since I'm hesitant to make substantial changes to the code base without being able to run the full regression suite. So there's not much real code to share yet.

would not the most user-friendly way of adding Matlab Parallel Server support be to add it as one of the cluster types in qsubcellfun?

I agree with Robert here: qsubcellfun is not a standard Matlab function, and it looks like it has its own learning curve. PCT-enabling some FieldTrip functions seems even user-friendlier: we may be able to get the usage instructions down to basically just "Turn on a PCT worker pool and then run your code as normal". We could also look at adding PCT/Matlab Parallel Server as another cluster type supported by qsubcellfun, but I think that should be a secondary goal.
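(Editorial note: a sketch of what those hoped-for instructions would boil down to; the ft_freqanalysis call is just an example.)

```matlab
parpool;                         % start a PCT worker pool (default or saved profile)
cfg = [];                        % ... normal FieldTrip configuration goes here ...
% then run the analysis exactly as before; parfor-enabled functions pick up the pool
% freq = ft_freqanalysis(cfg, data);
```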

I also think that internal PCT support is finer-grained than qsubcellfun in some cases, so it could be used to speed up smaller pieces of initial exploratory/developmental work that may not benefit as much from qsubcellfun.

If the user follows those instructions, will that lead to nested parallelization once lower-level parallelization is also added?

I do not think so. parfor will only parallelize if your client session is connected to a Matlab PCT worker pool. If you are using a different parallelization solution, you won't also have a PCT pool connected. And even if you did, using the alternate batch submission should bypass the PCT and submit to the other parallel processor, because the parfor only takes effect at the point where that line of code executes, which in that case does not happen on the client. I think the only situation where double parallelization would happen is if the workers in your custom parallel pool were themselves configured as Matlab PCT clients connected to a PCT/MPS worker pool. That would be a rather unusual situation, and if that were the case, well, it seems to me like that configuration is explicitly asking for double parallelization like this.
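(Editorial note: for what it's worth, a quick sketch of how to check whether the current session is in the "PCT pool connected" state; gcp('nocreate') does not create a pool as a side effect.)

```matlab
pool = gcp('nocreate');   % returns [] if no PCT pool is currently connected
if isempty(pool)
  disp('no PCT pool: the new parfor loops run serially on the client');
else
  fprintf('connected to a PCT pool with %d workers\n', pool.NumWorkers);
end
```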

Though: Is there a use case where a user might be connected to both a Matlab PCT pool and another type of parallel processor in the same session, and we should worry about this some more?

On testing the parallelization work...

This agrees with my understanding. I also have nothing much to add.

If there is any instrumentation to watch the runtime / memory usage, I expect it to be Matlab code, inside either the testscript or the modified Fieldtrip software, that generates logs to be analyzed, and can be zipped-and-emailed (or submitted to a repo).

Yes. I intend the testscript to produce a zip file of the test & measurement results and to print the path to it at the end of execution; you then just get that file back to me through email or some other transfer mechanism we decide on (probably something other than email, because the result files may well be large).

I am out of town this weekend, and will be back in touch early next week.

robertoostenveld commented 2 years ago

Let me get back to the nested parallelization, which I skipped over in my previous reply. In my experience (with 100s of users on our compute cluster over the years), once in a while someone will indeed figure out that this is technically possible, and then quickly learn (they closely monitor their batches, since they are still in a mindset of trying to speed things up) that it is not optimal because the two levels compete for resources. It is also quite easy to explain, and most researchers naturally understand that somehow it might not work, as it feels a bit like a perpetuum mobile.

Of course there are conceivable cluster configurations where nested parallelization might actually have benefits. People who work with and understand those cluster configurations have IT skills at such a level that we don't have to worry about them. Instead we target the much simpler embarrassingly parallel workloads.