BrainModes / TVB-empirical-data-pipeline

Empirical processing pipeline for The Virtual Brain
http://dx.doi.org/10.1016/j.neuroimage.2015.03.055

added a pipesub script for qsub (torque) (WIP, needs some testing) #4

Open JensTimmerman opened 9 years ago

srothmei commented 9 years ago

Hi Jens,

good to see that people are adapting the scripts to their environments and also contributing the code back to us!

I'm currently thinking about how to include this in the most convenient way, since it would not make much sense to create a new repo for every different job scheduler around (like I did for SLURM and OAR).

I think I'll move all the lines containing calls to a job scheduler into a new file, where they can be modified easily and from which they'll be called by the main script. Then I'll also include your changes in one of those scripts.

Thanks and all the best, Simon

JohnGriffiths commented 9 years ago

+1 to that.

I think it would also make sense to have in this new file an option to run the pipeline locally without a job-scheduler. Agree?

srothmei commented 9 years ago

Good idea! It should be easy to add this specific case once the "framework" I described above is done.

JensTimmerman commented 9 years ago

Yeah, I was thinking about first creating some sort of abstraction where you would call `submit_job <nodes> <procs> <name> <time> <priority/queue> <mail> <output> <dependon> scriptname`

and the submit_job function would be configurable. But not every queuing system seems to support all these features, like dependencies... I'll test this version first.
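A rough sketch of what such a configurable `submit_job` abstraction could look like, in Python for illustration (the scheduler names, flag mapping, and the "local" fallback are my assumptions, not code from this repo; the mail and priority options are omitted for brevity):

```python
# Hypothetical sketch: one generic entry point that maps the common
# options onto the flags of the configured scheduler (here torque/qsub),
# with a "local" mode that covers running without any scheduler.

import subprocess

def build_submit_command(scheduler, nodes, procs, name, walltime,
                         queue, output, depend_on, script):
    """Build the scheduler-specific submission command as an argv list."""
    if scheduler == "torque":
        cmd = ["qsub",
               "-l", f"nodes={nodes}:ppn={procs},walltime={walltime}",
               "-N", name, "-q", queue, "-o", output]
        if depend_on:  # torque supports job dependencies via -W
            cmd += ["-W", f"depend=afterok:{depend_on}"]
        return cmd + [script]
    if scheduler == "local":
        # No scheduler: just run the script directly; dependencies are
        # implicit because jobs then run one after another.
        return ["bash", script]
    raise ValueError(f"unsupported scheduler: {scheduler}")

def submit_job(scheduler, **kwargs):
    """Submit (or directly run) a job and return the completed process."""
    cmd = build_submit_command(scheduler, **kwargs)
    return subprocess.run(cmd, capture_output=True, text=True)
```

The point of keeping the command construction in one function is that porting to a new scheduler then only means adding one more branch (or config entry), instead of editing scheduler calls scattered across the main script.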

srothmei commented 9 years ago

Yes, the dependencies are indeed a crucial point here: OAR, for example, does not support them, so I created a workaround by creating files on the local FS after job completion and checking for their existence within a background job (pretty messy...). I think dependencies would also cause trouble when running this locally, i.e. without a job scheduler.

On the other hand, the whole thing could then be divided into two forks, "non-dependencies"/"dependencies". But that, in turn, would cause some extra work.
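The sentinel-file workaround described above could be sketched roughly like this (a minimal illustration, not the actual OAR scripts; the marker-file naming is hypothetical): each job touches a marker file on the shared filesystem when it finishes, and dependent jobs simply poll for those markers before starting.

```python
# Hypothetical sketch of the file-based dependency workaround for
# schedulers without dependency support (e.g. OAR).

import os
import time

def mark_done(job_name, marker_dir="markers"):
    """Called at the very end of a job script: create the sentinel file."""
    os.makedirs(marker_dir, exist_ok=True)
    open(os.path.join(marker_dir, job_name + ".done"), "w").close()

def wait_for(job_names, marker_dir="markers", poll_seconds=30, timeout=None):
    """Block until every listed job has written its sentinel file."""
    start = time.time()
    while True:
        missing = [j for j in job_names
                   if not os.path.exists(os.path.join(marker_dir, j + ".done"))]
        if not missing:
            return True
        if timeout is not None and time.time() - start > timeout:
            raise TimeoutError(f"still waiting for: {missing}")
        time.sleep(poll_seconds)
```

This is indeed messy compared to native `afterok`-style dependencies, but it has the advantage of working identically on any scheduler and when running locally.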

ehiggs commented 8 years ago

Has there been any progress here? We have users who would love to use this data analysis pipeline but are waiting on upstream support.

Thanks

(nb: I am a colleague of @JensTimmerman)

srothmei commented 8 years ago

Hi,

I haven't had any time to work on this particular problem. Currently I'm rewriting parts of the pipeline from Octave/MATLAB to Python using Nipype, so that the interfaces of toolboxes like MRtrix etc. can be controlled directly from within Python. This will also make it easy to replace tools.

Anyway, once this is finished, I'll also present a much easier way to exchange/port the pipeline onto different HPC/job-scheduler frameworks.

But this might take one more month, I think.

JohnGriffiths commented 8 years ago

Excellent. Adapting to nipype is the way forward. That will make the code maximally portable and hackable.

Do you have a github branch for this yet?


srothmei commented 8 years ago

Hi John,

currently I'm translating the scripts and doing some debugging on them. I'll probably create a branch for that during this or next week; at the moment I'm storing progress in a repo of my own on my account.

Best, Simon

srothmei commented 8 years ago

So just to let you know: especially since I got some very nice input from John at SfN, I have now finished translating the MATLAB scripts into Python and will start porting the whole workflow into Python using Nipype workflows. From what I currently understand from the docs, this will make it far easier to run the pipeline on different HPC structures.

JohnGriffiths commented 8 years ago

Hi Simon. Sorry for the delay in getting back to you on this.

How is this going? Have you come up with a set of workflow designs that you are happy with?

Useful reference points for nipype-based pipelines (outside of nipype itself, which is the main reference point) are the [(new) connectome mapping toolkit](https://github.com/LTS5/cmp_nipype), which has been re-written for nipype, and 'CPAC' for fMRI analyses.

If you look at some of the code in e.g. CPAC you will see lots of use of the nipype Function interface; I think that is one of the most useful pieces of design advice to take from this: write your analysis functions as stand-alone Python functions, and wrap them in nipype using the Function interface. This is a lot more flexible and less labour-intensive than writing out a full interface (with the input spec, output spec, etc.) for each of them; you just wrap the Python functions directly. That also means you can test and debug the functions outside of nipype, which is useful.
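A minimal sketch of this pattern (the function name and analysis step are purely illustrative, not taken from the pipeline): write the analysis step as a plain Python function, unit-test it on its own, then wrap it with nipype's `Function` interface. The wrapping lines are shown in a comment since they assume nipype is installed.

```python
# Illustrative stand-alone analysis step, kept free of nipype imports
# so it can be tested and debugged on its own.

def threshold_matrix(in_list, cutoff):
    """Zero out connections below a cutoff in a (nested-list) matrix."""
    return [[v if v >= cutoff else 0 for v in row] for row in in_list]

# Wrapping for a nipype workflow (requires nipype):
#
#   from nipype.interfaces.utility import Function
#   thresh_node = Function(input_names=["in_list", "cutoff"],
#                          output_names=["out_list"],
#                          function=threshold_matrix)
```

Note that nipype serializes the wrapped function's source, so any imports the function needs should live inside the function body rather than at module level.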

Personally I prefer the CPAC architecture a lot more than cmp_nipype's, which seems to overcomplicate things somewhat.

(Update: just noticed your pypeline repo. Taking a look now. Do you want to shift this discussion over to there or keep it here?)

srothmei commented 8 years ago

Hi John,

thanks for the feedback. Things are going quite slowly, I have to admit; the workflow stuff is also not really trivial when it comes to parallelization, and I also want to keep the flexibility as high as possible: e.g. each tracking module should output the same streamline dataset, so that the aggregation does not have to cover all the different file formats produced by different tractography toolboxes.

Also thanks for the references. Since you discovered my repo, you'll also notice that I programmed our in-house functions as you suggested during SfN: as stand-alone functions which I will later wrap in nipype's Function interface.

For further discussions specifically aimed at the implementation in Nipype, let's shift over to the new repo to keep this one tidy.

As soon as everything is working with the nipype implementation, I will update this repository and probably make that solution the default branch here.