Closed cfhammill closed 6 years ago
Hi, this is expected. Unlike classic magetbrain, I designed this pipeline to produce the smallest possible pieces of independent work. This means independent atlas-template jobs, named according to template, and template-subject jobs, split according to subject. Given 21 templates and 1000 subjects, there are 21,000 jobs at the template-subject stage.
The pipeline honours the qbatch environment variables on unknown clusters (i.e. anything other than Compute Canada, where I hard-coded the settings).
In particular, QBATCH_CHUNKSIZE will pack work into a smaller number of longer-running jobs, up to the splitting level of the stage (21 templates per subject job). You can set it higher, but a chunk will still only contain that many commands.
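To make the arithmetic concrete, here is a rough sketch of how the template-subject job count might shrink as QBATCH_CHUNKSIZE grows. This is not the pipeline's code: the helper name `jobs_after_chunking` is hypothetical, and it assumes that a chunk never mixes commands from different subjects, which is my reading of "up to the splitting level of the stage".

```python
import math


def jobs_after_chunking(n_templates, n_subjects, chunksize):
    """Estimate the number of template-subject jobs for a given chunk size.

    Assumption (not taken from the pipeline source): each subject contributes
    n_templates commands, and chunks are formed per subject, so the effective
    chunk size is capped at n_templates.
    """
    effective = max(1, min(chunksize, n_templates))
    jobs_per_subject = math.ceil(n_templates / effective)
    return n_subjects * jobs_per_subject


print(jobs_after_chunking(21, 1000, 1))   # 21000, one command per job
print(jobs_after_chunking(21, 1000, 7))   # 3000
print(jobs_after_chunking(21, 1000, 21))  # 1000, one job per subject
print(jobs_after_chunking(21, 1000, 50))  # 1000, capped at 21 per chunk
```

Under these assumptions, any chunk size of 7 or more would bring ~1000 subjects well under a 20,000 job limit at this stage.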
Thanks, that makes sense. I'm wondering now how I was able to use MAGeTBrain for such large runs. The new version is requesting 46 hours and ~27G per job, which will take an eternity. Is it just the speed difference between ANTs and minctracc?
Be careful of the interaction between PPJ, CORES and CHUNKSIZE. They respectively define the number of CPUs per job, the number of commands to run in parallel per job, and the number of commands to pack into a job.
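A small sketch of how those three settings interact, going only by the definitions above. The function `job_shape` and the `hours_per_command` input are hypothetical, and the model is deliberately simple: it assumes commands in a job run in "waves" of CORES at a time and that each command takes the same time regardless of how many CPUs it gets.

```python
import math


def job_shape(ppj, cores, chunksize, hours_per_command):
    """Rough per-job resource picture for given PPJ, CORES and CHUNKSIZE.

    ppj:       CPUs requested per job
    cores:     commands run in parallel within a job
    chunksize: commands packed into a job
    hours_per_command: illustrative runtime of one command (an assumption)
    """
    if min(ppj, cores, chunksize) < 1:
        raise ValueError("ppj, cores and chunksize must all be >= 1")
    # Commands run `cores` at a time, so a job needs this many rounds.
    waves = math.ceil(chunksize / cores)
    return {
        "cpus_per_command": ppj / cores,
        "waves": waves,
        "walltime_hours": waves * hours_per_command,
    }


# 8 CPUs per job, 2 commands at once, 21 commands packed into the job.
print(job_shape(ppj=8, cores=2, chunksize=21, hours_per_command=1.0))
# {'cpus_per_command': 4.0, 'waves': 11, 'walltime_hours': 11.0}
```

The point of the sketch is that raising CHUNKSIZE without raising CORES lengthens each job, while raising CORES without raising PPJ leaves each command with fewer CPUs.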
Those estimates do seem a bit high. Are you running the latest release (or HEAD)? There may have been some... math errors in earlier time/memory estimations...
So far I haven't touched those variables; this is just a naive run. I'm not sure whether Ben has tweaked our qbatch config, but I doubt it.
And yes, I cloned yesterday. I'm using Python 3.6 with some mild hacking to prevent the system python2 from being used (I symlinked python3 to maget/bin/python).
Are there some python version issues? If so please open another issue :)
As for the job size and count bits, you can see exactly how things are estimated at https://github.com/CobraLab/antsRegistration-MAGeT/blob/master/bin/stages.sh#L48-L50
I'm not sure the python issues are general enough to warrant an issue, but I'll make one anyway, feel free to close if it's too site specific. Looking through the ABIDE files it looks like there are ~100 subjects with files 2-5x larger (in n voxels), maybe these are throwing off the estimates.
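One way to check for this kind of outlier is to compare each scan's voxel count against the median. The sketch below is illustrative only: `flag_large_scans` is a hypothetical helper, the subject IDs and dimensions are made up, and reading the real dimensions from the image files is left out.

```python
from statistics import median


def flag_large_scans(voxel_counts, factor=2.0):
    """Return subject IDs whose voxel count is >= factor times the median."""
    typical = median(voxel_counts.values())
    return sorted(s for s, n in voxel_counts.items() if n >= factor * typical)


# Made-up example: three typical scans and one with 4x the voxels.
counts = {
    "sub-01": 256 * 256 * 160,
    "sub-02": 256 * 256 * 160,
    "sub-03": 256 * 256 * 160,
    "sub-04": 512 * 512 * 160,
}
print(flag_large_scans(counts))  # ['sub-04']
```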
Ah, that could definitely be an issue :)
The cutneck stage is really important here: it limits the number of voxels, which improves processing times.
Estimates are down to 8G and 21h :ok_hand:
Great.
Will try and update README to be more explicit.
Hi Gabe, I'm trying to run antsRegistration MAGeT on the SickKids cluster, and I hit the 20,000 job hard limit on our system. Is this expected behaviour? It's ~1000 brains.