Open gdevenyi opened 2 years ago
I am running some tests now so I'm not exactly sure what part of the code is responsible yet, but note that MAGeT scales (for total operations -- it's not as bad if you only consider registrations) at least like (number of atlases) × (number of templates) × (number of subjects), so reducing the number of templates is probably the easiest way to bring down the overall cost. My guess is that the overall issue is some combination of redundant file accesses via pyminc or creation of the output directories, but there are some CPU-limited parts as well which could also be optimized.
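The multiplicative scaling above can be made concrete with a back-of-envelope cost model. This is an illustrative sketch only; the function and parameter names are assumptions, not pydpiper's actual API, and the split between registration and label-operation counts follows the usual MAGeT description (atlas→template registrations plus template→subject registrations, with label resampling over all three factors):

```python
# Rough cost model for a MAGeT-style run (hypothetical helper, not pydpiper code).
def maget_operation_count(n_atlases: int, n_templates: int, n_subjects: int) -> dict:
    # Template-library construction: each atlas registered to each template.
    atlas_to_template = n_atlases * n_templates
    # Label propagation: each template registered to each subject.
    template_to_subject = n_templates * n_subjects
    registrations = atlas_to_template + template_to_subject
    # Label resamplings/votes scale with the full triple product,
    # which dominates total operations even when registrations stay modest.
    label_ops = n_atlases * n_templates * n_subjects
    return {"registrations": registrations, "label_ops": label_ops}

print(maget_operation_count(n_atlases=5, n_templates=21, n_subjects=200))
# → {'registrations': 4305, 'label_ops': 21000}
```

Note that halving the templates here roughly halves both counts, while halving atlases only touches the atlas→template term, which is why templates are the cheapest lever.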
Indeed, the majority of time appears to be spent in the `output_directories` and `create_directories` utility functions. At some point I added `--defer-directory-creation`, which should help with the `create_directories` contribution but not `output_directories` -- the latter is maybe a case of `os.path` functions doing I/O ...
With `--defer-directory-creation` I was able to get past this and reach job submission, which then failed with:
```
[2022-01-31 11:02:47.794,pydpiper.execution.pipeline,ERROR] Failed launching executors from the server.
Traceback (most recent call last):
  File "/project/m/mchakrav/quarantine/2019b/pydpiper/2.0.13/install/lib/python3.6/site-packages/pydpiper-2.0.13-py3.6.egg/pydpiper/execution/pipeline.py", line 825, in launchExecutorsFromServer
    mem_needed=memNeeded, uri_file=self.exec_options.urifile)
  File "/project/m/mchakrav/quarantine/2019b/pydpiper/2.0.13/install/lib/python3.6/site-packages/pydpiper-2.0.13-py3.6.egg/pydpiper/execution/pipeline.py", line 969, in launchPipelineExecutors
    pipelineExecutor.submitToQueue(number=number)
  File "/project/m/mchakrav/quarantine/2019b/pydpiper/2.0.13/install/lib/python3.6/site-packages/pydpiper-2.0.13-py3.6.egg/pydpiper/execution/pipeline_executor.py", line 440, in submitToQueue
    raise SubmitError({ 'return' : p.returncode, 'failed_command' : submit_cmd })
pydpiper.execution.pipeline_executor.SubmitError: {'return': 1, 'failed_command': ['qbatch', '--chunksize=1', '--cores=1', '--jobname=ASYN-long-20220121-executor-2022-01-31-at-11-02-47', '-b', 'slurm', '--walltime=23:59:59', '-']}
```
The terminal said (this should have been captured in the log, I think?):

```
sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long
```
We're retrying with `--csv-file`.
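The sbatch error above suggests a single parameter (likely a pathname or an over-long argument piped through qbatch) exceeded a scheduler-side buffer; passing the input list via a file rather than on the command line sidesteps that limit. A hypothetical pre-flight check for this failure mode (the 4096-byte limit is an assumption based on the common Linux `PATH_MAX`, not a documented Slurm constant):

```python
# Illustrative check only -- not part of pydpiper or qbatch.
def overlong_params(cmd: list, limit: int = 4096) -> list:
    """Return any arguments that exceed a path-length-style limit.

    Schedulers commonly reject individual path parameters longer than
    PATH_MAX (4096 bytes on Linux); flagging them before submission
    gives a clearer error than the scheduler's generic message.
    """
    return [arg for arg in cmd if len(arg) >= limit]

cmd = ["qbatch", "--chunksize=1", "--jobname=" + "x" * 5000, "-b", "slurm"]
if overlong_params(cmd):
    print("warning: argument(s) too long for submission")
```

A check like this run before `submitToQueue` would have surfaced the offending argument directly instead of a cryptic `SubmitError`.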
We're currently trying to submit a MAGeT.py pipeline to Niagara for processing.
MAGeT.py ends up spinning on CPU for ~2h doing something, before being killed by Niagara for misbehaving on a login node. No jobs ever reach submission, and no other work is done.
Run command
Config
The pipeline stages are generated:
However, the log never goes beyond this point before being killed.