yibeichan closed this issue 1 year ago.
Hi Yibei! This sounds odd. I saw the message "`babs-init` was successful!", but you still received an error from `babs-init`? Is that correct? Or did `babs-init` succeed while `babs-check-setup` threw some errors?
You're right, it's `babs-check-setup` that returns the errors. I had put all the commands in one bash script, but I just ran `babs-check-setup` on its own and got the same error again. So `babs-init` should have been successful.
In the `analysis/code` folder, I have the following files:
```
.
├── babs_proj_config.yaml
├── babs_proj_config.yaml.lock
├── check_setup
│   ├── call_test_job.sh
│   ├── submit_test_job_template.yaml
│   └── test_job.py
├── fmriprep-23-1-4_zip.sh
├── participant_job.sh
├── README.md
├── sub_final_inclu.csv
└── submit_job_template.yaml
```
Wait a second, I found out why: I didn't run `babs-check-setup` in the project folder, but in its parent folder. Now it's fixed. (But now I get errors from Slurm; I'll try to fix them.)
Oh, I was just about to reply. So yes, the solution is simple: the value for `--project-root` was probably wrong, and that is the only thing you need to change. Once it points to the BABS project, you don't need to change where you run `babs-check-setup`.
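One quick way to catch a wrong `--project-root` before running `babs-check-setup` is to confirm that the path contains `analysis/code/babs_proj_config.yaml`. That file appears in the `analysis/code` listing above. The helper below is a hypothetical sketch and is not part of BABS. It only tests for that single file and does not validate the project in any other way.

```shell
# Hypothetical helper (not part of BABS): check that a directory looks like a
# BABS project root, i.e. that it contains analysis/code/babs_proj_config.yaml.
# Returns 0 if the file exists, 1 if it is missing, 2 on wrong usage.
check_babs_project_root() {
  if [ "$#" -ne 1 ]; then
    echo "usage: check_babs_project_root <project_root>" >&2
    return 2
  fi
  local root="$1"
  # Strip one trailing slash so "proj/" and "proj" behave the same.
  local config="${root%/}/analysis/code/babs_proj_config.yaml"
  if [ -f "$config" ]; then
    return 0
  fi
  echo "Not a BABS project root: '${root}' (missing ${config})" >&2
  echo "Hint: pass the project folder itself, not its parent folder." >&2
  return 1
}
```

You can call it as `check_babs_project_root <project_root>` before passing the same path to `babs-check-setup --project-root`. If you pass the parent folder by mistake, the helper fails with the hint instead of BABS printing the misleading "`babs-init` was not successful" message.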
This is actually a known "bug" in BABS; see issue #97. Basically, the function `get_existing_babs_proj()` (defined in `cli.py`) is not yet general enough. Whenever a BABS CLI command calls this function and it fails, the error message always says "`babs-init` was not successful", which can be misleading, as in this case.
If you would like to fix this issue, feel free to do so! But no pressure!
Thanks, Chenying
Okay, I can try to fix it once I get the current fMRIPrep task running. I also have a question related to #25. When `babs-init` fails, the existing project isn't cleaned up. If we rerun `babs-init`, do we need to manually delete the project folder?
Thanks a lot! Currently, `babs-init` cleans up (i.e., removes) the generated project by default if it fails. The only way to keep the project is to turn on the argument `--keep-if-failed`. Also, if the project already exists, `babs-init` will fail too; it does not allow you to "rerun" it. All of this is to avoid making changes to existing projects.
Therefore, if a BABS project already exists, you need to either create a new one (with a different project name) or manually delete the existing one (see step 3 of this section of the docs).
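Because `babs-init` refuses to run when the project already exists, a wrapper script can check for this up front and stop with a clearer message. The following is a hypothetical sketch, and the function name and its two arguments are illustrative. It deliberately deletes nothing, because removing an existing project should follow the documented steps.

```shell
# Hypothetical pre-flight check (not part of BABS): refuse to continue if the
# target project folder already exists, since babs-init will not "rerun" on
# top of an existing project.
# Returns 0 if the folder is absent, 1 if it exists, 2 on wrong usage.
ensure_project_absent() {
  if [ "$#" -ne 2 ]; then
    echo "usage: ensure_project_absent <where_project> <project_name>" >&2
    return 2
  fi
  local target="${1%/}/$2"
  if [ -e "$target" ]; then
    echo "'${target}' already exists: choose a new project name or delete the existing project first." >&2
    return 1
  fi
  return 0
}
```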
> by default `babs-init` will clean (i.e., remove) the generated project if it fails.
But `babs-init` didn't remove the project directory, at least in my case. Before the error reported here, something went wrong while I was initializing the project and `babs-init` failed, but the project directory remained. So "clean (i.e., remove) the generated project" doesn't mean "delete the project directory automatically", right?
Hmm, this sounds weird. It should delete the BABS project directory automatically if `babs-init` fails. Do you still remember at which step `babs-init` failed, i.e., what the last printed messages were? What was your `babs-init` command at that time, and what went wrong?
Thanks, Chenying
I remember it was an error about `--container_config_yaml_file`, where I had put the wrong path. I don't have the error message anymore, but it told me that `babs-init` had failed and needed to be redone. The project directory remained there.
I see. Even in that case, BABS should clean it up. I can try it with a wrong `--container_config_yaml_file` path and see whether I can replicate this issue. I may not do it now (I'm focusing on thesis writing), but I can do it in September if that's fine with you.
Meanwhile, if you see a similar issue again (i.e., `babs-init` failed but did not remove the created BABS project, even though you did not use the argument `--keep-if-failed`), it would be wonderful if you could send me your `babs-init` command and the printed messages.
Thanks, Chenying
I can try to replicate the error and send you the command and error message. I'll get back to you within 1-2 weeks. Please do focus on your writing for now :)
A quick update: `babs-init` did clean up (delete) the folder this time. I used a wrong path for the YAML file as a test, and here is the printed output.
I can't remember why I had to delete the folder manually the last time this error happened, but this time everything is good!
```
`babs-init` failed! Below is the error message:
Traceback (most recent call last):
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/cli.py", line 234, in babs_init_main
    babs_proj.babs_bootstrap(input_ds,
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/babs.py", line 436, in babs_bootstrap
    container = Container(container_ds, container_name, container_config_yaml_file)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/babs.py", line 2198, in __init__
    raise Exception("The yaml file of the container's configurations '"
Exception: The yaml file of the container's configurations '/om2/user/yibei/budapest/code/fmriprep.yaml' does not exist!
Cleaning up created BABS project...
Removing input dataset(s) if cloned...
uninstall(ok): inputs/data/BIDS (dataset)
remove(ok): inputs/data/BIDS (dataset)
add(ok): .gitmodules (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  remove (ok: 1)
  save (ok: 1)
  uninstall (ok: 1)
Running `git annex dead here`...
Updating input and output RIA if created...
publish(ok): . (dataset) [refs/heads/master->input:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->input:refs/heads/git-annex [new branch]]
action summary:
  publish (ok: 2)
publish(ok): . (dataset) [refs/heads/master->output:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->output:refs/heads/git-annex [new branch]]
action summary:
  publish (ok: 2)
Deleting created BABS project folder...
Created BABS project has been cleaned up.
Please check the error messages above! Then fix the problem, and rerun `babs-init`.
babs.sh: line 59: cd: budapest_fmriprep: No such file or directory
```
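The last line of that output comes from the wrapper script `babs.sh`. After `babs-init` failed and cleaned up, the script still tried to `cd` into the deleted project folder. When several BABS commands are chained in one script, it helps to stop at the first failure. Below is a minimal sketch built around a hypothetical `run_step` helper. Putting `set -e` at the top of the script is a blunter alternative.

```shell
# Hypothetical helper: run one step of a multi-command script, report which
# step failed, and pass its exit status back to the caller so that later
# steps (e.g. cd into the project, babs-check-setup) can be skipped.
run_step() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_step <step_name> <command> [args...]" >&2
    return 2
  fi
  local name="$1"
  shift
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "Step '${name}' failed (exit status ${rc}); stopping here." >&2
  fi
  return "$rc"
}
```

In a script like `babs.sh`, each line would then look like `run_step babs-init babs-init <your arguments> || exit 1`. The script exits before reaching the `cd` whenever `babs-init` fails.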
Okay, I keep getting Slurm configuration errors during `babs-check-setup`:
```
Submitting a test job, will take a while to finish...
Although the script will be submitted to a compute node, this test job will not run the BIDS App; instead, this test job will gather setup information in the designated environment and make sure future BABS jobs with current setups will be able to finish successfully.
sbatch: error: Temporary disk specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
Traceback (most recent call last):
  File "/home/yibei/.conda/envs/babs/bin/babs-check-setup", line 8, in <module>
    sys.exit(babs_check_setup_main())
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/cli.py", line 296, in babs_check_setup_main
    babs_proj.babs_check_setup(input_ds, args.job_test)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/babs.py", line 896, in babs_check_setup
    _, job_id_str, log_filename = submit_one_test_job(self.analysis_path, self.type_system)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/utils.py", line 1677, in submit_one_test_job
    proc_cmd.check_returncode()
  File "/home/yibei/.conda/envs/babs/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['sbatch', '--export=DSLOCKFILE=/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/.SLURM_datalad_lock', '--job-name', 'fmr_test_job', '-e', '/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/logs/fmr_test_job.e%A', '-o', '/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/logs/fmr_test_job.o%A', '/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/code/check_setup/call_test_job.sh']' returned non-zero exit status 1.
```
The errors are:
```
sbatch: error: Temporary disk specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
```
But my Slurm setup looks like this:
```shell
#!/bin/bash
#SBATCH --mem=20G
#SBATCH --tmp=50G
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=8
```
These should be reasonable parameters.
Hi Yibei! It's great to hear that `babs-init` cleaned up the project when there was an error. If you notice again that it does not behave as expected, please create a new issue.
About the error from `babs-check-setup`: are you using the MIT OpenMind cluster? @djarecka previously told me that `--tmp` (i.e., `temporary_disk_space` as defined in BABS) does not work on this cluster; somehow it does not recognize this Slurm directive. Please try removing it and see whether that helps. In the future, you may need to avoid using it on the MIT cluster.
I still keep this keyword because it works on other clusters (e.g., the UMN MSI Slurm cluster and our PennMed CUBIC SGE cluster), and it is important to include, especially for real BIDS Apps like fMRIPrep.
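For a hand-written Slurm header like the one above, the equivalent of removing `temporary_disk_space` is to drop the `#SBATCH --tmp=` line. Below is a minimal sketch using `sed`. The function name is hypothetical, and the sketch only handles the `--tmp=<size>` spelling. For an existing BABS project, the fix used in this thread was to remove `temporary_disk_space` from the YAML file and create a new project. Editing generated scripts by hand is not what the thread did.

```shell
# Hypothetical filter: copy a job script from stdin to stdout, deleting any
# "#SBATCH --tmp=..." directive and leaving every other line untouched.
drop_tmp_directive() {
  sed -e '/^#SBATCH[[:space:]]\{1,\}--tmp=/d'
}
```

Usage: `drop_tmp_directive < job.sh > job_without_tmp.sh`.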
THANK YOU!! Yes, I'm using MIT OpenMind, and removing `temporary_disk_space` just worked! All jobs were submitted! (Closing this issue now.)
No problem! If you make changes in an existing BABS project, please make sure you use `datalad` to save the changes and update the RIA stores before you move on:
```shell
cd <project_root>/analysis
datalad status   # check which file you changed; optional if you are sure
datalad save -m "the message xxx" code/<which_script_you_changed.sh>
datalad push --to input
datalad push --to output   # only if no successful jobs have been saved in the output RIA yet
```
Otherwise, please create a new BABS project with the updated YAML file.
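If you repeat these steps often, you can wrap them in a small shell function. Below is a minimal sketch. The `save_and_push` function and the `PUSH_OUTPUT` switch are hypothetical names. The function runs only the same `datalad save` and `datalad push` commands listed above, and it assumes you are already inside `<project_root>/analysis`.

```shell
# Hypothetical wrapper (not part of BABS or DataLad): save the given files with
# a commit message, push to the input RIA store, and push to the output RIA
# store only when PUSH_OUTPUT=1 (advisable only if no successful job results
# have been pushed there yet). Run it from <project_root>/analysis.
save_and_push() {
  if [ "$#" -lt 2 ]; then
    echo "usage: save_and_push <message> <path>..." >&2
    return 2
  fi
  local msg="$1"
  shift
  datalad save -m "$msg" "$@" || return 1
  datalad push --to input || return 1
  if [ "${PUSH_OUTPUT:-0}" = "1" ]; then
    datalad push --to output || return 1
  fi
  return 0
}
```

For example: `save_and_push "remove tmp directive" code/participant_job.sh`, or `PUSH_OUTPUT=1 save_and_push ...` if the output RIA should be updated too.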
Yes, I deleted everything and created a new project.
Hi @zhao-cy, when I was running fMRIPrep via BABS, I got the following error during `babs-init`:
But `analysis/code/babs_proj_config.yaml` does exist in my folder: `/om2/user/yibei/budapest/budapest_fmriprep/analysis/code/babs_proj_config.yaml`, where `budapest_fmriprep` is the BABS project name. The YAML file looks like the following:
Here is my `babs-init` command:
Does the error `there is no 'analysis/code/babs_proj_config.yaml'` point to somewhere else? Thank you!