PennLINC / babs

BIDS App Bootstrap (BABS)
https://pennlinc-babs.readthedocs.io
MIT License

[FIX] no 'analysis/code/babs_proj_config.yaml' file from `babs-check-setup` #136

Closed: yibeichan closed this issue 10 months ago

yibeichan commented 10 months ago

Hi @zhao-cy, when I ran fMRIPrep via BABS, I got the following error during `babs-init`:

BABS project has been initialized! Path to this BABS project: '/om2/user/yibei/budapest/budapest_fmriprep'
`babs-init` was successful!
Traceback (most recent call last):
  File "/home/yibei/.conda/envs/babs/bin/babs-check-setup", line 8, in <module>
    sys.exit(babs_check_setup_main())
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/cli.py", line 293, in babs_check_setup_main
    babs_proj, input_ds = get_existing_babs_proj(project_root)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/cli.py", line 790, in get_existing_babs_proj
    raise Exception(
Exception: `babs-init` was not successful; there is no 'analysis/code/babs_proj_config.yaml' file! Please rerun `babs-init` to finish the setup.

But `analysis/code/babs_proj_config.yaml` does exist in my folder (/om2/user/yibei/budapest/budapest_fmriprep/analysis/code/babs_proj_config.yaml), where budapest_fmriprep is the BABS project name. The YAML file looks like this:

type_session: single-ses
type_system: slurm
input_ds:
  $INPUT_DATASET_#1:
    name: BIDS
    path_in: /om2/user/yibei/budapest/input_data
    path_data_rel: inputs/data/BIDS
    is_zipped: false
container:
  name: fmriprep-23-1-4
  path_in: /om2/user/yibei/budapest/fmriprep-container

Here is my `babs-init` command:

babs-init \
    --where_project /om2/user/${current_user}/budapest \
    --project_name budapest_fmriprep \
    --input BIDS /om2/user/${current_user}/budapest/input_data \
    --container_ds /om2/user/${current_user}/budapest/fmriprep-container \
    --container_name fmriprep-${formatted_version} \
    --container_config_yaml_file /om2/user/${current_user}/budapest/code/fmriprep.yaml \
    --type_session single-ses \
    --type_system slurm

Does the error "there is no 'analysis/code/babs_proj_config.yaml'" refer to some other location?

Thank you!

zhao-cy commented 10 months ago

Hi Yibei! This sounds odd, since I saw the message "`babs-init` was successful!", yet you still received an error from `babs-init`? Is that correct? Or was `babs-init` fine, and `babs-check-setup` threw the errors?

yibeichan commented 10 months ago

You're right, it's `babs-check-setup` that returns the error. I had put all the commands in one bash script, but I just ran `babs-check-setup` on its own and got the same error again, so `babs-init` should have succeeded.
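Because every command ran from one bash script, it was hard to tell whether `babs-init` or `babs-check-setup` produced the traceback. A minimal Python sketch of one way to avoid that: run each step in order, prefix its output with the step name, and stop at the first non-zero exit. `run_steps` is a hypothetical helper, not part of BABS, and the demo uses placeholder commands standing in for the real `babs-init` and `babs-check-setup` calls.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (label, argv) pairs in order and stop at the first failure.

    Returns the label of the failing step, or None if every step succeeded.
    Each output line is prefixed with its step label (stdout first, then
    stderr), so it is clear which command printed which message.
    """
    for label, argv in steps:
        print(f"[{label}] running: {' '.join(argv)}")
        proc = subprocess.run(argv, capture_output=True, text=True)
        for line in (proc.stdout + proc.stderr).splitlines():
            print(f"[{label}] {line}")
        if proc.returncode != 0:
            print(f"[{label}] failed with exit status {proc.returncode}; stopping.")
            return label
    return None


if __name__ == "__main__":
    # Placeholder commands; in a real script these would be babs-init, etc.
    demo = [
        ("step-1", [sys.executable, "-c", "print('ok')"]),
        ("step-2", [sys.executable, "-c", "import sys; sys.exit(1)"]),
        ("step-3", [sys.executable, "-c", "print('never runs')"]),
    ]
    print("first failing step:", run_steps(demo))
```

With labelled output, a traceback printed after "`babs-init` was successful!" is immediately attributable to the next step.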

yibeichan commented 10 months ago

So in the analysis/code folder, I have the following files:

.
├── babs_proj_config.yaml
├── babs_proj_config.yaml.lock
├── check_setup
│   ├── call_test_job.sh
│   ├── submit_test_job_template.yaml
│   └── test_job.py
├── fmriprep-23-1-4_zip.sh
├── participant_job.sh
├── README.md
├── sub_final_inclu.csv
└── submit_job_template.yaml

yibeichan commented 10 months ago

Wait a second, I found the cause: I ran `babs-check-setup` from the parent folder rather than from the project folder. Now it's fixed (but I'm getting errors from Slurm; I'll try to fix those).

zhao-cy commented 10 months ago

Oh, I was just about to reply. So yes, the solution is simple: the value for `--project-root` was probably wrong, and you only need to change that. If you fix it, there is no need to change the directory from which you run `babs-check-setup`.
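The error message names a path relative to the project root, so a root that points at the wrong folder (for example, the parent `budapest` instead of `budapest/budapest_fmriprep`) makes an existing file look missing. A minimal Python sketch of that lookup, assuming the config is expected at `<project_root>/analysis/code/babs_proj_config.yaml` as the error message states; `expected_config_path` is a hypothetical helper that mimics the check, not BABS's actual `get_existing_babs_proj()`.

```python
import os
import tempfile

# Config location relative to the project root, as named in the error message.
CONFIG_REL = os.path.join("analysis", "code", "babs_proj_config.yaml")


def expected_config_path(project_root, cwd):
    """Return where the config file is expected for a given root and cwd.

    A relative project_root is resolved against cwd; an absolute one makes
    cwd irrelevant (os.path.join discards earlier parts before an absolute one).
    """
    return os.path.normpath(os.path.join(cwd, project_root, CONFIG_REL))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:  # plays the role of .../budapest
        proj = os.path.join(parent, "budapest_fmriprep")
        os.makedirs(os.path.join(proj, "analysis", "code"))
        open(os.path.join(proj, CONFIG_REL), "w").close()
        for root in (".", "budapest_fmriprep", proj):
            found = os.path.isfile(expected_config_path(root, cwd=parent))
            print(f"cwd=parent, project root {root!r}: config found = {found}")
```

Passing an absolute path to the project folder itself removes any dependence on the directory the command is run from.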

This is actually a known "bug" in BABS: see issue #97. Basically, the function `get_existing_babs_proj()` (defined in `cli.py`) is not yet general enough: whenever a BABS CLI calls this function and it fails, the error message is always "`babs-init` was not successful", which can be misleading, as in this case.
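One possible direction for a fix is to check the project layout from the outside in and report the first thing that is actually missing, instead of always blaming `babs-init`. A minimal Python sketch, assuming only the `analysis/code/babs_proj_config.yaml` layout named in the error; `diagnose_project_root` is hypothetical and not the current BABS code.

```python
import os
import tempfile


def diagnose_project_root(project_root):
    """Return a specific error message for a bad project root, or None if OK.

    Checks, in order: the root folder, its 'analysis' subfolder, and the
    config file, so the message names the first missing piece.
    """
    root = os.path.abspath(project_root)
    analysis = os.path.join(root, "analysis")
    config = os.path.join(analysis, "code", "babs_proj_config.yaml")
    if not os.path.isdir(root):
        return f"Project root {root!r} does not exist."
    if not os.path.isdir(analysis):
        return (f"{root!r} has no 'analysis' folder; does --project-root "
                "point at the BABS project folder itself rather than its parent?")
    if not os.path.isfile(config):
        return (f"{config!r} is missing; `babs-init` may not have finished. "
                "Please rerun `babs-init`.")
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        proj = os.path.join(parent, "budapest_fmriprep")
        os.makedirs(os.path.join(proj, "analysis", "code"))
        # Pointing at the parent folder, as happened in this issue:
        print(diagnose_project_root(parent))
```

Only the last branch corresponds to a genuinely unfinished `babs-init`; the first two point the user at the path they passed.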

If you would like to fix this issue, feel free to do so! But no pressure!

Thanks, Chenying

yibeichan commented 10 months ago

Okay, I can try to fix it once I get the current fMRIPrep task running. I also have a question related to #25: when `babs-init` fails, the existing project isn't cleaned up. If we rerun `babs-init`, do we need to manually delete the project folder?

zhao-cy commented 10 months ago

Thanks a lot! Currently, by default `babs-init` cleans up (i.e., removes) the generated project if it fails. The only way to keep the project is to turn on the argument `--keep-if-failed`. Also, if the project already exists, `babs-init` will fail, i.e., it does not allow you to "rerun" it. All of this is to avoid altering existing projects.

Therefore, if a BABS project exists, you need to either create a new one (with a different project name) or manually delete the existing one (see step 3 of this section of the docs).
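The behavior described above (refuse to touch an existing project, remove a newly created one on failure unless asked to keep it) is a common create-or-roll-back pattern. A minimal Python sketch of that pattern only; `init_project` and `missing_yaml` are hypothetical, and the real BABS cleanup also removes cloned input datasets and updates the RIA stores, as the log further down in this thread shows.

```python
import os
import shutil
import tempfile


def init_project(project_dir, build_steps, keep_if_failed=False):
    """Create project_dir, run each build step, and clean up on failure.

    Refuses to touch an existing project, and deletes the newly created
    folder if any step raises, unless keep_if_failed is True.
    The original exception is re-raised either way.
    """
    if os.path.exists(project_dir):
        raise FileExistsError(
            f"{project_dir!r} already exists; use a new project name "
            "or delete the existing project first."
        )
    os.makedirs(project_dir)
    try:
        for step in build_steps:
            step(project_dir)
    except Exception:
        if not keep_if_failed:
            shutil.rmtree(project_dir)  # remove everything created so far
        raise


def missing_yaml(project_dir):
    # Stand-in for a failing step, e.g. a wrong --container_config_yaml_file.
    raise FileNotFoundError("container config YAML does not exist")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as where:
        proj = os.path.join(where, "budapest_fmriprep")
        try:
            init_project(proj, [missing_yaml])
        except FileNotFoundError as err:
            print("init failed:", err)
        print("project folder still exists?", os.path.exists(proj))  # False
```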

yibeichan commented 10 months ago

> by default babs-init will clean (i.e., remove) the generated project if it fails.

But `babs-init` didn't remove the project directory, at least in my case. Before the error reported here, something went wrong while initializing the project and `babs-init` failed, but the project directory remained. So "clean (i.e., remove) the generated project" doesn't mean "delete the project directory automatically", right?

zhao-cy commented 10 months ago

Hmm, this sounds odd. It should delete the BABS project directory automatically if `babs-init` fails. Do you still remember at which step `babs-init` failed, i.e., what the last printed messages were? What was your `babs-init` command at that time, and what went wrong?

Thanks, Chenying

yibeichan commented 10 months ago

I remember it was an error about `--container_config_yaml_file`, where I had put the wrong path. I don't have the error message now, but it told me `babs-init` failed and that I needed to redo it. The project directory remained there.

zhao-cy commented 10 months ago

I see. Even in that case, BABS should clean it up. I can try a wrong `--container_config_yaml_file` path and see if I can replicate this issue. I may not get to it now (I'm focusing on thesis writing), but I can do it in September if that's fine with you.

Meanwhile, if you see a similar issue again (i.e., `babs-init` failed but did not remove the created BABS project, even though you did not use the argument `--keep-if-failed`), it would be wonderful if you could send me your `babs-init` command and the printed messages.

Thanks, Chenying

yibeichan commented 10 months ago

I can replicate the error and send you the command and error message. I'll get back to you within 1-2 weeks. Please do focus on your writing for now :)

yibeichan commented 10 months ago

A quick update: `babs-init` did clean up (delete) the folder this time. I tested with a wrong path to the YAML file, and here is the output. I can't remember why I had to delete the folder manually last time, when the same error happened, but this time everything is good!


`babs-init` failed! Below is the error message:
Traceback (most recent call last):
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/cli.py", line 234, in babs_init_main
    babs_proj.babs_bootstrap(input_ds,
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/babs.py", line 436, in babs_bootstrap
    container = Container(container_ds, container_name, container_config_yaml_file)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/babs.py", line 2198, in __init__
    raise Exception("The yaml file of the container's configurations '"
Exception: The yaml file of the container's configurations '/om2/user/yibei/budapest/code/fmriprep.yaml' does not exist!

Cleaning up created BABS project...
Removing input dataset(s) if cloned...
uninstall(ok): inputs/data/BIDS (dataset)
remove(ok): inputs/data/BIDS (dataset)
add(ok): .gitmodules (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  remove (ok: 1)
  save (ok: 1)
  uninstall (ok: 1)

Running `git annex dead here`...

Updating input and output RIA if created...
publish(ok): . (dataset) [refs/heads/master->input:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->input:refs/heads/git-annex [new branch]]
action summary:
  publish (ok: 2)
publish(ok): . (dataset) [refs/heads/master->output:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->output:refs/heads/git-annex [new branch]]
action summary:
  publish (ok: 2)

Deleting created BABS project folder...

Created BABS project has been cleaned up.
Please check the error messages above! Then fix the problem, and rerun `babs-init`.
babs.sh: line 59: cd: budapest_fmriprep: No such file or directory
yibeichan commented 10 months ago

Okay, I keep getting Slurm configuration errors during `babs-check-setup`:

Submitting a test job, will take a while to finish...
Although the script will be submitted to a compute node, this test job will not run the BIDS App; instead, this test job will gather setup information in the designated environment and make sure future BABS jobs with current setups will be able to finish successfully.
sbatch: error: Temporary disk specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
Traceback (most recent call last):
  File "/home/yibei/.conda/envs/babs/bin/babs-check-setup", line 8, in <module>
    sys.exit(babs_check_setup_main())
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/cli.py", line 296, in babs_check_setup_main
    babs_proj.babs_check_setup(input_ds, args.job_test)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/babs.py", line 896, in babs_check_setup
    _, job_id_str, log_filename = submit_one_test_job(self.analysis_path, self.type_system)
  File "/home/yibei/.conda/envs/babs/lib/python3.9/site-packages/babs/utils.py", line 1677, in submit_one_test_job
    proc_cmd.check_returncode()
  File "/home/yibei/.conda/envs/babs/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['sbatch', '--export=DSLOCKFILE=/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/.SLURM_datalad_lock', '--job-name', 'fmr_test_job', '-e', '/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/logs/fmr_test_job.e%A', '-o', '/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/logs/fmr_test_job.o%A', '/om2/scratch/Sun/yibei/budapest/budapest_fmriprep/analysis/code/check_setup/call_test_job.sh']' returned non-zero exit status 1.

The errors are:

sbatch: error: Temporary disk specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

But my Slurm setup is:

#!/bin/bash
#SBATCH --mem=20G
#SBATCH --tmp=50G
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=8

which should be reasonable parameters.

zhao-cy commented 10 months ago

Hi Yibei! It's great to hear that `babs-init` cleaned up the project when there was an error. If you notice again that it does not behave as expected, please create a new issue.

Regarding the error from `babs-check-setup`: are you using the MIT OpenMind cluster? @djarecka previously told me that `--tmp` (i.e., `temporary_disk_space` as defined in BABS) does not work on this cluster; somehow the cluster does not accept this Slurm directive. Please try removing it and see. In the future, you may need to avoid using it on the MIT cluster.

I still keep this keyword because it works on other clusters (e.g., the UMN MSI Slurm cluster and our PennMed CUBIC SGE cluster), and it is important to include, especially for real BIDS Apps like fMRIPrep.
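The failing directive is the `#SBATCH --tmp=50G` line, which is what the `sbatch` message "Temporary disk specification can not be satisfied" points to. The fix used in this thread is removing `temporary_disk_space` from the container config YAML; if you instead want to inspect or patch a generated job script (for example, the `call_test_job.sh` submitted in the traceback above), here is a hypothetical Python helper, `drop_sbatch_option` (not a BABS function), that drops one `#SBATCH` option and leaves the others untouched.

```python
import re


def drop_sbatch_option(script_text, option):
    """Return script_text without the '#SBATCH <option>...' directive lines.

    Only the exact option name is removed: dropping '--tmp' keeps '--time'.
    """
    pattern = re.compile(r"#SBATCH\s+" + re.escape(option) + r"(=|\s|$)")
    kept = [
        line
        for line in script_text.splitlines(keepends=True)
        if not pattern.match(line.strip())
    ]
    return "".join(kept)


if __name__ == "__main__":
    header = (
        "#!/bin/bash\n"
        "#SBATCH --mem=20G\n"
        "#SBATCH --tmp=50G\n"
        "#SBATCH --time=12:00:00\n"
        "#SBATCH --cpus-per-task=8\n"
    )
    print(drop_sbatch_option(header, "--tmp"), end="")
```

Any script edited this way inside an existing project still has to be saved and pushed with DataLad, as the maintainer notes later in this thread.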

yibeichan commented 10 months ago

THANK YOU!! Yes, I'm using MIT OpenMind, and removing `temporary_disk_space` just worked! All jobs were submitted! (Closing this issue now.)

zhao-cy commented 10 months ago

No problem! If you make changes to an existing BABS project, please make sure you use DataLad to save the changes and update the RIA stores before moving on:

cd <project_root>/analysis
datalad status    # check which file you changed; optional if you are sure
datalad save -m "the message xxx" code/<which_script_you_changed.sh>
datalad push --to input
datalad push --to output   # only if no successful jobs have been saved in the output RIA yet

Otherwise, please create a new BABS project with the updated YAML file.
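Saving must come before the pushes, and the push to the output RIA applies only while no successful job results are stored there. A hypothetical Python sketch that builds this sequence from the commands listed above and prints it as a dry run; the helper names are illustrative, and `code/participant_job.sh` is just one of the files from the `analysis/code` listing earlier in the thread.

```python
import shlex
import subprocess


def datalad_update_commands(changed_files, message, push_output=True):
    """Build the save-and-push sequence as argv lists.

    Set push_output=False once successful job results have already been
    saved in the output RIA store.
    """
    cmds = [
        ["datalad", "status"],  # optional: see which files changed
        ["datalad", "save", "-m", message, *changed_files],
        ["datalad", "push", "--to", "input"],
    ]
    if push_output:
        cmds.append(["datalad", "push", "--to", "output"])
    return cmds


def run_commands(cmds, analysis_dir, dry_run=True):
    """Print each command; execute it in analysis_dir only if dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        if not dry_run:
            subprocess.run(cmd, cwd=analysis_dir, check=True)


if __name__ == "__main__":
    cmds = datalad_update_commands(
        ["code/participant_job.sh"], "Remove --tmp from the Slurm header"
    )
    run_commands(cmds, analysis_dir="<project_root>/analysis")  # dry run only
```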

yibeichan commented 10 months ago

Yes, I deleted everything and created a new project.