@Kirito-Ma, in Makeflow, whitespace and newlines are significant characters. I think you want:
CATEGORY=test_slurm
MEMORY=10024
CORES=4
WALL_TIME=1500
/stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Secretory_ppcseq.rds:
	Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Secretory_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Secretory_ppcseq.rds
I think you were missing the tab before Rscript and the newline at the end of the file.
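Because a tab and a final newline are easy to lose when copying a rule from a browser, one option is to generate the file with printf, which makes both characters explicit. This is a minimal sketch: the short names run_ppcseq.R, Secretory_DE.rds and Secretory_ppcseq.rds are placeholders for the full /stornext/... paths, and test.makeflow is the file name from the original error message.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names (assumption): substitute the full /stornext/... paths.
script=run_ppcseq.R
input=Secretory_DE.rds
output=Secretory_ppcseq.rds

{
  printf 'CATEGORY=test_slurm\n'
  printf 'MEMORY=10024\n'
  printf 'CORES=4\n'
  printf 'WALL_TIME=1500\n'
  printf '%s:\n' "$output"                                    # target line
  printf '\tRscript %s %s %s\n' "$script" "$input" "$output"  # command line starts with a tab
} > test.makeflow

# Sanity checks: count tab-indented lines, and confirm the last byte is '\n'.
# $(...) strips trailing newlines, so an empty result means the file ends with one.
grep -c $'^\t' test.makeflow
[ -z "$(tail -c 1 test.makeflow)" ] && echo "ends with a newline"
```

Running `cat -A test.makeflow` (GNU coreutils) is another quick way to see the tab, shown as `^I`, and the line ends, shown as `$`.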
Hi, thanks a lot. May I ask about another problem? It is a similar situation: my Rscript runs fine in the terminal but fails under Slurm. The error looks like this:
Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Basal_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Basal_ppcseq.rds failed with exit code 137
deleted makeflow.failed.1
rule 1 failed, moving any outputs to makeflow.failed.1
deleted /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Basal_ppcseq.rds
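Exit code 137 is worth decoding. When a process is killed by a signal, the shell reports its status as 128 plus the signal number, and 137 = 128 + 9, which is SIGKILL. Memory limits enforced by a batch system typically end the job with SIGKILL, which is consistent with the diagnosis reached later in this thread. A small, Makeflow-independent demonstration:

```shell
#!/usr/bin/env bash
# A child shell kills itself with SIGKILL (signal 9); the parent shell
# then reports the child's exit status as 128 + 9 = 137.
sh -c 'kill -KILL $$'
status=$?
echo "exit status: $status"
echo "signal number: $(( status - 128 ))"
# Translate the signal number back to its name with the kill builtin.
kill -l $(( status - 128 ))
```

The script prints the status 137, the signal number 9 and the name KILL (bash may also print a "Killed" notice on stderr).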
Does it fail right away?
When running in a terminal, could you run:
resource_monitor -Omon -- Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Basal_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Basal_ppcseq.rds
and post the contents of the generated file mon.summary?
Here is the error from the terminal:
bash: resource_monitor: command not found
It should be in the same directory as the makeflow executable. Where are you running makeflow from? The following should work:
# needed only once:
curl -O http://ccl.cse.nd.edu/software/files/cctools-7.4.3-x86_64-centos7.tar.gz
tar xf cctools-7.4.3-x86_64-centos7.tar.gz
# every time:
export PATH=$(pwd)/cctools-7.4.3-x86_64-centos7.tar.gz-dir/bin:$PATH
resource_monitor -Omon -- Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Basal_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Basal_ppcseq.rds
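The "command not found" error means that no directory on PATH contains resource_monitor. As a minimal sketch, a guard like the following fails early with a clearer message; `require_cmd` is a hypothetical helper name, not part of cctools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cctools): return non-zero with a clear
# message if a program cannot be found on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH; add the cctools bin directory to PATH" >&2
    return 1
  fi
}

# Usage: only report the monitor location if both programs can be found.
if require_cmd resource_monitor && require_cmd Rscript; then
  echo "resource_monitor found at: $(command -v resource_monitor)"
fi
```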
Could you open the file mon.summary, select everything, and copy the contents here? (The file should have been generated by the above command.)
Hi, thanks.
{
  "executable_type": "dynamic",
  "monitor_version": "7.4.3.",
  "host": "milton-login02.hpc.wehi.edu.au",
  "command": "Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Basal_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Basal_ppcseq.rds",
  "exit_status": 0,
  "exit_type": "normal",
  "start": [ 1644415133.555023, "s" ],
  "end": [ 1644415588.830632, "s" ],
  "wall_time": [ 455.275609, "s" ],
  "cpu_time": [ 458.49, "s" ],
  "memory": [ 17246, "MB" ],
  "virtual_memory": [ 19713, "MB" ],
  "swap_memory": [ 0, "MB" ],
  "disk": [ 2627, "MB" ],
  "bytes_read": [ 80, "MB" ],
  "bytes_written": [ 73, "MB" ],
  "bytes_received": [ 0, "MB" ],
  "bytes_sent": [ 0, "MB" ],
  "bandwidth": [ 0, "Mbps" ],
  "gpus": [ 0, "gpus" ],
  "cores": [ 1.195, "cores" ],
  "cores_avg": [ 1.007, "cores" ],
  "machine_cpus": [ 32, "cores" ],
  "machine_load": [ 1, "procs" ],
  "context_switches": [ 12984, "switches" ],
  "max_concurrent_processes": [ 4, "procs" ],
  "total_processes": [ 9, "procs" ],
  "total_files": [ 40676, "files" ],
  "fs_nodes": [ 0, "nodes" ],
  "workers": [ 0, "workers" ],
  "peak_times": {
    "total_files": [ 12.001, "s" ],
    "max_concurrent_processes": [ 20.65, "s" ],
    "context_switches": [ 455.276, "s" ],
    "machine_load": [ 123.017, "s" ],
    "machine_cpus": [ 4.447, "s" ],
    "cores_avg": [ 424.745, "s" ],
    "cores": [ 432.562, "s" ],
    "bytes_sent": [ 36.596, "s" ],
    "bytes_received": [ 36.596, "s" ],
    "bytes_written": [ 385.025, "s" ],
    "bytes_read": [ 385.025, "s" ],
    "disk": [ 12.001, "s" ],
    "virtual_memory": [ 385.025, "s" ],
    "memory": [ 385.025, "s" ],
    "cpu_time": [ 450.317, "s" ],
    "wall_time": [ 455.276, "s" ],
    "end": [ 455.276, "s" ],
    "start": [ 0, "s" ]
  }
}
Great! I think we are getting somewhere. It seems that your program uses more memory than you specified, so Slurm eventually kills it (exit code 137 is 128 + 9, meaning the process received SIGKILL). If you look above, you'll see:
"memory": [ 17246, "MB" ],
I would try changing the memory line in your makeflow to MEMORY=20000 (the value is in MB) and see if that works.
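To pick a MEMORY value for other rules, the peak can be read straight from mon.summary. This is a sketch under stated assumptions: the 20% headroom and the rounding to 1000 MB are arbitrary choices, not Makeflow or Slurm rules, and sample.summary is a hypothetical file holding a trimmed copy of the values posted above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the top-level peak "memory" value (MB) from a resource_monitor summary.
# Whitespace is stripped first so this works whether the JSON is on one line
# or pretty-printed. A later match under "peak_times" holds a time in seconds,
# so only the first match is kept.
peak_memory_mb() {
  tr -d ' \t\n' < "$1" | grep -o '"memory":\[[0-9]*' | sed -n '1s/.*\[//p'
}

# Suggest MEMORY: peak + 20% headroom (assumption), rounded up to 1000 MB.
suggest_memory_mb() {
  echo $(( ($1 * 12 / 10 + 999) / 1000 * 1000 ))
}

# Demo on a trimmed copy of the posted values (hypothetical file name).
cat > sample.summary <<'EOF'
{ "memory": [ 17246, "MB" ], "virtual_memory": [ 19713, "MB" ],
  "peak_times": { "memory": [ 385.025, "s" ] } }
EOF

peak=$(peak_memory_mb sample.summary)
echo "peak memory: ${peak} MB"
echo "MEMORY=$(suggest_memory_mb "$peak")"
```

With the posted peak of 17246 MB this prints MEMORY=21000, slightly more conservative than 20000.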
Hi btovar,
Thanks a lot. It works this time!
Hi btovar,
I have one more question. In my Rscript, I get a data frame, and I would like the script to do nothing with it and not save it to .rds. I create an empty tibble but do not save it to .rds. However, this always causes the Slurm job to fail. May I ask why this is the case and how to fix it?
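A likely explanation, stated as an assumption rather than a diagnosis: Makeflow expects every target listed before the colon of a rule to exist once the command finishes, and treats a rule whose targets were not created as failed, even if the command exited with status 0. If the script deliberately skips saving, the declared .rds target is never created. The sketch below mimics that kind of check in plain shell; `check_outputs` is a hypothetical helper, not a Makeflow command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every expected output file that does not exist
# and return non-zero if any is missing.
check_outputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing output: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demo: a command that "succeeds" but writes nothing still fails the check.
rm -f demo_output.rds
true   # stands in for an Rscript run that decides not to save anything
if check_outputs demo_output.rds; then
  echo "all outputs present"
else
  echo "rule would be reported as failed"
fi
```

If skipping is intentional, one workaround is to still write the empty tibble with saveRDS() so that the target file exists.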
Nice, good news!
For your new error, we can capture the error output as follows. Modify your makeflow rule to something like:
CATEGORY=test_slurm
MEMORY=20000
CORES=4
WALL_TIME=1500
/stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Secretory_ppcseq.rds ERROR-1.output:
	Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Secretory_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Secretory_ppcseq.rds > ERROR-1.output 2>&1
That is, we add the file ERROR-1.output as an output of the rule, and append > ERROR-1.output 2>&1 to the end of the command line, which sends both standard output and standard error to that file. Once the workflow fails, you can open ERROR-1.output and see the exact error you are getting from R.
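The same redirection can be tried outside Makeflow while debugging. This is a minimal sketch; `run_logged` is a hypothetical helper that mirrors `> ERROR-1.output 2>&1` and, on failure, also prints the last lines of the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring "> LOG 2>&1": run a command with stdout and
# stderr captured in a log file; on failure, show the tail of the log.
run_logged() {
  local log=$1
  shift
  "$@" > "$log" 2>&1 && return 0
  local status=$?   # exit status of the failed command
  echo "command failed with exit code $status; last lines of $log:" >&2
  tail -n 20 "$log" >&2
  return "$status"
}

# Demo with a dummy command that writes to stderr and exits with status 1.
run_logged ERROR-demo.output sh -c 'echo "simulated R error" >&2; exit 1' ||
  echo "see ERROR-demo.output for details"
```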
Hi, thanks! I solved the problem by following your instructions.
Thanks for letting us know!
Hi, I have a problem: my makeflow does not work. I tested the command in a terminal and it works well. However, when I run it with makeflow on Slurm, it shows this error:
parsing /stornext/HPCScratch/home/ma.m/test.makeflow...
2022/02/09 22:12:38.02 makeflow[11201] fatal: Found end of file while completing command. line: 6 column: 293
Terminated
Here is the code of my makeflow:
CATEGORY=test_slurm
MEMORY=10024
CORES=4
WALL_TIME=1500
/stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Secretory_ppcseq.rds:
Rscript /stornext/HPCScratch/home/ma.m/mengyao_data_scripts/COVID_19/run_ppcseq/run_ppcseq.R /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_de_data/Secretory_DE.rds /stornext/HPCScratch/home/ma.m/single_cell_database/COVID_19/data/all_ppcseq_data/Secretory_ppcseq.rds
It seems the makeflow did not run at all. How can I fix it?