Open yw-fang opened 1 week ago

Hi all,

Due to the requirements of the HPC system I am using, I had to install jobflow-remote in a container. I found that in an interactive shell I could run the jf command this way:

singularity exec ~/jobflowenv.sif jf

and it worked as expected. However, it did not work when I tried to execute it in a Slurm job script. Here is my job script submit.sh.

By examining queue.out, I found that the lines beginning with 'singularity' in submit.sh worked and printed the Python version and the jobflow-remote help information. However, the last line

jf -fe execution run /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1

did not work and raised the error "/var/spool/slurmd/job8184623/slurm_script: line 24: jf: command not found" in the queue.err file. I am wondering if anyone has had a similar experience. I would greatly appreciate any comments!
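Roughly, the relevant part of submit.sh looks like this (a simplified sketch; the SBATCH header and the two test commands here are placeholders, only the final line and its error come from the actual run):

#!/bin/bash
#SBATCH --job-name=qe_jobflow    # placeholder header; the real options are omitted here

# These container calls work: python and jf exist inside the image.
singularity exec ~/jobflowenv.sif python --version
singularity exec ~/jobflowenv.sif jf --help

# This bare call (line 24 of the real script, per the error message) fails with
# "jf: command not found", because jf is installed only inside the container
# and is not on the host PATH:
jf -fe execution run /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1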
Hi @yw-fang,

Unfortunately I don't have much experience with running containers on HPC systems. I see that you already tried to set an alias for the jf command, but it seems it did not work.
In general, I think that if this is a real use case it would be fine to add an option to the jobflow-remote configuration to customize the jf -fe execution run command. That way you could replace it directly with singularity exec ~/jobflowenv.sif jf -fe execution run.
Before proceeding, could you check whether this would actually solve your problem? In principle you could manually edit the submission script and replace the last line with one that calls jf through the container (see the sketch below). Unless you stop the runner immediately after the job has been SUBMITTED, jobflow-remote will still see the job as failed, but at least you can check whether the job is executed correctly. (Based on the submission script I suppose this could be a Quantum ESPRESSO simulation, so you should still be able to see whether that completed properly.)
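Concretely, the manual test amounts to editing the last line of the generated submission script (both the command and the run directory are taken from your report):

# last line as generated by jobflow-remote (fails on this HPC system):
jf -fe execution run /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1

# manually edited replacement: run the same command inside the container:
singularity exec ~/jobflowenv.sif jf -fe execution run /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1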
Hi @gpetretto, thank you very much for your response! It works: if I manually replace "jf" with "singularity exec ~/jobflowenv.sif jf", the subsequent ESPRESSO calculation is carried out. How can I rewrite the jobflow-remote configuration so that 'jf' is automatically replaced by "singularity exec ~/jobflowenv.sif jf"? This is what I was looking for, but I didn't find how to do it.
Good to know that it works in that case. Unfortunately the option is not there at the moment, but I can probably implement it in the next few days. In the meantime, if you wish, you can manually edit this line in the source code: https://github.com/Matgenix/jobflow-remote/blob/c9d5d87fdfe4dbd5605021e2258d4b341c7a2201/src/jobflow_remote/jobs/runner.py#L624 so that it adds the container call to your submission script (sketched below).
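Something along these lines (a rough sketch only; the exact statement and variable names at that line may differ, so check the linked source first):

# original (approximately): builds the command that ends up as the last
# line of the generated submission script
command = f"jf -fe execution run {remote_path}"

# edited: run the same command through the container instead
command = f"singularity exec ~/jobflowenv.sif jf -fe execution run {remote_path}"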
@gpetretto It works. Many thanks. If you don't mind, I'd like to keep this issue open until the option is implemented, so that users don't have to change the source code directly.