Closed by swelborn 1 year ago
It's possible this is related to https://github.com/NERSC/podman-hpc/issues/54 and may be addressed by https://github.com/NERSC/podman-hpc/pull/62
Should hopefully be fixed via https://github.com/NERSC/podman-hpc/pull/62
I wrote a thread about this in the #podman Slack channel; copying the discussion here:
Before I debugged it...
I have a bug happening in an interactive job. I will file an issue if you all think it is related to podman. Here is the rundown:
srun -N 2 -n 2 podman-hpc run --rm -v $HOME:/mnt/ --network=host -it samwelborn:stempy-streaming /mnt/utility/run_node.sh
Maybe it is just a non-clean exit from the node binary? Thing is, ps -u gives me this after I shut down the previous process:
It works just fine if I exit the job and start another one up.
Debugging
@tylern4 pointed me to #!/bin/bash -x, and the program was indeed hanging on
After adding --log-level=debug to my run command, I came up with the following:
Fix (?)
I was also coming up with this:
So I changed my run script to this:
and it seems to stop the previous containers. My program then starts normally:
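The revised run script itself is not reproduced in this copy of the thread. Purely to illustrate the idea described above (stopping leftover containers before launching a new one), a minimal sketch might look like the following. It is not the script used in this issue: the function name cleanup_containers and the CONTAINER_CLI variable are hypothetical, and it assumes podman-hpc forwards the ps and stop subcommands to podman unchanged.

```shell
#!/bin/bash
# Hypothetical cleanup helper; NOT the script from this issue.
# CONTAINER_CLI is an illustrative variable so another CLI (or a stub)
# can be substituted. Assumes podman-hpc passes "ps" and "stop" through
# to podman.
CONTAINER_CLI="${CONTAINER_CLI:-podman-hpc}"

cleanup_containers() {
    # Collect the IDs of containers that are still running for this user.
    local ids
    ids="$("$CONTAINER_CLI" ps -q)"

    if [ -n "$ids" ]; then
        # Word splitting is intentional here: one container ID per argument.
        # shellcheck disable=SC2086
        "$CONTAINER_CLI" stop $ids
    fi
}
```

In this sketch, cleanup_containers would be called on each node before the podman-hpc run command, so that a container left over from a previous srun in the same job is stopped first. Containers started with --rm should then be removed once they stop.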