brennanbrunsvik closed this 8 months ago
I experimented with different shells and somehow got the code to work again. For some reason, when executing from zsh, the subprocesses did not have access to the Anaconda bin directory of my environment. I switched to tcsh on a whim, and things are working now. I also updated Conda along the way, and maybe that was the magic trick. Everything seems to be running from zsh again as well (for now). If somebody else runs into this problem, I recommend updating Conda and writing a simple test script to check which environment Python opens in when using subprocess.run.
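For anyone who wants to try that check, here is a minimal sketch of such a test script. It is not taken from Seisflows; the function name describe_child_environment is hypothetical, and it assumes a POSIX shell where a bare "python" command is what the subprocess would resolve. It compares the interpreter of the parent process with the one a shell subprocess finds on its PATH.

```python
import os
import shutil
import subprocess
import sys


def describe_child_environment():
    """Compare the parent interpreter with the one a shell subprocess finds.

    Hypothetical helper for illustration; not part of Seisflows.
    """
    # Ask a shell to run whatever "python" it resolves on PATH and report
    # the path of that interpreter. Assumes a POSIX shell for the quoting.
    child = subprocess.run(
        "python -c 'import sys; print(sys.executable)'",
        shell=True,
        capture_output=True,
        text=True,
    )
    return {
        "parent_executable": sys.executable,
        "parent_which_python": shutil.which("python"),
        # None if the child could not start a "python" at all.
        "child_executable": child.stdout.strip() or None,
        "child_stderr": child.stderr.strip(),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "shell": os.environ.get("SHELL"),
    }


if __name__ == "__main__":
    for key, value in describe_child_environment().items():
        print(f"{key}: {value}")
```

If parent_executable points inside the Conda environment but child_executable does not, the subprocess is most likely picking up a different Python than the one Seisflows is installed in.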
Hey @brennanbrunsvik, thanks for bringing this up, glad an update was able to solve that! It's always tricky trying to finagle a Python environment to work on subprocesses/compute nodes!
I am running Seisflows from an Anaconda environment. When using the system option "Cluster", I get the error "ModuleNotFoundError: No module named 'seisflows'". I see that cluster.py uses subprocess.run to run several instances of Specfem. As I understand it, each subprocess runs in the default environment rather than in the parent process's Anaconda environment, and therefore does not have access to Seisflows. It looks like the workaround is to explicitly use sys.executable within the subprocess call: https://stackoverflow.com/questions/51819719/using-subprocess-in-anaconda-environment
However, I am confused as to why cluster's submit operation was working for anybody, if this is the case. I assume that Seisflows users are using a virtual environment, likely Conda. Importantly, I was running cluster just fine up until a few weeks ago.
I included some files that can be run to show the problem. test_subprocess_failure_cluster.zip
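For reference, the sys.executable workaround mentioned above might look roughly like the sketch below. This is not how cluster.py actually builds its command; the helper name run_in_same_interpreter is hypothetical. The idea is to pass the full path of the parent's interpreter instead of a bare "python", so the child does not depend on what the shell finds on PATH.

```python
import subprocess
import sys


def run_in_same_interpreter(code):
    """Run a Python snippet with the same interpreter as the parent process.

    Hypothetical helper for illustration; not part of Seisflows.
    """
    # sys.executable is the absolute path of the running interpreter, so the
    # child uses the same environment's Python regardless of the shell's PATH.
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )


result = run_in_same_interpreter("import sys; print(sys.executable)")
print(result.stdout.strip())
```

Packages installed in the parent's environment should then be importable in the child too, for example by passing "import seisflows" as the snippet when running from inside the Conda environment.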