wagmanbe closed this 3 years ago.
I'm wondering whether this might not be an e3sm-unified problem. I just tried e3sm-diags from e3sm-unified through interactive jobs on haswell, and it ran well. However, knl has been problematic; it's a known issue: https://github.com/E3SM-Project/e3sm_diags/issues/314. @wagmanbe, would you try it again on haswell? If it still gives trouble, could you share your run script so I can try to reproduce it?
Are you not seeing these problems on knl when you use an E3SM_Diags development environment? I have always found Python packages run slowly on knl, so I would be surprised if this were specific to E3SM-Unified, but I can investigate if it appears to be. I agree, though, that haswell is the recommended option for all Python codes.
It's affecting both knl and haswell. Maybe it's a NERSC issue?

```
salloc --nodes=1 --partition=debug --time=00:20:00 -C haswell
source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh  # <-- hangs for minutes
python  # <-- slow
>>> import os
>>> from acme_diags.parameter.core_parameter import CoreParameter  # <-- hangs for minutes
```
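To put numbers on "hangs for minutes", one option is to time each import separately, once on a login node and once inside an `salloc` session. The sketch below is a hypothetical helper (`time_imports` is not part of E3SM-Unified or e3sm_diags) built only on the standard library. CPython's own `python -X importtime -c "..."` option, which prints a per-module import-time breakdown to stderr, is an alternative that needs no code at all.

```python
import importlib
import time


def time_imports(module_names):
    """Import each module in order and return {module_name: seconds taken}.

    Hypothetical diagnostic helper. Modules already in sys.modules return
    almost instantly, so run it in a fresh interpreter for meaningful numbers.
    """
    timings = {}
    for name in module_names:
        start = time.perf_counter()
        importlib.import_module(name)
        timings[name] = time.perf_counter() - start
    return timings
```

For example, `time_imports(["os", "acme_diags.parameter.core_parameter"])` run with the E3SM-Unified environment loaded shows whether the delay is in the `acme_diags` import itself or already in Python startup.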
NERSC had problems yesterday afternoon and evening with very slow compute-node performance; a few of us experienced it, and it was posted on their status page: https://www.nersc.gov/live-status/motd/ There's no notice now, so I'd recommend trying again.
Thank you, but this problem is occurring just the same today.
In that case, I suspect the compute-node problem is still there. I tried commands similar to the ones below yesterday afternoon and saw the same behavior, but when I tried again much later yesterday, everything looked fine.
> It's affecting both knl and haswell. Maybe it's a NERSC issue?
>
> ```
> salloc --nodes=1 --partition=debug --time=00:20:00 -C haswell
> source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh  # <-- hangs for minutes
> python  # <-- slow
> >>> import os
> >>> from acme_diags.parameter.core_parameter import CoreParameter  # <-- hangs for minutes
> ```
It's at least 10x faster this afternoon.
Hi, my E3SM diagnostics jobs aren't running. Could the E3SM-Unified environment be bogging them down?
Interactive jobs on NERSC knl and haswell slow to a crawl after I load the E3SM-Unified environment, e.g.:

```
salloc --nodes=1 --partition=debug --time=00:30:00 -C knl
source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh
```
After this, everything slows down and my diagnostic script hangs on the import statements.
These problems do not occur on the login node.
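Because the slowdown shows up on compute nodes but not on the login node, a quick way to compare the two is to time the environment-loading step itself on each. The sketch below is a hypothetical helper (`time_shell_step` is not an E3SM or NERSC tool) that runs one command in a fresh shell and reports wall-clock time and exit status; the environment changes made by `source` do not persist afterwards, which is fine since only the timing matters here.

```python
import subprocess
import time


def time_shell_step(command, shell="bash"):
    """Run `command` via `<shell> -c` and return (elapsed_seconds, returncode).

    Hypothetical diagnostic helper; output is captured and discarded so only
    the elapsed wall-clock time and the exit status are reported.
    """
    start = time.perf_counter()
    completed = subprocess.run([shell, "-c", command], capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode
```

Calling `time_shell_step("source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh")` once on the login node and once inside the `salloc` session gives two directly comparable numbers to attach to a report like this one.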