chengzhuzhang commented 4 years ago

MPI_init_thread erorr when running e3sm_diags within e3sm_unified 1.3.1.1 on cori knl and haswell. Error message as below:

(e3sm_unified_1.3.1.1) chengzhu@nid00941:~/acme_diags_parameters/acme_diags> python tutorial_2020_climo_sets.py 
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[nid00941:57912] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

It was okay running on cori login. Also running standalone e3sm_diags env is also fine.

xylar commented 4 years ago

@chengzhuzhang, could you see if you add mpi4py to your standalone environment if you see the same error? I think we saw MPI problems in the past in environments that had mpi4py from conda-forge. If it's not that, it must be the mpi version of some other package that's causing trouble.

xylar commented 4 years ago

$ source load_latest_e3sm_unified.sh
$ conda list | grep mpi
compass                   0.1.8           nompi_py_h6eb0c47_100    e3sm
e3sm-unified              1.3.1.1         nompi_py37h6eb0c47_0    e3sm
esmf                      8.0.1           nompi_hbeb3ca6_0    conda-forge
esmpy                     8.0.1           nompi_py37h777d1d2_0    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_100    conda-forge
libnetcdf                 4.7.4           nompi_h84807e1_105    conda-forge
mpi                       1.0                     openmpi    conda-forge
mpi4py                    3.0.3            py37hbfacf26_1    conda-forge
netcdf-fortran            4.5.3           nompi_hfef6a68_100    conda-forge
netcdf4                   1.5.3           nompi_py37hdc49583_105    conda-forge
openmpi                   4.0.4                hdf1f1ad_0    conda-forge

Another possibility that occurs to me is that the nompi version of esmf might not work for you if you're trying to run it with MPI.

I have to say, I've had no luck in general running MPI versions of conda packages on Cori nodes. If you're able to figure out what package is causing the problem I'm happy to help debug.

One more thing to try might be to see if load_latest_e3sm_unified_mpich.sh works any better. I think CDAT didn't like mpich so maybe that's a bad option.

Finally, I don't install it but I build an OpenMPI version of E3SM-Unified. You could try installing that version yourself and see if it makes a difference.

My guess is that the MPI variants of E3SM-Unified probably won't help. But it doesn't hurt to try.

chengzhuzhang commented 4 years ago

Thank you, Xylar!

Hi @forsyth2, I encountered this issue while making e3sm_diags tutorial, and will spend more time getting the tutorial done. Would you please invest this issue following Xylar's instructions? Thank you.

forsyth2 commented 4 years ago

@chengzhuzhang I am able to reproduce the error with:

salloc --nodes=1 --partition=debug --time=00:30:00 -C haswell
source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh
cd tests/system
python all_sets.py -d all_sets.cfg

I tried conda install mpi4py but I get:

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
  environment location: /global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.3.1.1

I tried:

source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified_mpich.sh
cd tests/system
python all_sets.py -d all_sets.cfg

This appears to be successful.

chengzhuzhang commented 4 years ago

Thank you, @forsyth2. Good news with the successful run with mpich version. I don't think we can install packages in the unified env. I think what Xylar suggested is to install mpi4py in to standalone e3sm_diags env to see if that could cause trouble.

xylar commented 4 years ago

Yep, @forsyth2, I protect the e3sm-unified environments because they belong to everyone and that means it causes trouble if anyone but me installs packages.

Dev environment

I'm creating a "dev" environment for e3sm_diags on my laptop with the following:

conda create -y -n e3sm_diags_env_dev -c cdat/label/v82 -c conda-forge -c defaults python=3.7 \
  "cdp>=1.6.0" "vcs>=8.2"   "vtk-cdat=8.2.0.8.2" "vcsaddons>=8.2" "dv3d>=8.2" "cdms2>=3.1.4" \
  "cdutil>=8.2" "genutil>=8.2" "cdtime>=3.1.2" numpy matplotlib "cartopy>=0.18.0" beautifulsoup4 lxml

Here's what I'm seeing:

$ conda list -n e3sm_diags_env_dev | grep mpi
esmf                      8.0.1           nompi_hbeb3ca6_0    conda-forge
esmpy                     8.0.1           nompi_py37h777d1d2_0    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
libnetcdf                 4.7.4           nompi_h84807e1_105    conda-forge
netcdf-fortran            4.5.3           nompi_hfef6a68_100    conda-forge

Dev with mpi4py (and openmpi)

If I add mpi4py, like e3sm-unified has:

conda create -y -n e3sm_diags_env_dev -c cdat/label/v82 -c conda-forge -c defaults python=3.7 \
  "cdp>=1.6.0" "vcs>=8.2"   "vtk-cdat=8.2.0.8.2" "vcsaddons>=8.2" "dv3d>=8.2" "cdms2>=3.1.4" \
  "cdutil>=8.2" "genutil>=8.2" "cdtime>=3.1.2" numpy matplotlib "cartopy>=0.18.0" beautifulsoup4 \
   lxml mpi4py

I see:

$ conda list -n e3sm_diags_env_dev | grep mpi
esmf                      8.0.1           nompi_hbeb3ca6_0    conda-forge
esmpy                     8.0.1           nompi_py37h777d1d2_0    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
libnetcdf                 4.7.4           nompi_h84807e1_105    conda-forge
mpi                       1.0                     openmpi    conda-forge
mpi4py                    3.0.3            py37hbfacf26_1    conda-forge
netcdf-fortran            4.5.3           nompi_hfef6a68_100    conda-forge
openmpi                   4.0.4                hdf1f1ad_0    conda-forge

Dev environment with all mpich libraries

If I instead force the mpich versions of various libraries:

conda create -y -n e3sm_diags_env_dev -c cdat/label/v82 -c conda-forge -c defaults python=3.7 \
  "cdp>=1.6.0" "vcs>=8.2"   "vtk-cdat=8.2.0.8.2" "vcsaddons>=8.2" "dv3d>=8.2" "cdms2>=3.1.4" \
  "cdutil>=8.2" "genutil>=8.2" "cdtime>=3.1.2" numpy matplotlib "cartopy>=0.18.0" beautifulsoup4 \
   lxml mpi4py "libnetcdf=*=mpi_mpich_*" "esmf=*=mpi_mpich_*" "esmpy=*=mpi_mpich_*" \
   "hdf5=*=mpi_mpich_*"

I see:

$ conda list -n e3sm_diags_env_dev | grep mpi
esmf                      8.0.1           mpi_mpich_h213fab7_100    conda-forge
esmpy                     8.0.1           mpi_mpich_py37hef66020_100    conda-forge
hdf5                      1.10.6          mpi_mpich_ha7d0aea_1    conda-forge
libnetcdf                 4.7.4           mpi_mpich_hfd9c5b6_5    conda-forge
mpi                       1.0                       mpich    conda-forge
mpi4py                    3.0.3            py37h0c5ec45_1    conda-forge
mpich                     3.3.2                hc856adb_0    conda-forge
netcdf-fortran            4.5.3           mpi_mpich_h3923e1a_0    conda-forge

Controlling which MPI variant you get

To explicitly control the build of a given package (nompi, mpich or openmpi), you take advantage of the build string starting with nompi_*, mpi_mpich_* or mpi_openmpi_* (see https://conda-forge.org/docs/maintainer/knowledge_base.html#message-passing-interface-mpi).

As you see above, the default behavior for most packages is to install the nompi version (however, esmf and esmpy favor the mpich version). The "default" version is determined by giving a package a higher build number (say, adding 100 to the build number of other versions). The package solver tries to have the highest possible build number for all packages that passes the constraints from each package.

What to investigate?

The easiest thing to investigate is if maybe openmpi is the problem. To test this, we would want the nompi version of most packages (like you get by default) but instead of mpi4py with openmpi, we would want mpi4py and mpich:

conda create -y -n e3sm_diags_env_dev -c cdat/label/v82 -c conda-forge -c defaults python=3.7 \
  "cdp>=1.6.0" "vcs>=8.2"   "vtk-cdat=8.2.0.8.2" "vcsaddons>=8.2" "dv3d>=8.2" "cdms2>=3.1.4" \
  "cdutil>=8.2" "genutil>=8.2" "cdtime>=3.1.2" numpy matplotlib "cartopy>=0.18.0" beautifulsoup4 \
   lxml mpi4py mpich "libnetcdf=*=nompi_*" "esmf=*=nompi_*" "esmpy=*=nompi_*" "hdf5=*=nompi_*"

This results in:

$ conda list -n e3sm_diags_env_dev | grep mpi
esmf                      8.0.1           nompi_hbeb3ca6_0    conda-forge
esmpy                     8.0.1           nompi_py37h777d1d2_0    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
libnetcdf                 4.7.4           nompi_h84807e1_105    conda-forge
mpi                       1.0                       mpich    conda-forge
mpi4py                    3.0.3            py37h0c5ec45_1    conda-forge
mpich                     3.3.2                hc856adb_0    conda-forge
netcdf-fortran            4.5.3           nompi_hfef6a68_100    conda-forge

Could you see if that works, or if it produces the same error? If not, we know that the problem is just openmpi vs. mpich.

Longer-term solution

My feeling is that you ultimately want some way of deciding for yourselves if you want cdms2 to be using MPI or not. If you don't explicitly install mpi4py in your dev environment, e3sm_diags will run without MPI as I understand it. cdms2 checks whether to use MPI by checking if mpi4py can be imported: https://github.com/CDAT/cdms/blob/master/Lib/tvariable.py#L26-L32 My feeling is that this is a lazy shorthand and they should be checking if the libraries they actually need are compatible with MPI. If we can figure out which one(s), I'm happy to create an issue for this on their repo.

In e3sm-unified, we need mpi4py for the ilamb package even when we don't want MPI versions of other packages. This has the side effect that cdms2 decided to use MPI whether we want it to or not. One suggestion would be that we request that they add an environment variable that would override the mpi4py check and would disable MPI regardless. This could be set as part of activating the e3sm-unified environment without mpich.

Or, if mpich is working fine (with either mpich or nompi versions of other packages like libnetcdf and esmf), we could explicitly make sure mpich instead of openmpi gets installed with the nompi variant of e3sm-unified in the future.

I don't have experience running e3sm_diags but it seems pretty easy so I would be happy to help with this debugging if you run into trouble.

xylar commented 4 years ago

@chengzhuzhang, if ironing this issue and the other plotting issues you've uncovered requires another "emergency" release of e3sm-unified, that's fine. If that's the case, let's try to make sure we do some thorough testing for the next "emergency" release so it's hopefully the last before next January.

forsyth2 commented 4 years ago

@xylar @chengzhuzhang I created the 4 environments Xylar did and my conda list outputs matched those. I logged onto a Haswell node and activated the 4th environment. python all_sets.py -d all_sets.cfg gives:

/global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/lib/python3.7/site-packages/unidata/__init__.py:2: UserWarning: unidata package is deprecated please use genutil.udunits instead of unidata.udunits
  warnings.warn("unidata package is deprecated please use genutil.udunits instead of unidata.udunits")
[]
[]
[]
[]
You have no value for ref_names. Caculate test data only
Saved environment yml file to: all_sets_results_test/prov/environment.yml
Saved command used to: all_sets_results_test/prov/cmd_used.txt
Saved cfg file to: all_sets_results_test/prov/all_sets.cfg
Saved Python script to: all_sets_results_test/prov/all_sets.py
Variable: T
Selected pressure level: [200.0]
Plot saved in: all_sets_results_test/zonal_mean_xy/ERA-Interim/ERA-Interim-T-200-ANN-global.png

CDMS system error: No such file or directory
CDMS I/O error: Opening file /global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/share/e3sm_diags/acme_ne30_ocean_land_mask.nc
Error in acme_diags.driver.zonal_mean_2d_driver
Traceback (most recent call last):
  File "/global/homes/f/forsyth/.local/lib/python3.7/site-packages/acme_diags/driver/zonal_mean_2d_driver.py", line 82, in run_diag
    land_frac = test_data.get_climo_variable('LANDFRAC', season)
  File "/global/homes/f/forsyth/.local/lib/python3.7/site-packages/acme_diags/driver/utils/dataset.py", line 144, in get_climo_variable
    variables = self._get_climo_var(filename, *args, **kwargs)
  File "/global/homes/f/forsyth/.local/lib/python3.7/site-packages/acme_diags/driver/utils/dataset.py", line 337, in _get_climo_var
    raise RuntimeError(msg)
RuntimeError: Variable 'LANDFRAC' was not in the file file:///global/u1/f/forsyth/e3sm_diags/tests/system/T_20161118.beta0.FC5COSP.ne30_ne30.edison_ANN_climo.nc, nor was it defined in the derived variables dictionary.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/lib/python3.7/site-packages/cdms2/dataset.py", line 1275, in __init__
    _fileobj_ = Cdunif.CdunifFile(path, mode)
OSError: Variable not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/global/homes/f/forsyth/.local/lib/python3.7/site-packages/acme_diags/acme_diags_driver.py", line 275, in run_diag
    single_result = module.run_diag(parameters)
  File "/global/homes/f/forsyth/.local/lib/python3.7/site-packages/acme_diags/driver/zonal_mean_2d_driver.py", line 86, in run_diag
    with cdms2.open(mask_path) as f:
  File "/global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/lib/python3.7/site-packages/cdms2/dataset.py", line 497, in openDataset
    return CdmsFile(path, mode, mpiBarrier=CdMpi)
  File "/global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/lib/python3.7/site-packages/cdms2/dataset.py", line 1277, in __init__
    raise CDMSError('Cannot open file %s (%s)' % (path, err))
cdms2.error.CDMSError: Cannot open file /global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/share/e3sm_diags/acme_ne30_ocean_land_mask.nc (Variable not found)

So, it still fails, but now it's because of a CDMSError.

I also reran with source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh and source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified_mpich.sh. I got the same results (MPI error and success, respectively), as I mentioned in my previous comment.

xylar commented 4 years ago

Okay, so @forsyth2, these errors were with nompi for most packages but for mpich and mpi4py included in the environment, right? It seems like maybe cdms2 isn't happy when you don't have MPI versions of some library or other when you do have mpi4py. It's not really possible for me to figure out by process of elimination which package needs to be MPI because most packages have to match up -- esmf, esmpy, libnetcdf, hdf5, etc. must all either be nompi or all mpich. The error message wasn't very helpful to me. It almost looks like it's saying the file it's trying to open doesn't exist. Can you at least verify that that's not the case?

forsyth2 commented 4 years ago

@xylar The error was produced by the fourth environment you gave (Under "What to investigate?"). Apparently that file doesn't exist -- in fact, the directory /global/homes/f/forsyth/.conda/envs/e3sm_diags_env_6/share/e3sm_diags/ doesn't seem to exist. I'm not sure why CDMS doesn't fail when source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified_mpich.sh is used though.

xylar commented 4 years ago

@forsyth2 and @chengzhuzhang, okay, this seems outside my area of expertise but if I were you, I would investigate further why the expected files aren't being created in the "fourth" environment, the one with mpi4py, mpich and most libraries nompi.

So far, it seems like a feasible solution might be making the mpich, rather than the nompi version of e3sm-unified the default. I'm hesitant to do this, though, because I think I've had trouble actually running MPI jobs using cond-forge mpich on Cori compute nodes in the past and I would be surprised if that has changed. If we do decide to make mpich the default, we would need to do some careful testing of not just E3SM_Diags but any packages that use MPI to make sure they work as expected on compute nodes on all the supported systems.

I still think it's worth investigating, maybe with help form CDAT folks, why things go wrong with an enviornment with mpi4py and mpich unless we have mpich versions of all the packages. For other packages, this should be okay as long as they know not to use MPI. I think this might go back to what I pointed out above, that CDAT tries to figure out whether to use MPI or now by seeing if mpi4py is installed. But The default E3SM-Unified installation tries to have all the libraries nompi and it only includes mpi4py because it's a required dependency of a package called ilamb. Let me know if a video chat next week on this would be helpful.

chengzhuzhang commented 4 years ago

@xylar Thank you,xylar. I finally get more time to also work on this issue. One thing I'm not clear with is that why this issue only be seen on haswell or knl, but not login node? Any insight... Also it seems like running on Compy nodes was not an issue.

xylar commented 4 years ago

@chengzhuzhang, so far, the symptoms point to an incompatibility between conda-forge OpenMPI and Cori compute nodes that doesn't exist for MPICH. I am surprised that MPICH seems to work on Cori nodes so far. That hadn't been my experience in the past. The MPI settings must be different on Compy, such that OpenMPI works on compute nodes. Similarly, Cori's login nodes are a different CPU type and maybe a different version of the OS. Their MPI configuration is also almost certainly different from the compute nodes. Any or all of these could play a role. I don't have the expertise to have a good, concrete explanation, so it points to us needing to test quite a lot more than we have in the past to make sure E3SM_Diags works on all systems on both login and compute nodes.

chengzhuzhang commented 4 years ago

@xylar I did some more investigation. It is almost certain that CDMS or its dependencies caused the problem. The issue can be reproduced, simply by

salloc --nodes=1 --partition=debug --time=00:30:00 -C haswell
source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh
python -c "import cdms2"

Then I tried you instruction to generate dev env that has both cdms2 and mpi4py. The same MPI_init_thread error only occurred when mip_openmpi builts of libnetcdf and other libraries are in the environment. When mpi_mpich builts were forced, the problem is gone.

Then I thought, I should try my luck with the newly released cdat 2.8.1, because i have the impression that it works with the mpi_openmpi version of libnetcdf, but not mpi_mpich variants. Unfortunately, the same error occur on cori haswell, with cdms and openmpi.

I don't know if as you pointed out, https://github.com/CDAT/cdms/blob/master/Lib/tvariable.py#L26-L32. could be the cause.

I used following to generate the env:

conda create -y -n cdms_v821_mpi4py_openmpi_py37 -c cdat/label/v8.2.1 -c conda-forge -c defaults python=3.7 mpi4py cdms2 "libnetcdf=*=mpi_openmpi_*" "esmf=*=mpi_openmpi_*" "esmpy=*=mpi_openmpi_*"

The error can be reproduced using:

salloc --nodes=1 --partition=debug --time=00:30:00 -C haswell
conda activate cdms_v821_mpi4py_openmpi_py37
python -c "import cdms2"

@muryanto1 @jasonb5 I think you have been dealing with compatibility issues of cdms and mpi variants. It would be really appreciated if you could provide some insight with this issue. Thanks!

xylar commented 4 years ago

@chengzhuzhang, that's helpful information. Do recall discussing MPI with @muryanto1 and something about cdms2 not being compatible with MPICH. I think I mentioned that might be a problem for us. My guess is that the issue with OpenMPI isn't necessarily a cdms2 issue in this case. It may be that OpenMPI just doesn't work on Cori login nodes at all, and cdms2 just happens to be importing it.

As I said above, I'm hesitant to make the mpich environment the default, though this may be our best bet in the end. But I could switch from openmpi to mpich in the default environment, with most packages being the nompi variants. Could you and @forsyth2 try to debut the errors in the environment with this setup?

conda create -y -n cdms_v82_mpi4py_nompi_py37 -c conda-forge -c defaults -c cdat/label/v82  python=3.7 mpi4py mpich cdms2 "libnetcdf=*=nompi_*" "esmf=*=nompi_*" "esmpy=*=nompi_*"

chengzhuzhang commented 4 years ago

@xylar with the mpich and nompi variants of the packages, using: conda create -y -n cdms_v82_mpi4py_nompi_py37 -c conda-forge -c defaults -c cdat/label/v82 python=3.7 mpi4py mpich cdms2 "libnetcdf=*=nompi_*" "esmf=*=nompi_*" "esmpy=*=nompi_*" There was no MPI_init_thread error when importing cdms2. And using the same combination to generate e3sm_diags dev env, with conda create -y -n e3sm_diag_mpi4py_nompi_py37 -c cdat/label/v82 -c conda-forge -c defaults python=3.7 "cdp>=1.6.0" "vcs>=8.2" "vtk-cdat=8.2.0.8.2" "vcsaddons>=8.2" "dv3d>=8.2" "cdms2>=3.1.4" "cdutil>=8.2" "genutil>=8.2" "cdtime>=3.1.2" numpy matplotlib "cartopy>=0.18.0" beautifulsoup4 lxml mpi4py mpich "libnetcdf=*=nompi_*" "esmf=*=nompi_*" "esmpy=*=nompi_*" "hdf5=*=nompi_*" "dask=2.15.0"

I had e3sm_diags run sucessfully on haswell.

@forsyth2 I was not able to reproduce the CDMSError you had with this environment. It almost seem like an installation problem. Could you maybe re-install e3sm-diags and try run again?

forsyth2 commented 4 years ago

@chengzhuzhang

cdms_v82_mpi4py_nompi_py37

conda create -y -n cdms_v82_mpi4py_nompi_py37 -c conda-forge -c defaults -c cdat/label/v82  python=3.7 mpi4py mpich cdms2 "libnetcdf=*=nompi_*" "esmf=*=nompi_*" "esmpy=*=nompi_*"
salloc --nodes=1 --partition=debug --time=00:30:00 -C haswell
conda activate cdms_v82_mpi4py_nompi_py37
python -c "import cdms2"

The above does not produce an error.

cd /e3sm_diags/tests/system
conda create -y -n e3sm_diag_mpi4py_nompi_py37 -c cdat/label/v82 -c conda-forge -c defaults python=3.7 "cdp>=1.6.0" "vcs>=8.2" "vtk-cdat=8.2.0.8.2" "vcsaddons>=8.2" "dv3d>=8.2" "cdms2>=3.1.4" "cdutil>=8.2" "genutil>=8.2" "cdtime>=3.1.2" numpy matplotlib "cartopy>=0.18.0" beautifulsoup4 lxml mpi4py mpich "libnetcdf=*=nompi_*" "esmf=*=nompi_*" "esmpy=*=nompi_*" "hdf5=*=nompi_*" "dask=2.15.0"
salloc --nodes=1 --partition=debug --time=00:30:00 -C haswell
conda activate e3sm_diag_mpi4py_nompi_py37
python all_sets.py -d all_sets.cfg

The above produces the CDMS error (cdms2.error.CDMSError: Cannot open file /global/homes/f/forsyth/.conda/envs/e3sm_diag_mpi4py_nompi_py37/share/e3sm_diags/acme_ne30_ocean_land_mask.nc (Variable not found)) again.

CDMS Error

Considering I got the error again when using the steps above and you didn't, it does seem like it's a problem on my end, but I'm not sure what's going on.

Compy

Did you manage to run on Compy? I couldn't load the environment. I ran source /compyfs/software/e3sm-unified/load_latest_e3sm_unified.sh (from https://e3sm-project.github.io/e3sm_diags/docs/html/quickguides/quick-guide-compy.html), but that produces /compyfs/software/e3sm-unified/load_latest_e3sm_unified.sh: No such file or directory. Is there an updated directory for the e3sm_unified script? If so, we should update the docs.

chengzhuzhang commented 4 years ago

@forsyth2 Hey Ryan, since the conda create command only create the development env for e3sm_diags. It needs to be installed using pip install . from the local github repo. And then try run it. Hope this could fix the problem.

Regarding to compy, I saw that Xylar has updated the activation path on compy with /share/apps/E3SM/conda_envs/ https://acme-climate.atlassian.net/wiki/spaces/EIDMG/pages/780271950/Diagnostics+and+Analysis+Quickstart . Would you give it another try and fix our docs accordingly? Thanks!

forsyth2 commented 4 years ago

@chengzhuzhang Thank you! That must have been what was causing the error. It runs successfully now. I completely forgot about the pip install . step; I guess I was still thinking of the unified environment scripts which actually do load E3SM.

salloc --nodes=1 --time=00:30:00
source /share/apps/E3SM/conda_envs/load_latest_e3sm_unified.sh
python all_sets.py -d all_sets.cfg

The above runs successfully on Compy.

forsyth2 commented 4 years ago

Created #330 to update the Compy paths for E3SM Unified.

E3SM-Project / e3sm_diags

MPI_init_thread erorr when running e3sm_diags within e3sm_unified 1.3.1.1 on cori knl and haswell #324

Dev environment

Dev with mpi4py (and openmpi)

Dev environment with all mpich libraries

Controlling which MPI variant you get

What to investigate?

Longer-term solution

cdms_v82_mpi4py_nompi_py37

CDMS Error

Compy