E3SM-Project / e3sm-unified

A metapackage for a unified anaconda environment for analyzing results from the Energy Exascale Earth System Model (E3SM).
BSD 3-Clause "New" or "Revised" License

Update to v1.5.1 ("emergency" release) #96

Closed xylar closed 2 years ago

xylar commented 2 years ago

This merge switches E3SM-Unified deployment to use the new mache package. Information about E3SM supported machines now comes from mache, rather than from config files within the E3SM-Unified repo. This should help to maintain compatibility between E3SM-Unified and other analysis tools.

Since bootstrapping relies on having mache installed in a temporary conda environment, the deployment script has been broken into three parts. The deploy_e3sm_unified.py script now first creates the installation conda environment, then calls the bootstrap.py script using that environment. Any functions shared between these two scripts are in shared.py.
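As a rough illustration of the two-stage flow described above (not the actual contents of deploy_e3sm_unified.py), the first stage can be thought of as building a shell command that activates the helper environment and then runs the second-stage script. The function name, the environment name passed in, and the argument handling are all assumptions made for this sketch; only the `etc/profile.d/conda.sh` activation script is a standard part of a conda installation.

```python
import os
import shlex


def build_bootstrap_command(conda_base, env_name, script="bootstrap.py",
                            script_args=()):
    """Return a bash command line that runs `script` inside `env_name`.

    Hypothetical helper for illustration; the real deployment scripts
    may structure this differently.
    """
    conda_base = os.path.abspath(os.path.expanduser(conda_base))
    # conda ships this script so that `conda activate` works in plain bash
    activate_script = os.path.join(conda_base, "etc", "profile.d", "conda.sh")
    run_script = " ".join(
        ["python", shlex.quote(script), *map(shlex.quote, script_args)]
    )
    parts = [
        f"source {shlex.quote(activate_script)}",
        f"conda activate {shlex.quote(env_name)}",
        run_script,
    ]
    # `&&` stops the chain as soon as any stage fails
    return " && ".join(parts)
```

The resulting string could then be passed to something like `subprocess.run(["/bin/bash", "-c", cmd], check=True)`, so that a failure in either stage raises an exception instead of being silently ignored.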

A new $E3SMU_MACHINE environment variable has been added that stores the machine name for later identification (e.g. on a compute node).
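For example, a downstream tool could read the variable as follows. This is a minimal sketch: the function name and the fallback behavior are assumptions, and only the `E3SMU_MACHINE` variable name comes from this PR.

```python
import os


def get_e3smu_machine(environ=None, default=None):
    """Return the machine name stored in E3SMU_MACHINE, or `default`.

    `environ` can be any mapping, which makes the function easy to test;
    it falls back to the real process environment.
    """
    environ = os.environ if environ is None else environ
    machine = environ.get("E3SMU_MACHINE", "").strip()
    # Treat an unset or empty variable the same way
    return machine or default
```

On a compute node where hostname-based detection may not work, a call such as `get_e3smu_machine(default="unknown")` would return whatever name was recorded when the environment was loaded.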

This merge also switches moab to use MPI and tempest.

In addition to mache, the metpy package has also been added. pyflann has been removed, since it is no longer needed for MPAS-Tools and was not being maintained very consistently.

Many other packages have been updated.

xylar commented 2 years ago

@forsyth2, @chengzhuzhang, @tomvothecoder, could you have a look at: https://github.com/E3SM-Project/e3sm-unified/blob/a5707c2b89525429c80010df3246703665e2c3e9/recipes/e3sm-unified/meta.yaml and just let me know if you see anything not at the version you were expecting? I just don't want any surprises about packages I forgot to update or have the wrong version for.

I'm deploying 1.5.1rc1 right now, but can make another build and re-deploy if I messed anything up.

forsyth2 commented 2 years ago

@xylar Thanks

https://github.com/E3SM-Project/e3sm-unified/blob/a5707c2b89525429c80010df3246703665e2c3e9/recipes/e3sm-unified/meta.yaml#L48 should be zppy 1.1.0rc3, not zppy 1.1.0rc1.

chengzhuzhang commented 2 years ago

Thanks for catching the zppy version, Ryan. Other than that, the list looks good to me.

xylar commented 2 years ago

@forsyth2, thanks so much! Very glad I asked.

xylar commented 2 years ago

@chengzhuzhang, this branch should be in good shape to deploy on Acme1 when you have time.

cd e3sm_supported_machines
./deploy_e3sm_unified.py --conda ~/miniconda3 --release

That should do the trick. The --conda path points to your own personal miniconda3 installation, where the script will create a small helper environment that includes mache and is needed only for the installation. If you like, you can delete the temporary environment once the installation is done.

chengzhuzhang commented 2 years ago

Sounds good! I will let you know once it is done. Congrats on the new release!

xylar commented 2 years ago

@chengzhuzhang, I realized that I need to make a new release of mache. I thought the PR would happen automatically but it didn't. I'll take care of that now. If you want to wait for that, you can do the installation in an hour. Otherwise, it would be harmless to change mache = 1.1.3 to mache = 1.1.2 in e3sm_supported_machines/defaults.cfg. The only difference between 1.1.2 and 1.1.3 is a fix for Andes.
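If it helps, the version change could also be scripted with Python's standard `configparser` module. This is only a sketch: the section layout of defaults.cfg is not shown in this thread, so the function simply looks for the option in every section, and the section name used in the usage note below is hypothetical.

```python
import configparser


def pin_option(cfg_text, option, value):
    """Set `option` to `value` in every section of `cfg_text` that has it.

    Returns the parser and the list of sections that were changed.
    Note that configparser drops comments if the result is written back.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    changed = []
    for section in parser.sections():
        if parser.has_option(section, option):
            parser.set(section, option, value)
            changed.append(section)
    return parser, changed
```

For instance, `pin_option("[e3sm_unified]\nmache = 1.1.3\n", "mache", "1.1.2")` would update the one matching section in that made-up file. Editing the single line by hand is just as good and keeps any comments in the file intact.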

chengzhuzhang commented 2 years ago

Thanks for the heads-up. I will do the update later~

xylar commented 2 years ago

https://github.com/conda-forge/mache-feedstock/pull/4

xylar commented 2 years ago

I can send you a message when the new mache package is available.

xylar commented 2 years ago

Okay, @chengzhuzhang. Things should be good to go. But no rush. You probably know who uses E3SM-Unified on Acme1 and whether they need it anytime soon.

xylar commented 2 years ago

@chengzhuzhang, just let me know when you are done on Acme1 so I can merge this PR. Again, no rush, just trying to tie up loose ends.

chengzhuzhang commented 2 years ago

@xylar hey, there is a glitch deploying on acme1. Could you take a look at the following error message:

-bash-4.2$ ./deploy_e3sm_unified.py --conda ~/minidonda3 --release
Installing Miniconda3
https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
running: /bin/bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/zhang40/minidonda3
PREFIX=/home/zhang40/minidonda3
Unpacking payload ...
/home/zhang40/minidonda3/conda.exe: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted
/home/zhang40/minidonda3/conda.exe: error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted
Traceback (most recent call last):
  File "./deploy_e3sm_unified.py", line 117, in <module>
    main()
  File "./deploy_e3sm_unified.py", line 107, in main
    install_miniconda(conda_base, activate_base)
  File "/home/zhang40/e3sm-unified/e3sm_supported_machines/shared.py", line 73, in install_miniconda
    check_call(command)
  File "/home/zhang40/e3sm-unified/e3sm_supported_machines/shared.py", line 50, in check_call
    raise subprocess.CalledProcessError(proc.returncode, commands)
subprocess.CalledProcessError: Command '/bin/bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/zhang40/minidonda3' returned non-zero exit status 1

xylar commented 2 years ago

@chengzhuzhang, can you download and install miniconda3 yourself and see if it works? That's a very early stage to have problems, and probably nothing related to E3SM-Unified.

xylar commented 2 years ago

@chengzhuzhang, there was a typo in the command: minidonda3 rather than miniconda3. I'm not sure that explains the problem, but could you try with the typo fixed?
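A small pre-flight check along these lines could catch this kind of mistake before the script tries to install a fresh Miniconda into a mistyped directory, which is what the log above shows happening. This is a hypothetical helper, not something that exists in shared.py; it assumes a Linux-style layout where the conda executable lives at `bin/conda` under the installation prefix.

```python
import os


def check_conda_base(path):
    """Expand `path` and report whether it looks like a conda installation.

    Returns (expanded_path, found), where `found` is True only if
    <expanded_path>/bin/conda exists as a file.
    """
    expanded = os.path.abspath(os.path.expanduser(path))
    conda_exe = os.path.join(expanded, "bin", "conda")
    return expanded, os.path.isfile(conda_exe)
```

A deployment script could print a warning, or ask for confirmation, when `found` is False, so that a typo in `--conda` does not quietly trigger a new installation.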

chengzhuzhang commented 2 years ago

Nice catch! Version 1.5.1 can now be activated on acme1 with: source /usr/local/e3sm_unified/envs/load_e3sm_unified_1.5.1_acme1.sh

xylar commented 2 years ago

Okay, great! Sorry about that, but I'm glad it worked once you fixed the typo.