geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP

[BUG/ISSUE] Large memory requirement at compile time prevents automated build #4

Closed JiaweiZhuang closed 5 years ago

JiaweiZhuang commented 5 years ago

I am able to build the GCHP Docker image on a large EC2 instance (>10 GB RAM), but the automated build on Docker Hub fails because of Docker Hub's 2 GB RAM limit.

Here's the full build log: https://hub.docker.com/r/zhuangjw/gchp_model/builds/b4bvaupogcmwvy5dcc9nzdw/

Any idea why GCHP needs so much memory at compile time?

The workaround is to build the Docker images locally (e.g. on AWS) and upload them to Docker Hub.

Alternatively, I can try building the Docker images on Travis CI. Travis has 7.5 GB RAM and should probably work.

lizziel commented 5 years ago

Nothing comes to mind for why the memory requirement is so large. Have you tried to isolate whether it comes from compiling ESMF or MAPL specifically? If it is MAPL, this is something we could potentially bring up with GMAO.

JiaweiZhuang commented 5 years ago

Looks like it crashes even when compiling ESMF (haven't got to MAPL yet):

mpicxx -c -fPIC -O -DNDEBUG -fPIC -DESMF_LOWERCASE_SINGLEUNDERSCORE -m64 -mcmodel=small -pthread -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src/../include  -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/build_config/Linux.gfortran.default -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Superstructure -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/stubs/pthread  -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/include   -DMPICH_IGNORE_CXX_SEEK -DESMF_NO_INTEGER_1_BYTE -DESMF_NO_INTEGER_2_BYTE -DESMF_MPIIO -DESMF_NO_OPENMP -DSx86_64_small=1 -DESMF_OS_Linux=1 -D__SDIR__='"src/Infrastructure/Mesh/src"' /tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src/ESMCI_FindPnts.C -o /tutorial/gchp_standard/CodeDir/GCHP/ESMF/obj/objO/Linux.gfortran.64.mpich2.default/ESMCI_FindPnts.o

mpicxx -c -fPIC -O -DNDEBUG -fPIC -DESMF_LOWERCASE_SINGLEUNDERSCORE -m64 -mcmodel=small -pthread -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src/../include  -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/build_config/Linux.gfortran.default -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Superstructure -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/stubs/pthread  -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/include   -DMPICH_IGNORE_CXX_SEEK -DESMF_NO_INTEGER_1_BYTE -DESMF_NO_INTEGER_2_BYTE -DESMF_MPIIO -DESMF_NO_OPENMP -DSx86_64_small=1 -DESMF_OS_Linux=1 -D__SDIR__='"src/Infrastructure/Mesh/src"' /tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src/ESMCI_ConserveInterp.C -o /tutorial/gchp_standard/CodeDir/GCHP/ESMF/obj/objO/Linux.gfortran.64.mpich2.default/ESMCI_ConserveInterp.o

mpicxx -c -fPIC -O -DNDEBUG -fPIC -DESMF_LOWERCASE_SINGLEUNDERSCORE -m64 -mcmodel=small -pthread -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src/../include  -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/build_config/Linux.gfortran.default -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Superstructure -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/stubs/pthread  -I/tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/include   -DMPICH_IGNORE_CXX_SEEK -DESMF_NO_INTEGER_1_BYTE -DESMF_NO_INTEGER_2_BYTE -DESMF_MPIIO -DESMF_NO_OPENMP -DSx86_64_small=1 -DESMF_OS_Linux=1 -D__SDIR__='"src/Infrastructure/Mesh/src"' /tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Infrastructure/Mesh/src/ESMCI_MeshCXX.C -o /tutorial/gchp_standard/CodeDir/GCHP/ESMF/obj/objO/Linux.gfortran.64.mpich2.default/ESMCI_MeshCXX.o

g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

gmake[14]: *** [/tutorial/gchp_standard/CodeDir/GCHP/ESMF/obj/objO/Linux.gfortran.64.mpich2.default/ESMCI_HAdapt.o] Error 4
gmake[14]: *** Waiting for unfinished jobs....

g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
virtual memory exhausted: Cannot allocate memory
virtual memory exhausted: Cannot allocate memory
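
The "Killed (program cc1plus)" and "virtual memory exhausted" lines above are the usual signs that the compiler ran out of memory rather than hit a real compiler bug. As a minimal sketch (the function name and pattern list are illustrative assumptions, not part of GCHP or ESMF), a build log could be scanned for these signatures like this:

```python
import re

# Patterns copied from the log excerpt above. The list is illustrative,
# not an exhaustive catalogue of out-of-memory messages.
OOM_PATTERNS = [
    re.compile(r"internal compiler error: Killed \(program \S+\)"),
    re.compile(r"virtual memory exhausted: Cannot allocate memory"),
]


def find_oom_lines(log_text):
    """Return (line_number, line) pairs matching a known out-of-memory signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in OOM_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "g++: internal compiler error: Killed (program cc1plus)\n"
        "Please submit a full bug report,\n"
        "virtual memory exhausted: Cannot allocate memory\n"
    )
    for number, line in find_oom_lines(sample):
        print(f"{number}: {line}")
```

A match only suggests memory exhaustion; confirming it would still need the kernel log or a rerun on a larger machine.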

lizziel commented 5 years ago

We want to eventually use a pre-built ESMF, which would solve this problem if the large RAM requirement comes only from building ESMF. You can isolate the RAM needed to build everything except ESMF by first compiling ESMF successfully (so that you have the file esmf.install) and then running make compile_mapl.
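
One rough way to put a number on a single build step is to run it from a small wrapper and read the peak resident set size afterwards. This is only a sketch, assuming Linux and Python's standard `resource` module; the helper name `peak_child_rss_mb` is hypothetical. It reports the largest single child process, so with a parallel make (the log shows several jobs running at once) the true total can be higher than the figure returned.

```python
import resource
import subprocess


def peak_child_rss_mb(command):
    """Run `command` and return (exit_code, peak RSS in MB of the largest child).

    The value comes from getrusage(RUSAGE_CHILDREN), which covers all children
    this Python process has waited for so far, so use a fresh interpreter per
    measurement.
    """
    completed = subprocess.run(command)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # On Linux ru_maxrss is reported in kilobytes (macOS reports bytes instead).
    return completed.returncode, usage.ru_maxrss / 1024.0


# Hypothetical invocation from the run directory, after ESMF has been built:
#   code, peak = peak_child_rss_mb(["make", "compile_mapl"])
#   print(f"exit code {code}, largest child peaked at about {peak:.0f} MB")
```

Running the same wrapper around the ESMF build on a machine with enough RAM would give a comparable figure for that step.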

JiaweiZhuang commented 5 years ago

Ha, it turns out that make compile_mapl only needs 2 GB RAM. Compiling ESMF needs 8 GB RAM:

Full log files, tested on EC2 t2.micro, t2.small, t2.large:

So this problem should be solved by using a pre-built ESMF.

lizziel commented 5 years ago

Great. I will close this issue since we have a path forward, although it might take some time.