Closed (fossell closed this issue 2 years ago)
Summary of system and version specs:
Kate's specs: Mojave 10.14.6 and Docker Desktop 3.5.2 - FAIL
Jamie's specs: Catalina and Docker Desktop 3.5.2 - FAIL
Michelle's specs: Big Sur and Docker Desktop 3.5.1 - SUCCESS
When Michelle upgraded her Docker Desktop to v3.5.2, the wps_wrf build failed. (All used Docker Engine v20.10.7.)
ISSUE: Upgrading to Docker Desktop v3.5.2. Need to submit an issue to the Docker repo.
The build also fails with the same error on Big Sur with Docker Desktop v3.3.3 and Docker Engine 20.10.6.
Recent tests indicate this could just be a memory issue. Increasing the memory available to Docker to at least 10 GB has proven successful for a number of tests and retests of the main branch and other feature branches. More testing is needed to confirm.
All team members appear to have repeated successful builds of the wps_wrf image on macOS after increasing the memory to at least 10 GB. We assume this was the issue and that it is now resolved. Closing this issue.
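A quick pre-build check along these lines may help catch a too-small memory setting before starting a long wps_wrf build. This is only a sketch: `docker_mem_ok` and `REQUIRED_BYTES` are hypothetical names, and the threshold is set to 10 * 1000^3 bytes on the assumption that the total Docker reports can be slightly lower than the value configured in Docker Desktop.

```shell
# Hypothetical threshold: 10 GB expressed in bytes (10 * 1000^3).
REQUIRED_BYTES=$((10 * 1000 * 1000 * 1000))

# Hypothetical helper: succeeds only if the given byte count meets the threshold.
docker_mem_ok() {
    [ "$1" -ge "$REQUIRED_BYTES" ]
}

# Query Docker only when the CLI is installed and the daemon answers.
if command -v docker >/dev/null 2>&1 \
    && mem_bytes="$(docker info --format '{{.MemTotal}}' 2>/dev/null)" \
    && [ -n "$mem_bytes" ]; then
    if docker_mem_ok "$mem_bytes"; then
        echo "Docker memory looks sufficient: $mem_bytes bytes"
    else
        echo "Docker reports $mem_bytes bytes; raise it to at least 10 GB before building wps_wrf" >&2
    fi
fi
```

On macOS the memory limit itself is changed in the Docker Desktop resource settings; the script only reads the value back.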
When building the wps_wrf container, the following error occurs. The failing step is just the ls command in the Dockerfile that lists the executables to check whether they were built successfully. Since that command fails, we know the executables weren't built, and the image build fails as a result. (Note: this is a fresh clone of the main branch of the repo, with no mods and no changes, just the top of the repo.)
=> ERROR [ 7/12] RUN ls /comsoftware/wrf/WRF-4.1.3/main/real.exe /comsoftware/wrf/WRF-4.1.3/main/wrf.exe 0.3s
I added a redirect ( |tee >& /comsoftware/wrf/log.out ) to the Dockerfile so that the docker build wouldn't fail on that RUN command, and I could then start the image with bin/bash to get into the container and see what was going on. Interestingly, when I do this, it seems the wrf main libraries are built, so when the Dockerfile moves on to build WPS, it does build the WPS executables successfully. Anyway, while bin/bash-ed into the container I can see that the first error in the compile log for wrf is:
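The reason a pipe into tee keeps the RUN step from failing is that, with default shell options (no `pipefail`), a pipeline reports the exit status of its last command. The sketch below illustrates this with a stand-in for the failing check; `failing_step` is a hypothetical function, not something from the Dockerfile.

```shell
# Hypothetical stand-in for a failing check, such as ls on missing executables.
failing_step() {
    echo "ls: cannot access 'real.exe': No such file or directory" >&2
    return 2
}

log_file="$(mktemp)"

# Run directly: the non-zero status is visible to the caller.
direct_status=0
failing_step 2>/dev/null || direct_status=$?

# Run through a pipe: the pipeline reports tee's status, so the failure is masked,
# while the error text is still captured in the log file.
piped_status=0
failing_step 2>&1 | tee "$log_file" >/dev/null || piped_status=$?

echo "direct=$direct_status piped=$piped_status"
# prints "direct=2 piped=0" with default shell options
```

This is useful for debugging, but it also hides real failures, so it is probably best removed once the cause is found.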
time mpif90 -o nl_get_0_routines.o -c -O0 -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4 -I../dyn_em -I../dyn_nmm -I/comsoftware/wrf/WRF-4.1.3/external/esmf_time_f90 -I/comsoftware/wrf/WRF-4.1.3/main -I/comsoftware/wrf/WRF-4.1.3/external/io_netcdf -I/comsoftware/wrf/WRF-4.1.3/external/io_int -I/comsoftware/wrf/WRF-4.1.3/frame -I/comsoftware/wrf/WRF-4.1.3/share -I/comsoftware/wrf/WRF-4.1.3/phys -I/comsoftware/wrf/WRF-4.1.3/wrftladj -I/comsoftware/wrf/WRF-4.1.3/chem -I/comsoftware/wrf/WRF-4.1.3/inc -I/comsoftware/libs/netcdf/include yy0.f90
gfortran: fatal error: Killed signal terminated program f951
compilation terminated.
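A "Killed signal terminated program" message means the compiler process received SIGKILL. A common cause is the kernel's out-of-memory killer, although the log alone does not prove that. A small helper like the following can locate the first such line in a long compile log. It is a sketch: `first_killed_line` is a hypothetical name, and the sample log contains only the two lines quoted above.

```shell
# Hypothetical helper: print the first "Killed signal" line of a log, with its line number.
first_killed_line() {
    grep -n -m 1 'Killed signal terminated program' "$1"
}

# Build a tiny sample log from the two lines quoted in this issue.
sample_log="$(mktemp)"
printf '%s\n' \
    'gfortran: fatal error: Killed signal terminated program f951' \
    'compilation terminated.' > "$sample_log"

first_killed_line "$sample_log"
# prints "1:gfortran: fatal error: Killed signal terminated program f951"
```

Inside the container, the same function could be pointed at the redirected log, for example /comsoftware/wrf/log.out.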
I tried to compile wrf manually inside the container and it successfully built the executables. So the build succeeds inside the container but fails outside the container with docker build. I can also build the upp image successfully, so this seems to be just a wps_wrf issue (Jamie reports all other containers build successfully).
I have also done a complete clean-from-scratch attempt, e.g. wiping out my qcow, doing a docker system prune, removing all images and layers and everything, making a fresh git clone of the repo, etc., and I see the same behavior. I checked my disk space and there is plenty of space, so that wasn't the issue.