NOAA-EMC / RDASApp

Regional DAS
GNU Lesser General Public License v2.1

FYI, unable to build RDASApp in a computing node #165

Open guoqing-noaa opened 1 week ago

guoqing-noaa commented 1 week ago

Some platforms strongly prefer that builds run on computing nodes and allow only very small build jobs on the login nodes. A computing node also lets us use many more processors to compile concurrently, which speeds up the build process.

This issue documents an attempt to build RDASApp on a computing node. On many platforms, a computing node differs from the login nodes in that it cannot access the internet.

Previously, RDASApp would download lots of data on the fly during the build process. We have made efforts to reduce that dependency. But apparently, that's not enough.

This is to document the error message I got:

-- Unknown compiler: Intel
-- Configure MPAS for internal ESMF
[ 11%] Creating directories for 'mpas_data-populate'
[ 22%] Performing download step (git clone) for 'mpas_data-populate'
Cloning into 'mpas_data-src'...
fatal: unable to access 'https://github.com/MPAS-Dev/MPAS-Data.git/': Failed to connect to github.com port 443 after 130757 ms: Couldn't connect to server
Cloning into 'mpas_data-src'...
fatal: unable to access 'https://github.com/MPAS-Dev/MPAS-Data.git/': Failed to connect to github.com port 443 after 130735 ms: Couldn't connect to server
Cloning into 'mpas_data-src'...
fatal: unable to access 'https://github.com/MPAS-Dev/MPAS-Data.git/': Failed to connect to github.com port 443 after 130684 ms: Couldn't connect to server
-- Had to git clone more than once: 3 times.
CMake Error at mpas_data-subbuild/mpas_data-populate-prefix/tmp/mpas_data-populate-gitclone.cmake:39 (message):
  Failed to clone repository: 'https://github.com/MPAS-Dev/MPAS-Data.git'

ShunLiu-NOAA commented 1 week ago

Could you specify which platform this is on? We currently support only Orion, Hera, and Jet.

guoqing-noaa commented 1 week ago

> Could you specify which platform this is on? We currently support only Orion, Hera, and Jet.

@ShunLiu-NOAA Thanks for the question. I forgot to mention that this is on Jet/Hera. This also reminds me to test Orion/Hercules, which may have different policies for computing nodes.

guoqing-noaa commented 1 week ago

Computing nodes on Orion/Hercules cannot access the internet either.

TingLei-NOAA commented 6 days ago

The whole build process could be done in three steps: 1) cloning (including submodules) and configuring; 2) data cloning/downloading; 3) a clean "make" step that needs no internet access. Some extra reorganization may be needed to separate step 2 from step 3.
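A minimal sketch of how steps 2 and 3 might be separated for the MPAS-Data clone that fails in the log above. The `mpas_data-subbuild` and `mpas_data-populate` names in that log look like CMake FetchContent output, so this assumes the dependency is declared through FetchContent under the name `mpas_data`. The function names and the cache directory are hypothetical, and whether `build.sh` forwards extra CMake flags is not confirmed here. Other downloads in the build may need the same treatment.

```shell
#!/usr/bin/env bash
# Sketch only: function names and cache layout are illustrative assumptions,
# not part of RDASApp.

# Step 2 (login node, has internet): clone the data repository once into a cache.
prefetch_mpas_data() {
  local cache_dir="$1"
  if [ ! -d "${cache_dir}/mpas_data-src/.git" ]; then
    git clone https://github.com/MPAS-Dev/MPAS-Data.git "${cache_dir}/mpas_data-src"
  fi
}

# Step 3 (computing node, no internet): print CMake flags that tell FetchContent
# to use the cached copy and to skip every download or update step.
# Assumes the FetchContent name is "mpas_data" (upper-cased in the variable name).
offline_cmake_flags() {
  local cache_dir="$1"
  printf '%s\n' \
    "-DFETCHCONTENT_FULLY_DISCONNECTED=ON" \
    "-DFETCHCONTENT_SOURCE_DIR_MPAS_DATA=${cache_dir}/mpas_data-src"
}

# Example usage (not executed here):
#   prefetch_mpas_data "$HOME/rdas_cache"                   # on a login node
#   cmake $(offline_cmake_flags "$HOME/rdas_cache") <src>   # on a computing node
```

`FETCHCONTENT_FULLY_DISCONNECTED` and `FETCHCONTENT_SOURCE_DIR_<uppercaseName>` are standard CMake cache variables, so this approach would not require changing the CMake code itself, only how the configure step is invoked.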

guoqing-noaa commented 5 days ago

@TingLei-NOAA Thanks for the input!

guoqing-noaa commented 5 days ago

build.sh -j6  on jet takes 54m23s to complete
build.sh -j10 on jet takes 42m7s to complete
build.sh -j16 on jet takes 29m19s to complete
build.sh -j20 on jet takes 23m52s to complete
build.sh -j30 on jet takes 21m16s to complete

So we should move forward with removing every step that needs internet access from the build process. Then we can run the build on a computing node, which will greatly reduce build time. This will be extremely helpful for Orion.

Timing stats when building mpasjedi only:

./build.sh -m MPAS -j6    #42m52s
./build.sh -m MPAS -j10   #29m53s
./build.sh -m MPAS -j16   #24m41s
./build.sh -m MPAS -j20   #21m54s
./build.sh -m MPAS -j30   #18m39s
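
To put the timings above in perspective, here is a small sketch that converts the `XmYs` values to seconds and computes the speedup between two runs. The helper names `to_seconds` and `speedup` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative.

# Convert a duration such as "54m23s" to whole seconds.
to_seconds() {
  local t="$1"
  local min="${t%%m*}"   # text before the "m"
  local sec="${t#*m}"    # text after the "m", e.g. "23s"
  sec="${sec%s}"         # drop the trailing "s"
  echo $(( 10#$min * 60 + 10#$sec ))
}

# Print how many times faster the second run is than the first.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", a / b }'
}

speedup 54m23s 21m16s   # full build, -j6 vs -j30: prints 2.56
speedup 42m52s 18m39s   # MPAS only,  -j6 vs -j30: prints 2.30
```

By the same arithmetic, going from -j20 to -j30 saves only about two and a half minutes on the full build, so most of the gain comes from the first 16 to 20 cores.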