NOAA-GFDL / CM4


Where to find the definition of the total core count 'npes' used for a coupled run #13

Open: miniufo opened this issue 3 years ago

miniufo commented 3 years ago

Hi sir, I have compiled CM4 on our HPCC. When I try to run the model with concurrent coupling, the error "coupler_init: atmos_npes+ocean_npes must equal npes for concurrent coupling" is raised. I also noticed an inconsistency between the README and CM4_run_script.sh. I would like to know:

  1. Where is the total number of cores, 'npes', defined in input.nml?
  2. Can I change the number of cores (defined here) used by each component (e.g., the atmospheric or oceanic submodel)?

Thank you very much for your help.

thomas-robinson commented 3 years ago
  1. The npes are specified in the coupler_nml namelist. There are variables for ocean_npes, atmos_npes, and atmos_nthreads. The total number of processors you are running on is atmos_npes*atmos_nthreads + ocean_npes. That's how many processors you need to request when running your job.
  2. If you change the number of cores/threads in your run script, you also have to change them in your input.nml. If you do change them, you have to adjust the layouts in the namelist as well. These are the namelist variables that look like layout = x,y.

For atmosphere and land: x * y * 6 = atmos_npes

For ocean and ice: x * y = ocean_npes

2.5 If you change the number of ocean ranks, be aware that the ocean mask file may no longer work for you. You might have to consult the MOM team to see if they can help you with that.
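The layout rules above can be checked mechanically before a job is submitted. This is a minimal Python sketch, not part of CM4: check_layouts and its (x, y) tuple arguments are hypothetical, it applies the rules exactly as stated (x * y * 6 for atmosphere/land, x * y for ocean/ice), and it deliberately ignores the ocean mask file from point 2.5.

```python
# Sketch of the layout rules quoted above for a concurrent CM4 run.
# Assumption: layouts are (x, y) tuples copied from the "layout = x,y"
# entries in input.nml; the ocean mask file is NOT taken into account.

def check_layouts(atmos_npes, atmos_layout, ocean_npes, ocean_layout):
    """Return a list of problems; an empty list means the layouts are consistent."""
    problems = []
    ax, ay = atmos_layout
    # Atmosphere and land: x * y ranks on each of the 6 cubed-sphere tiles.
    if ax * ay * 6 != atmos_npes:
        problems.append(
            f"atmos/land layout {ax},{ay} gives {ax * ay * 6} ranks, "
            f"but atmos_npes = {atmos_npes}"
        )
    ox, oy = ocean_layout
    # Ocean and ice: x * y ranks in total.
    if ox * oy != ocean_npes:
        problems.append(
            f"ocean/ice layout {ox},{oy} gives {ox * oy} ranks, "
            f"but ocean_npes = {ocean_npes}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical small configuration: 96 atmosphere ranks, 64 ocean ranks.
    print(check_layouts(96, (4, 4), 64, (8, 8)))  # -> []
    print(check_layouts(96, (4, 2), 64, (8, 8)))  # one atmos/land problem
```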

miniufo commented 3 years ago

Thank you very much, @thomas-robinson, but I cannot find coupler_nml. Also, what is the file name of the namelist for the atmosphere component?

thomas-robinson commented 3 years ago

coupler_nml is in the input.nml file. Have you downloaded the data tarfile?
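Since coupler_nml is a group inside input.nml rather than a separate file, the three values can be read straight out of it. The Python sketch below is a simplified, hypothetical reader (regex-based, assuming one "name = value" entry per line and a closing "/" on its own line); the sample text is abridged and uses the values implied by the (864*2)+4671 calculation later in this thread. For real files, a proper Fortran namelist parser such as the f90nml package is more robust.

```python
import re

# Abridged, hypothetical excerpt of input.nml; only the entries
# discussed in this thread are shown.
SAMPLE_INPUT_NML = """
 &coupler_nml
    atmos_npes = 864,
    atmos_nthreads = 2,
    ocean_npes = 4671,
    concurrent = .true.
/
"""


def read_int_entries(nml_text, group="coupler_nml"):
    """Return {name: int} for simple 'name = integer' lines in one namelist group.

    Simplification: one entry per line, closing '/' on its own line.
    Non-integer entries (logicals, strings, arrays) are skipped.
    """
    block = re.search(rf"&{group}\b(.*?)^\s*/", nml_text,
                      flags=re.IGNORECASE | re.DOTALL | re.MULTILINE)
    if block is None:
        raise KeyError(f"namelist group &{group} not found")
    entries = re.findall(r"^\s*(\w+)\s*=\s*(-?\d+)\s*,?\s*$",
                         block.group(1), flags=re.MULTILINE)
    return {name.lower(): int(value) for name, value in entries}


if __name__ == "__main__":
    print(read_int_entries(SAMPLE_INPUT_NML))
    # {'atmos_npes': 864, 'atmos_nthreads': 2, 'ocean_npes': 4671}
```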

miniufo commented 3 years ago

Yes, I downloaded everything. I thought coupler_nml was a separate namelist file. In the example input.nml from the tarball, which of the following is npes? [image: excerpt of the example input.nml]

thomas-robinson commented 3 years ago

This setup uses npes = (864*2)+4671 = 6399, so you need 6399 processors.
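Written out with the formula from the earlier replies, and assuming the three terms are atmos_npes = 864, atmos_nthreads = 2 and ocean_npes = 4671 (an inference from the order of the terms):

```latex
\begin{aligned}
\text{npes} &= \text{atmos\_npes} \times \text{atmos\_nthreads} + \text{ocean\_npes} \\
            &= 864 \times 2 + 4671 \\
            &= 1728 + 4671 \\
            &= 6399
\end{aligned}
```

Note that 864 = 6 × 144, which is consistent with the requirement that atmos_npes be a multiple of 6.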

(Sent by email in reply to Yu-Kun Qian's comment of Sat, Jan 30, 2021 at 9:54 AM.)

-- Tom Robinson (he/him) Model Systems Division - Geophysical Fluid Dynamics Laboratory

miniufo commented 3 years ago

Thanks, @thomas-robinson, but I want to modify the total npes because our computer does not have that many cores. Is it possible to change npes so that the error "coupler_init: atmos_npes+ocean_npes must equal npes for concurrent coupling" is not raised?

thomas-robinson commented 3 years ago

The variables ocean_npes, atmos_npes, and atmos_nthreads combine to set the total npes. The total number of processors (npes) you are running on is atmos_npes*atmos_nthreads + ocean_npes. That's how many processors you need to request when running your job. If you want to change the number of processors, you need to change the values of these variables. atmos_npes has to be a multiple of 6.
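The constraints in this reply (atmos_npes a multiple of 6, total = atmos_npes*atmos_nthreads + ocean_npes) can be checked against a core budget with a short Python sketch; check_pe_counts and available_cores are hypothetical names. The coupler_init error message compares atmos_npes+ocean_npes, without the thread factor, against npes, so the sketch also reports that sum. Reading it as the number of MPI ranks to launch is an assumption worth confirming against your run script.

```python
# Sketch of the PE bookkeeping described above, for choosing smaller counts.
# Assumptions (not taken from CM4 itself): available_cores is what your
# machine or queue offers; "mpi_ranks" = atmos_npes + ocean_npes mirrors the
# sum in the coupler_init error message and is NOT multiplied by threads.

def check_pe_counts(atmos_npes, atmos_nthreads, ocean_npes, available_cores):
    errors = []
    if atmos_npes % 6 != 0:
        errors.append("atmos_npes must be a multiple of 6")
    if min(atmos_npes, atmos_nthreads, ocean_npes) < 1:
        errors.append("all counts must be positive")
    # Cores to request: each atmosphere rank runs atmos_nthreads threads.
    total_cores = atmos_npes * atmos_nthreads + ocean_npes
    if total_cores > available_cores:
        errors.append(f"needs {total_cores} cores, only {available_cores} available")
    return {
        "total_cores": total_cores,
        "mpi_ranks": atmos_npes + ocean_npes,  # assumed meaning, see lead-in
        "errors": errors,
    }
```

With the values from this thread, check_pe_counts(864, 2, 4671, 6399) reports 6399 cores, 5535 for atmos_npes+ocean_npes, and no errors.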

If you change these variables, you also need to change the layouts in your input.nml. These are the namelist variables that look like layout = x,y.

For atmosphere and land: x * y * 6 = atmos_npes

For ocean and ice: x * y = ocean_npes
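To pick new layouts after reducing the PE counts, it helps to list every factorization that satisfies these two rules. This is a minimal Python sketch with hypothetical function names; it does not check whether a layout suits the grid size, and it ignores the ocean mask file discussed next.

```python
# Sketch: enumerate (x, y) layouts that satisfy the rules above.
# Not checked here: how well a layout fits the model grid, or the ocean mask file.

def factor_pairs(n):
    """All (x, y) with x * y == n, for positive integers x and y."""
    return [(x, n // x) for x in range(1, n + 1) if n % x == 0]


def atmos_layouts(atmos_npes):
    """Candidate atmosphere/land layouts, using x * y * 6 == atmos_npes."""
    if atmos_npes % 6 != 0:
        raise ValueError("atmos_npes must be a multiple of 6")
    return factor_pairs(atmos_npes // 6)


def ocean_layouts(ocean_npes):
    """Candidate ocean/ice layouts, using x * y == ocean_npes."""
    return factor_pairs(ocean_npes)
```

For example, atmos_layouts(96) returns [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)], so layout = 4,4 is one option for 96 atmosphere ranks.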

If you change the number of ocean ranks, be aware that the ocean mask file may no longer work for you. You might have to consult the MOM6 documentation to see how to handle that.

miniufo commented 3 years ago

OK, I think I understand it much better now. There is no standalone npes setting; it is determined by ocean_npes, atmos_npes, and atmos_nthreads. I'll look into that error again.