ESMCI / ccs_config_cesm

CESM CIME Case Control System configuration files

Two issues on greenplanet following ccs_config_cesm split #53

Closed mnlevy1981 closed 2 years ago

mnlevy1981 commented 2 years ago

I'm 99% sure this is a ccs_config_cesm issue, but it might be something in CIME?

Background: Greenplanet is kind of a Frankenstein machine, with different queues for different groups of nodes that have very different properties. When you are logged in and run create_newcase, the default detected machine is greenplanet-sky24, which uses a queue that only runs on nodes with 40 cores / node, but you can specify --mach greenplanet-sib29 to use a group of nodes with 16 cores / node instead.

Problem Description: In cesm2_3_beta07, which uses cime6.0.12 (predating the creation of this repository), I can build and run on either machine. In cesm2_3_beta08, which uses cime6.0.15 and ccs_config_cesm0.0.16, I have the following problems:

  1. I can create a case for greenplanet-sib29, but when I try to run ./case.setup I get the error:
$ ./case.setup
ERROR: Current machine greenplanet-sky24 does not match case machine greenplanet-sib29.
  2. I can create, set up, and build a case on greenplanet-sky24, but the job aborts before even executing cesm.exe, with the error:
$ cat run.${CASE}
ERROR: Could not initialize machine object from ${CESMROOT}/ccs_config/machines/config_machines.xml. This machine is not available for the target CIME_MODEL.

For (1), it is recognizing my hostname as being tied to greenplanet-sky24 and as a result it assumes it doesn't have access to greenplanet-sib29. Is there an option I can add to config_machines.xml to make it clear that the greenplanet login nodes have access to both machines?
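
To make the situation in (1) concrete, here is a minimal sketch of how hostname-based detection can end up tied to a single machine entry. This is not CIME's actual code: the `MACHINES` table, the regex patterns, the `gplogin1` hostname, and the helper functions are all hypothetical, loosely modeled on the idea of a per-machine `NODENAME_REGEX` entry in config_machines.xml.

```python
import re

# Hypothetical machine table (assumption, not the real config_machines.xml):
# both machines are reachable from the same login nodes, so both patterns
# match the same hostname.
MACHINES = {
    "greenplanet-sky24": r"gplogin\d*",
    "greenplanet-sib29": r"gplogin\d*",
}


def matching_machines(hostname):
    """Return every machine whose hostname pattern matches."""
    return [name for name, pattern in MACHINES.items() if re.match(pattern, hostname)]


def probe_machine(hostname):
    """Return only the first matching machine (first match wins)."""
    matches = matching_machines(hostname)
    return matches[0] if matches else None


def strict_check(hostname, case_machine):
    """Accept the case only if the probed machine equals the case machine."""
    return probe_machine(hostname) == case_machine


def relaxed_check(hostname, case_machine):
    """Accept the case if the case machine is among all matching machines."""
    return case_machine in matching_machines(hostname)


if __name__ == "__main__":
    host = "gplogin1"  # hypothetical login-node hostname
    print(strict_check(host, "greenplanet-sib29"))
    print(relaxed_check(host, "greenplanet-sib29"))
```

Under these assumptions, a strict first-match check rejects a greenplanet-sib29 case created on a login node that is detected as greenplanet-sky24, while a check against all matching entries would accept it.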

For (2), I don't know where to begin troubleshooting. For what it's worth, I do have a ~/.cime/config file that sets CIME_MODEL=CESM.

mvertens commented 2 years ago

@mnlevy1981 - could you please try the latest cime and see if that fixes things?

billsacks commented 2 years ago

For the first part, I believe that if you update to a version of CIME that has https://github.com/ESMCI/cime/pull/4228, the problem may be resolved.

jedwards4b commented 2 years ago

@mnlevy1981 I don't have access to greenplanet, but I think that this problem may already be resolved. Please try again after updating to the latest ccs_config and cime. I have tried to reproduce the issue on the NERSC Cori system, where there is a similar situation, and it seems to work fine.

mvertens commented 2 years ago

@jedwards4b - thanks so much for your quick response to this!

mnlevy1981 commented 2 years ago

Both issues went away when I switched to ccs_config_cesm0.0.36 and cime6.0.33. I didn't realize how out of date the tags in beta08 were; otherwise I would have thought to try that on my own. Thanks, all!
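
For anyone checking how far behind a pair of tags is, a small sketch like the following compares tag names of the form used in this thread. The `parse_tag` and `is_older` helpers are hypothetical and assume a tag is a name followed by a dotted numeric version; they are not part of CIME or ccs_config_cesm.

```python
import re


def parse_tag(tag):
    """Split a tag such as 'cime6.0.12' into ('cime', (6, 0, 12))."""
    match = re.fullmatch(r"([A-Za-z_]+?)(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    name, version = match.groups()
    return name, tuple(int(part) for part in version.split("."))


def is_older(tag_a, tag_b):
    """Return True if tag_a has a lower version than tag_b for the same component."""
    name_a, version_a = parse_tag(tag_a)
    name_b, version_b = parse_tag(tag_b)
    if name_a != name_b:
        raise ValueError(f"cannot compare {name_a!r} with {name_b!r}")
    return version_a < version_b


if __name__ == "__main__":
    print(is_older("cime6.0.15", "cime6.0.33"))
    print(is_older("ccs_config_cesm0.0.16", "ccs_config_cesm0.0.36"))
```

The versions are compared as integer tuples rather than strings, so that, for example, 6.0.9 sorts before 6.0.15.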