ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
308 stars 312 forks source link

Two FATES tests that fail #1129

Open ekluzek opened 4 years ago

ekluzek commented 4 years ago

Brief summary of bug

ERP_Ld9.f45_f45_mg37.I2000Clm45Fates.izumi_nag.clm-FatesAllVars SMS_Lm6.f45_f45_mg37.I2000Clm45Fates.cheyenne_intel.clm-Fates

General bug information

CTSM version you are using: ctsm1.0.dev108-2-ga69b1b21 (release-cesm2.2 branch before first tag)

Does this bug cause significantly incorrect results in the model's science? N Configurations affected: FATES

Details of bug

The AllVars test asks for some variables that don't exist. So the list in the user_nl_clm just needs to be shortened.

The other test fails because finidat is being set to the $DIN_LOC_ROOT directory, because there is an empty entry for fates. The empty entry should be removed.

glemieux commented 4 years ago

@ekluzek I'm seeing this error on a simple f45_f45_mg37 grid case run on cheyenne. What prompted the original removal? Was this commit brought over to main?

ekluzek commented 4 years ago

@glemieux I'm not exactly sure what you are asking. The setting for finidat was making it get set to a cold-start, but that contradicted some of the other settings so it wouldn't work.

And no, this hasn't been brought over to main yet, but needs too be.

glemieux commented 4 years ago

Sorry @ekluzek I was referring to the original commit that removed the filename from the finidat definition: https://github.com/ESCOMP/ctsm/commit/73c1bae5b7c2c335347526afad4663ef954c0c43#diff-189bd11300fa533cd56fbd01e1204a93L609-L610

I don't quite understand how the finidat definition works; if you define something, it's supposed to be telling the hlm to use a specific file that meets the given case setup, correct? But if you remove that file, does it default back to some generic file?

ekluzek commented 4 years ago

@glemieux we should perhaps chat about this offline. But, the reason I had to remove that file was that it no longer worked because of the fsurdat file update. A problem with FATES is that we've got to have exact matches and can't use the interpolation. So we either have to constantly update finidat files because of changes that demand a new one -- or do a coldstart. What I did was to just remove the filename, but I should have removed the whole thing.

For fates if it can't find a file it'll default to a coldstart. For other configurations it can usually find some file that will match.

glemieux commented 4 years ago

Thanks @ekluzek that helped fill a gap in my knowledge.

Chatting offline, maybe at the next ctsm software meeting, about when to bring the fix in sounds good.

ekluzek commented 3 years ago

The issue was that these tests fail with the NAG compiler when DEBUG is not TRUE. So we changed our tests so that they are all DEBUG on for izumi_nag.

ekluzek commented 3 years ago

This came up again because this test was not working for @fischer-ncar

ERS_Ld5_Mmpi-serial.1x1_vancouverCAN.I1PtClm45SpRs.izumi_nag.clm-default

We need to remove it from the test list or make it with DEBUG on.

ekluzek commented 3 years ago

The error we see looks like this...

Warning: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90, line 2280: H_ALLOM explicitly imported into PRTALLOMETRICCNPMOD but not used Warning: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90, line 2280: LEAVES_ON explicitly imported into PRTALLOMETRICCNPMOD but not used Extension: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90, line 1665: Argument A2 (no. 2) data type DOUBLE PRECISION inconsistent with previous argument with data type REAL in reference to intrinsic MAX Panic: /scratch/cluster/erik/ctsm5.1.dev026/src/fates/parteh/PRTAllometricCNPMod.F90: Unexpected expr node type 432 Internal Error -- please report this bug

ekluzek commented 3 years ago

Note a related error that a user is getting with an old GNU compiler is here:

https://bb.cgd.ucar.edu/cesm/threads/error-build-fail-clm-buildlib-failed-clm5-0-porting.5509/#post-39508

billsacks commented 3 years ago

I looked briefly into the error reported in the forum post, which points to this line, if I understand correctly:

     ! Have the global generic pointer, point to this hypothesis' object
     prt_global => prt_global_acnp

This does look suspicious to me: prt_global is declared as:

  type(prt_global_type),pointer,public :: prt_global

whereas prt_global_acnp is declared as:

   class(prt_global_type), public, target, allocatable :: prt_global_acnp

Note the type vs. class. Class is more general than type, so I'm not surprised that some compilers don't like having you set the specific prt_global point to the more general prt_global_acnp: what would happen if this prt_global_acnp instance were actually some subclass? (I'm actually surprised that any compiler allows this.)

I think this should be changed in one of 3 ways:

  1. Declare prt_global to be class(prt_global_type) instead of type
  2. Declare prt_global_acnp to be type(prt_global_type) instead of class
  3. (probably not the right solution here, but possible) Wrap the pointer assignment in a "select type" block, only doing the the assignment if prt_global_acnp satisfies "type is prt_global_type", aborting in any other case.
rgknox commented 3 years ago

Thanks for looking into this everyone. I'm leaning towards option 1 @billsacks

ekluzek commented 2 years ago

OK, this error is still happening in ctsm5.1.dev064. So the proposed fix didn't completely fix it. We've gotten around it by only running NAG tests on izumi with DEBUG on. This is a problem for CAM test lists though. We keep talking about this in CSEG meetings and aren't sure what to do.

I'm reopening to raise awareness...

ekluzek commented 1 year ago

I've put our workaround of making sure on izumi_nag only DEBUG tests are done on the release-cesm2.2 branch. So that clears the cesm2.2 milestone for this issue.

The issue itself still exists on the main dev branch though...