NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0

bug: Updates to snow mass and snow height resulting from snow repartitioning in dart_to_clm.f90 are causing CLM to generate NaN fluxes to the coupler and NaN state variables. #659

Open XueliHuo opened 3 months ago

XueliHuo commented 3 months ago

Describe the bug

I am using the snow repartitioning in dart_to_clm.f90 for my snow data assimilation project. The dart_to_clm.f90 used in my project is at its latest commit, 8c75bd38da3ff71175b1452d8d79bae89b725ebf. At the assimilation step, the snow observation was assimilated into CLM successfully, and the snow mass (both liquid and ice components) and snow thickness in the restart files were updated without error. However, when CLM resumed from these updated restart files, it stopped with an error.

Error Message

clm: completed timestep 1489  # of NaNs = 7
Which are NaNs = F F F F F F T F F T F F F F F F F F F F F F F F F F F F F F F  F F F F F T T T T F F F F T F F F F F F F F F F F F F F F F F F F F F F F F
Sl_t  Sl_snowh  Fall_lat  Fall_sen  Fall_lwup  Fall_evap  Flrl_rofsur
gridcell index = 77
local gridcell index = 77
ENDRUN: ERROR: lnd_export ERROR: One or more of the output from CLM to the coupler are NaN
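
As a quick sanity check when reading this log, the number of `T` flags in the mask should match both the reported NaN count and the number of field names printed after it. A minimal Python sketch, assuming the mask and names are copied by hand from the log (the variable names are illustrative and are not part of CLM or DART):

```python
# Mask and field names copied by hand from the CLM log above.
mask = (
    "F F F F F F T F F T F F F F F F F F F F F F F F F F F F F F F "
    "F F F F F T T T T F F F F T F F F F F F F F F F F F F F F F F F F F F F F F"
)
nan_fields = ["Sl_t", "Sl_snowh", "Fall_lat", "Fall_sen",
              "Fall_lwup", "Fall_evap", "Flrl_rofsur"]
reported_nan_count = 7  # from "# of NaNs = 7"

flags = mask.split()
# Zero-based positions of the flagged fields within the mask.
nan_positions = [i for i, flag in enumerate(flags) if flag == "T"]

# The three counts agree, so the log is internally consistent.
consistent = len(nan_positions) == len(nan_fields) == reported_nan_count
print(consistent)
```

Mapping each position back to a coupler field would need the full ordered field list, which the log does not include.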

Which model(s) are you working with?

CLM, version release-cesm2.2.01

Version of DART

Which version of DART are you using? You can find the version using git describe --tags
v10.7.0-111-g70e6af803
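
For readers unfamiliar with `git describe --tags` output, the string has the form `<tag>-<commits since tag>-g<abbreviated hash>`. A small Python sketch of splitting it apart (the helper name `parse_git_describe` is hypothetical, written here only for illustration):

```python
import re

def parse_git_describe(text):
    """Split '<tag>-<commits>-g<hash>' into its three parts."""
    match = re.fullmatch(
        r"(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)", text
    )
    if match is None:
        # When HEAD is exactly on a tag, git describe prints the tag alone.
        return {"tag": text, "commits": 0, "sha": None}
    return {
        "tag": match["tag"],
        "commits": int(match["commits"]),
        "sha": match["sha"],
    }

info = parse_git_describe("v10.7.0-111-g70e6af803")
print(info)
```

Here the checkout is 111 commits past the v10.7.0 tag, at the commit whose abbreviated hash is 70e6af803.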

Have you modified the DART code? No

Build information

Please describe:

  1. The machine you are running on (e.g. windows laptop, NSF NCAR supercomputer Derecho).
    I am using the HPC provided by University of Utah.
  2. The compiler you are using (e.g. gnu, intel).
    intel

XueliHuo commented 3 months ago

I am working on this issue and making some changes to the snow repartitioning in dart_to_clm.f90. These changes are being tested in a snow data assimilation experiment that is currently under way. If they work out, I will provide more information about them later.

braczka commented 3 months ago

This is perfect @XueliHuo. Thank you for posting and updating us. We look forward to the PR at some point when the changes have been fully vetted.

hkershaw-brown commented 3 months ago

Is this related to https://github.com/NCAR/DART/issues/254#issuecomment-872262034?

braczka commented 3 months ago

254 (comment)

I don't think so. It is most closely related to the earlier snow layer NaN issues fixed in PR #606. This is an additional issue that goes beyond the external clamping solutions (i.e., keeping physical snow layer quantities >= 0, etc.) proposed in that earlier issue/fix. Because this is new science, @XueliHuo will follow up with a PR at her own discretion.
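
For context on what "external clamping" means here, a minimal Python sketch of a non-negativity clamp follows. The layer values and the helper name are hypothetical and are not taken from dart_to_clm.f90 or PR #606. The example also shows one way a state can pass a >= 0 check and still be inconsistent; the thread does not establish that this is the failure mode in this issue.

```python
def clamp_non_negative(values):
    """Return a copy of values with negative entries replaced by 0.0."""
    return [v if v >= 0.0 else 0.0 for v in values]

# Hypothetical per-layer values after an update (not taken from the issue).
ice_mass = [12.5, -0.3, 4.0]             # kg/m^2
layer_thickness = [0.05, 0.02, -0.01]    # m

ice_mass = clamp_non_negative(ice_mass)
layer_thickness = clamp_non_negative(layer_thickness)

# The last layer now has positive mass but zero thickness: every value
# passes a >= 0 check, yet mass / thickness is undefined for that layer.
print(ice_mass, layer_thickness)
```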

hkershaw-brown commented 3 months ago

@hkershaw-brown need test cases for this code.

hkershaw-brown commented 3 months ago

OK, sounds good. Once the bug is identified, it would be good to get a description of the code in dart_to_clm.f90 that is causing the problem (what the code does vs. what it should do), along with a reproducer so we can test before and after a fix (and add it to our tests).
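
One possible shape for such a before/after check is sketched below in Python. It assumes the per-layer snow thickness, liquid mass, and ice mass for one column have already been read from a restart file into plain lists. The function name and the specific invariants are assumptions for illustration; which conditions CLM actually requires would need to be confirmed once the bug is understood.

```python
import math

def check_snow_column(thickness, liquid, ice):
    """Return a list of problems found in one snow column (empty if none)."""
    problems = []
    for k, (dz, liq, ic) in enumerate(zip(thickness, liquid, ice)):
        if not all(math.isfinite(x) for x in (dz, liq, ic)):
            problems.append(f"layer {k}: non-finite value")
        elif min(dz, liq, ic) < 0.0:
            problems.append(f"layer {k}: negative value")
        elif dz == 0.0 and (liq + ic) > 0.0:
            problems.append(f"layer {k}: mass without thickness")
        elif dz > 0.0 and (liq + ic) == 0.0:
            problems.append(f"layer {k}: thickness without mass")
    return problems

# Hypothetical column: the second layer holds mass but has zero thickness.
print(check_snow_column([0.05, 0.0], [0.1, 0.2], [10.0, 1.0]))
```

Running the same check on restart files written before and after a fix would give a simple pass/fail signal that does not require a full CLM run.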