NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0

bug: cam-se state missing on filter write restarts - probably not cam-se problem #669

Closed hkershaw-brown closed 2 months ago

hkershaw-brown commented 2 months ago


A user reported problems with cam-se: the output netCDF file from filter contains missing values (shown as _). Note that the obs_seq.out has NaNs for some of the observation values.

Describe the bug

  1. List the steps someone needs to take to reproduce the bug.
     Run the cam-se filter test case at /glade/derecho/scratch/hkershaw/DART/Tickets/nuo/run_helen2.state_destroyed/
     Note: I'm working on a smaller reproducer, stay tuned.
  2. What was the expected outcome? Successful assimilation.
  3. What actually happened?
     filter completes, but various parts of the output state are _ in the output netCDF file.

It is unclear at the moment whether we have two NaN problems going on.

Error Message

Please provide any error messages.

If you trap on NaNs:

    Before preassim state space output TIME: 2024/04/17 10:23:11
    forrtl: error (65): floating invalid
    Image              PC                Routine            Line     Source
    libpthread-2.31.s  000014B2F25178C0  Unknown            Unknown  Unknown
    filter             00000000004433FE  quality_control_m  284      quality_control_mod.f90
    filter             000000000048DC75  forwardoperator    348      forward_operator_mod.f90
    filter             00000000004B8CF6  filter_mod_mp_fil  839      filter_mod.f90
    filter             000000000043DC80  MAIN               20       filter.f90
    filter             000000000040E9FD  Unknown            Unknown  Unknown
    libc-2.31.so       000014B2EE3D029D  libc_start_main    Unknown  Unknown
    filter             000000000040E92A  Unknown            Unknown  Unknown

Which model(s) are you working with?

CAM-SE. Possibly this is a wider problem with NaNs propagating from the obs_seq.
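A quick way to check whether an observation sequence carries NaNs is to scan it for tokens that parse as non-finite numbers. The sketch below is a minimal example, assuming the obs_seq file was written in ASCII format. It does not parse the DART obs_seq structure at all; it only reports line numbers, and the function names `find_nonfinite_tokens` and `scan_file` are made up for illustration.

```python
import math


def find_nonfinite_tokens(lines):
    """Return (line_number, token) pairs for tokens that parse as NaN or infinity."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                continue  # not a number (for example a type name or keyword)
            if not math.isfinite(value):
                hits.append((lineno, token))
    return hits


def scan_file(path):
    """Scan a text file line by line; assumes an ASCII-format obs_seq file."""
    with open(path, "r") as handle:
        return find_nonfinite_tokens(handle)
```

Calling `scan_file` on an obs_seq file returns an empty list when every numeric token is finite. Any hits give the line numbers to inspect by hand.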

Screenshots

If applicable, add screenshots to help explain your problem.

Here are a couple of examples using different cutoffs. Note that adaptive localization is not used.

bullet holes: image

apple core (large cutoff): image

&assim_tools_nml
   cutoff = 3.0 ! HK 3.0 giving more nans
   sort_obs_inc = .false.
   spread_restoration = .false.
   sampling_error_correction = .true.
   adaptive_localization_threshold = -1
   output_localization_diagnostics = .false.
   localization_diagnostics_file = 'localization_diagnostics'
   convert_all_obs_verticals_first = .true.
   convert_all_state_verticals_first = .true.
   print_every_nth_obs = 10000
   distribute_mean = .false.
/
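One possible reading of the two pictures, not confirmed in this thread, is that a NaN increment from a single bad observation contaminates every state element inside its localization radius, so a larger cutoff produces a larger patch of missing values. The toy below illustrates only that arithmetic. It is not DART's update: the 1-D grid, the linear taper, and the choice of 2 * cutoff as the radius of influence are all assumptions made for the example.

```python
import math


def apply_increment(state, positions, obs_position, obs_increment, cutoff):
    """Toy 1-D update: add a distance-weighted increment within 2 * cutoff."""
    updated = list(state)
    for i, x in enumerate(positions):
        distance = abs(x - obs_position)
        if distance >= 2.0 * cutoff:
            continue  # outside the radius of influence: element left untouched
        weight = 1.0 - distance / (2.0 * cutoff)  # linear taper (hypothetical stand-in)
        updated[i] = state[i] + weight * obs_increment  # any weight times NaN is NaN
    return updated


def count_nans(values):
    """Count the NaN entries in a list of floats."""
    return sum(1 for v in values if math.isnan(v))


positions = [float(i) for i in range(21)]  # grid points 0, 1, ..., 20
state = [1.0] * len(positions)
bad_increment = float("nan")               # stands in for an increment from a NaN observation

small = apply_increment(state, positions, 10.0, bad_increment, cutoff=1.0)
large = apply_increment(state, positions, 10.0, bad_increment, cutoff=4.0)
print(count_nans(small), count_nans(large))  # prints "3 15"
```

In the toy, elements outside the radius stay finite, which is why the damage looks like isolated holes for a small cutoff and a much larger region for a big one.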

Version of DART

Which version of DART are you using? You can find the version using git describe --tags
v11.4.0

Have you modified the DART code?

No

Build information

Please describe:

  1. NSF NCAR supercomputer Derecho
  2. intel

kdraeder commented 2 months ago

Nuo Chen reports: I ran another test with the real observations from NCEP prepbufr instead of the synthetic ones I made from the CESM truth run (also converted from prepbufr type) in '/glade/derecho/scratch/chennuo/CESM2.2_CAM_ens_test.006'. There are no missing values after assimilation, and the model is able to proceed to the second and third cycles.

I will try to recreate the synthetic observations with perfect_model_obs instead of using the Python code inherited from another member of our group. Glad it's not a bug with the DART cam-se code itself.

hkershaw-brown commented 2 months ago

Thanks for the update! Pretty funky behavior, it is a cool bug. I'd like to narrow it down a bit at some point: what about that particular obs sequence can generate problems like this? Maybe it is a metadata mismatch rather than the NaNs? NaN + good qc? Funky linked list? 🤷 We've got a good reference in this issue if someone else hits a similar problem with their own generated obs_seqs.
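The "NaN + good qc" idea could be tested on any list of (value, qc) pairs pulled out of an obs sequence. The sketch below is hypothetical: the record layout is invented for illustration, and it assumes a QC value of 0 marks a good observation. It does not read obs_seq files itself.

```python
import math

GOOD_QC = 0  # assumption: 0 marks a good observation in the incoming QC field


def suspicious_obs(records):
    """Return indices of records whose value is NaN but whose QC says 'good'."""
    return [
        index
        for index, (value, qc) in enumerate(records)
        if math.isnan(value) and qc == GOOD_QC
    ]
```

A non-empty result would point at observations that a QC-based filter lets through even though their values are unusable.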

kdraeder commented 2 months ago

I've copied it to /glade/derecho/scratch/raeder/Chen_cam-se/obs_seq.2020-08-05-21600.makes_NaNs, in case you don't have a copy yet. It won't be there forever.
It's 65 Mb, but we could subset it for testing.

hkershaw-brown commented 2 months ago

Thanks @kdraeder, I grabbed it already; that's what I made the pretty pictures with. The runs (filter-only test case) are in /glade/derecho/scratch/hkershaw/DART/Tickets/nuo/run*. I'll put the test case on work before it's purged off scratch. Fun Saturday afternoon bug hunt for anyone who is interested. Have a good weekend!

Cheers, Helen

on work: /glade/work/hkershaw/DART/nuo/NaN_obs_seq/issue_669

hkershaw-brown commented 2 months ago

closing since this is not a DART bug