NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0
187 stars 140 forks source link

bug: obs_seq_coverage does not recognize all QC metadata generated by obs converters #428

Open kdraeder opened 1 year ago

kdraeder commented 1 year ago

Describe the bug

  1. Create or find an obs_seq file which has a platform QC metadata which is not found in the hard-coded list in subroutine find_our_copies (Quality Control, NCEP QC index, DART quality control) (see attached; remove the TXT, then tar -z -x -f).
  2. Feed it to obs_seq_coverage.
  3. Expect a file with a list of locations where repeated obs are taken.
  4. Program fails (see attached).

Error Message

find_qc_indices metadata:source Quality Control copyindex not found

Which model(s) are you working with? WACCM-X, but that's not relevant

Version of DART

Which version of DART are you using? Manhattan, circa 2022-10 and older

Have you modified the DART code?

No A short term fix is to add "COSMIC QC" to the series of "if" statements. But a more general possibility is to define a namelist variable that contains a list of permissible QC metadata. Then any user can provide new values without debugging or asking us, plus modifying the code and recompiling. This is an issue for other programs too, but I have not identified all of them yet.

Build information

Please describe:

  1. The machine you are running on (e.g. windows laptop, NCAR supercomputer Cheyenne). Any
  2. The compiler you are using (e.g. gnu, intel). Any obs_seq_coverage_io.tgz.TXT
hkershaw-brown commented 1 year ago

Trying to reproduce the error, the qc not found is just a message:

  obs_seq_coverage  opening obs_seq.ion.2019-01-02
  find_qc_indices  metadata:source Quality Control copyindex not found
  find_qc_indices  metadata:DART   Quality Control copyindex not found
 QC index          -1  observation
 QC index          -1  observation

The error is because no voxels are found:

First observation time day=152672, sec=2222
First observation date 2019 Jan 02 00:37:02

 There were            0  voxels matching the input criterion.
 There were            0  voxels*times matching the input criterion.
 ERROR FROM:
  source : obs_seq_coverage.f90
  routine: obs_seq_coverage
  message:  No location had at least            9  reporting times.

No voxels are found, because none of the observations is this case are passing the 'vertically desired' test.
462 if ( .not. vertically_desired(obs_loc, ilev) ) cycle ObservationLoop

So I think the program is behaving as expected.

hkershaw-brown commented 1 year ago

If you compile with debugging flags and run with verbose=.true. and you have no dart qc or NCEP/QC then these lines with seg fault.

1653 if (verbose) then
1654    write(*,*)'QC index',     qcindex,' ',trim(get_qc_meta_data(myseq,     qcindex))
1655    write(*,*)'QC index',dart_qcindex,' ',trim(get_qc_meta_data(myseq,dart_qcindex))
1656    write(*,*)
nancycollins commented 1 year ago

when someone fixes this bug i'd suggest changing 1 of those 2 text labels ('QC index') to be less confusing. 'qcindex' is the copy number for the incoming data QC (0=good, bigger numbers mean bad obs quality). 'dart_qcindex' is the copy number for the dart flag we add to say what happened to this obs during assimilation (e.g. 0=assimilated, 4=failed FO, 7=failed outlier test, etc). with these labels the two are conflated. (edit: maybe something like the message from find_qc_indices: source QC and DART QC)

hkershaw-brown commented 1 year ago

@nancycollins just to clarify here, when you say fixing this bug, you mean fixing the segfault with verbose=.true. or is there more that needs to be fixed?

nancycollins commented 1 year ago

yes, i mean just this verbose message not looking for a -1 index before using it. the rest of the program seems like it's working fine. it is definitely not the program this user should be running on GPS obs. it's intended for obs that come from fixed locations at repeated times, to filter through stations which have observations at a high enough percentage of the total possible times to give good statistics in further analysis.

kdraeder commented 1 year ago

I agree with @nancycollins about clearer variable names and values and about the user's misguided choice of using obs_seq_coverage. But there are other programs that handle QC the same way, and would be candidates for upgrading, maybe to the synonymous_qc_list mechanism used by

These files share the QC issue:

And maybe

nancycollins commented 1 year ago

we could make the documentation for these tools say we expect the incoming data QC to be named 'Quality Control'. if it's not, use the obs_sequence_tool to rename it or if the file is in ascii, just edit the name. we could encourage obs converters to use this metadata name as the data source QC. i'm not sure why we didn't do this originally, to be honest. we already require the metadata for the obs value to be 'observation' and don't allow anything else.

my vote is leave this code as-is for now, and when people are working on obs converters for other reasons change them to use 'Quality Control' as the metadata name. more importantly, ensure they're using a consistent numeric value for quality. we've adopted the NCEP prepbufr conventions for this - 0 is good, and increasing numbers are more and more questionable quality. there's a table of meanings for the various values someplace in the prepbufr docs.

perhaps the original thought was to support more than one quality control numeric scheme, but that introduces a lot of other complications when trying to assimilate obs and discard ones with bad incoming quality control. at this point i think we're wedded to the ncep scheme and so being flexible about supporting multiple metadata names is less important.

nancycollins commented 1 year ago

if someone did want to support different QC schemes in the future - perhaps when rewriting the observation file format and code - there could be a single set of subroutines for querying copies by function. e.g. call return_me_the_QC_copy(seq, i) without having to specify a metadata name. another subroutine asking if a QC is good or not. that centralizes the QC support in a single location. then if someone wanted to support different QC schemes (e.g. 100 is good quality, smaller numbers bad - i think this is the little-r convention?) that could be added in one location instead of the list of replicated code that kevin discovered.