JCSDA-internal / eva

Evaluation and Verification of the Analysis
Apache License 2.0

Adding capability to compare geovals #177

Closed asewnath closed 6 months ago

asewnath commented 9 months ago

We want to add the capability to compare geovals from different systems (JEDI, GSI, GEOS, etc.). This involves adding a new dataset reader and potentially a transform. The reader would require an obs file along with the geoval file in order to retrieve lat/lon information. It would also accept templated filenames so that it can read more than one instrument file at a time.
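To illustrate the templated filenames, here is a minimal sketch of how one template could be expanded into one file list per instrument. The function name `expand_templates` and the `env` mapping are hypothetical, and how eva itself resolves `${...}` variables is not covered in this issue, so plain string substitution is assumed here.

```python
import os


def expand_templates(templates, instruments, env=None):
    """Return {instrument: [paths]} with ${VAR} and {instrument} filled in.

    Hypothetical helper for illustration only, not part of eva.
    """
    env = dict(os.environ) if env is None else env
    expanded = {}
    for name in instruments:
        files = []
        for template in templates:
            path = template
            # Substitute ${VAR} placeholders first (assumed to come from env).
            for key, value in env.items():
                path = path.replace("${" + key + "}", value)
            # Then fill in the per-instrument placeholder.
            files.append(path.replace("{instrument}", name))
        expanded[name] = files
    return expanded


files = expand_templates(
    ["${data_experiment_path}/{instrument}_experiment_geovals.nc4"],
    ["amsua_n19", "avhrr3_metop-b"],
    env={"data_experiment_path": "/data/exp"},  # made-up path for the example
)
```

Substitution uses `str.replace` rather than `str.format` because the `${...}` syntax would otherwise be misread as a format field.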

The new transform takes the lat/lon information from the experiment and control datasets, finds the control indices whose locations most closely match each experiment location, and then updates the experiment dataset with variables from the control dataset, reordered by those indices. The new fields in the experiment dataset would look something like this: experiment_geovals::amsua_n19_from_control_geovals::vegetation_area_fraction
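A rough sketch of the index matching described above, assuming each dataset is just a dict of NumPy arrays and that "closest" means smallest angular separation on the sphere. The function names are hypothetical and this is a brute-force illustration, not the eva implementation.

```python
import numpy as np


def to_unit_vectors(lat_deg, lon_deg):
    """Convert lat/lon in degrees to 3D unit vectors on the sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )


def nearest_control_indices(exp_lat, exp_lon, ctl_lat, ctl_lon):
    """For each experiment location, return the index of the closest control location."""
    exp_xyz = to_unit_vectors(exp_lat, exp_lon)
    ctl_xyz = to_unit_vectors(ctl_lat, ctl_lon)
    # The dot product of unit vectors is the cosine of the angular
    # separation, so the largest value marks the nearest control point.
    cos_sep = exp_xyz @ ctl_xyz.T
    return np.argmax(cos_sep, axis=1)


def add_matched_fields(control, indices, instrument,
                       experiment_name="experiment_geovals",
                       control_name="control_geovals"):
    """Build the new experiment fields from index-matched control variables."""
    matched = {}
    for variable, values in control.items():
        key = f"{experiment_name}::{instrument}_from_{control_name}::{variable}"
        matched[key] = np.asarray(values)[indices]
    return matched


# Tiny made-up example: the control holds the same points in a different order.
indices = nearest_control_indices(
    exp_lat=[10.0, -20.0], exp_lon=[30.0, 40.0],
    ctl_lat=[-20.01, 10.02, 50.0], ctl_lon=[40.0, 29.99, 0.0],
)
matched = add_matched_fields(
    {"vegetation_area_fraction": [0.1, 0.7, 0.3]}, indices, "amsua_n19"
)
```

The pairwise comparison needs memory proportional to the number of experiment locations times the number of control locations, so a spatial index such as a KD-tree would likely be a better fit for large obs files.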

Potential eva configs for geoval space:

datasets:
  - name: experiment_geovals
    type: GeovalSpace
    obs_file:
      - ${data_experiment_path}/{instrument}_experiment.nc4
    geovals_file:
      - ${data_experiment_path}/{instrument}_experiment_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b

  - name: control_geovals
    type: GeovalSpace
    obs_file:
      - ${data_control_path}/{instrument}_control.nc4
    geovals_file:
      - ${data_control_path}/{instrument}_control_geovals.nc4
    levels: *levels
    instruments:
      - name: amsua_n19
        geoval_variables: *geoval_variables
      - name: avhrr3_metop-b

transforms:
  - transform: index_match
    starting_dataset: control_geovals
    match_index_to_this_dataset: experiment_geovals

@CoryMartin-NOAA Please let me know if you have any thoughts or suggestions for this new reader/transform. I had also considered combining control and experiment into a single dataset read and performing the index matching there, so that there would be no need for a new transform.

CoryMartin-NOAA commented 9 months ago

@asewnath I think the transform is a necessary thing. I know @weihuang-jedi was looking for something like this.

Beyond geovals, I think the new transform could be useful for comparing two IODA obs spaces. Say you have two experiments run with different PE counts: the distribution of observations may differ, but it's the same data, so we could re-index in order to plot. This would also be good for independent GSI vs. JEDI h(x) comparisons.

For the new dataset reader, can we make it more generic than geovals? Something like 'data file' and 'coordinate file'? This is analogous to how the FV3 RESTART files have data in one file, but the lat/lon info is in another.

asewnath commented 9 months ago

Thanks for the guidance @CoryMartin-NOAA. Given what you have suggested, I've modified the proposed config. The following is an example of reading two sources of geoval files and using the new transform:

datasets:
  - name: experiment_geovals
    type: DataFile
    data_file:
      - ${data_experiment_path}/{instrument}_experiment_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b

  - name: control_geovals
    type: DataFile
    data_file:
      - ${data_control_path}/{instrument}_control_geovals.nc4
    levels: *levels
    instruments:
      - name: amsua_n19
        geoval_variables: *geoval_variables
      - name: avhrr3_metop-b

  - name: experiment_lat_lon
    group: state
    type: LatLon
    filename: ${data_input_path}/{instrument}_experiment.nc4
    variables: [lat, lon]

  - name: control_lat_lon
    group: state
    type: LatLon
    filename: ${data_input_path}/{instrument}_control.nc4
    variables: [lat, lon]

transforms:
  - transform: index_match
    dataset_1: control_geovals
    lat_lon_1: control_lat_lon
    dataset_2: experiment_geovals
    lat_lon_2: experiment_lat_lon

I'll iterate on what makes the most sense for the transform config. Also, for the transform, lat_lon_1 and lat_lon_2 would be optional arguments (for example, they could be omitted when IodaObsSpace datasets are used).
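As a sketch of how the optional arguments might be handled, the hypothetical helper below treats `dataset_1` and `dataset_2` as required and returns `None` for any omitted `lat_lon_*` key. The helper name and the fallback behaviour are assumptions for illustration, not the final transform interface.

```python
def parse_index_match_config(config):
    """Validate an index_match transform entry (hypothetical helper)."""
    required = ("dataset_1", "dataset_2")
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"index_match is missing required keys: {missing}")
    return {
        "dataset_1": config["dataset_1"],
        "dataset_2": config["dataset_2"],
        # Optional: None signals that coordinates should come from the
        # dataset itself (assumed behaviour for IodaObsSpace-style datasets).
        "lat_lon_1": config.get("lat_lon_1"),
        "lat_lon_2": config.get("lat_lon_2"),
    }


with_coords = parse_index_match_config({
    "transform": "index_match",
    "dataset_1": "control_geovals",
    "lat_lon_1": "control_lat_lon",
    "dataset_2": "experiment_geovals",
    "lat_lon_2": "experiment_lat_lon",
})
without_coords = parse_index_match_config({
    "transform": "index_match",
    "dataset_1": "control_obs",      # made-up dataset names
    "dataset_2": "experiment_obs",
})
```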

CoryMartin-NOAA commented 9 months ago

Looks good, thanks @asewnath