NOAA-EMC / JEDI-T2O

JEDI Transition to Operations activities.
GNU Lesser General Public License v2.1
5 stars 4 forks source link

Apply bias correction to aircraft temperature measurements #65

Closed ADCollard closed 1 year ago

ADCollard commented 1 year ago

The temperature bias correction for aircraft measurements is currently missing from UFO.

There is a UFO Issue to address this, but this issue has been created to focus on the reduced problem of how to read in a CSV file with the coefficients and to apply this bias correction to the H(x) (i.e., ignoring for now the larger problem of how to update the coefficients within the minimization).

This issue references JEDI-T2O Issue 60.

ADCollard commented 1 year ago

Confirming that the bias correction in UFO is applied to H(x) and not to the observations:

ObsOperator.cc

// -----------------------------------------------------------------------------

void ObsOperator::simulateObs(const GeoVaLs & gvals, ioda::ObsVector & yy,
                              const ObsBias & biascoeff, ioda::ObsVector & ybias,
                              ObsDiagnostics & ydiags) const {
  oper_->simulateObs(gvals, yy, ydiags);
  if (biascoeff) {
    ObsBiasOperator biasoper(odb_);
    biasoper.computeObsBias(gvals, ybias, biascoeff, ydiags);
    // update H(x) with bias correction
    yy += ybias;
  }
}
BrettHoover-NOAA commented 1 year ago

First crack at trying to untangle this problem, let me know if anything seems wrong about this approach, the UFO code is pretty opaque so a lot of this is conjecture on my part.

Calling back to https://github.com/JCSDA-internal/ufo/issues/2149, the full suite of predictors broken out by tail numbers would look something like this in a set of 3 csv files (this is assuming MetaData/sequenceNumber is the aircraft tail-number):

constant coefficient values:

MetaData/sequenceNumber,BiasCoefficientValue/constantPredictor
int,float
00000000,0.000021
00000674,0.150858
00000727,-0.252591
00000737,0.024460
00000752,-0.438555
00000758,-1.230313
00000765,-0.000105
00000771,0.423237
00000772,0.744413
00000775,-3.377616
...

ascent-rate coefficient values:

MetaData/sequenceNumber,BiasCoefficientValue/ascentPredictor
int,float
00000000,0.000011
00000674,-0.000154
00000727,0.021926
00000737,0.014595
00000752,-0.013559
00000758,0.006147
00000765,-0.000042
00000771,0.018915
00000772,0.014224
000007756,0.128808
...

squared ascent-rate coefficient values:

MetaData/sequenceNumber,BiasCoefficientValue/ascentSquaredPredictor
int,float
00000000,0.000697
00000674,-0.000011
00000727,-0.000494
00000737,0.000408
00000752,-0.000297
00000758,0.001780
00000765,-0.000235
00000771,-0.000030
00000772,0.006476 
00000775,-0.080778
...

From the documentation on DrawValueFromFile here, these values can be drawn each in-turn from YAML using the following template:

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: ObsFunction/DrawValueFromFile
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: BiasCoefficientValue
        interpolation:
        - name: MetaData/sequenceNumber
          method: exact

Note: Documentation for DrawFromFile specifies that there is only one column that can contain retrievable data, while the remaining columns have to be coordinate values used to identify the retrievable value. So I don't believe we can pack all of these into one csv file, although I don't think it would be hard to start with a single csv file containing the tail-numbers and all predictors as separate columns and then split the csv into its component parts prior to running the code.

We could use the DrawFromFile functionality to extract the predictor values from csv for each tail-number in the IODA aircraft data and assign them to variable-names (maybe constantPredictor, ascentPredictor, and ascentSquaredPredictor? the documentation isn't clear on if there needs to be a group-name but if so maybe we can use BiasCoefficientValue/).

Would it be a short lift from this point to performing the bias correction? Can we make three functions in https://github.com/JCSDA-internal/ufo/tree/develop/src/ufo/predictors similar to the satellite-based predictor functions that are already present (or 2 functions and reuse Constant for our constant coefficient) to construct the bias correction? It seems like PredictorBase is generalized enough that we might be able to use these tools to construct our own predictor-generator code?

ADCollard commented 1 year ago

@BrettHoover-NOAA This seems like a reasonable approach to me.

BrettHoover-NOAA commented 1 year ago

Slightly more refined version of this methodology, leaving the door open in step-2 for either applying Predictor functions or generating our own code, or something in-between:

Aircraft Bias Correction:

  1. Generate predictor coefficients using ObsFunction/DrawValueFromFile, store in BiasCoefficientValue/<predictor-name> via Variable Assignment
    
    - filter: Variable Assignment
    assignments:
    - name: BiasCoefficientValue/constantPredictor
    function:
      name: ObsFunction/DrawValueFromFile
      options:
        file: /path/to/aircraft/constant/bias_correction.csv    # path to the CSV file
        group: BiasCoefficientValue
        interpolation:
        - name: MetaData/sequenceNumber
          method: exact

NOTE: This function should take the BiasCoefficientValue/<predictor-name> and multiply it by the <predictor-name> to generate BiasCorrectionPredictor/<predictor-name>, i.e.:

BiasCorrectionPredictor/constantPredictor = ones(nobs) * BiasCoefficientValue/constantPredictor
BiasCorrectionPredictor/ascentPredictor = <ascent> * BiasCoeffcientValue/ascentPredictor
BiasCorrectionPredictor/ascentSquaredPredictor = <ascentSquared> * BiasCoefficientValue/ascentSquaredPredictor
- filter: Variable Assignment
  assignments:
  - name: BiasCorrectionPredictor/constantPredictor
    function:
      name: ???
      …

- filter: Variable Assignment
  assignments:
  - name: BiasCorrectionPredictor/ascentPredictor
    function:
      name: ???
      …

- filter: Variable Assignment
  assignments:
  - name: BiasCorrectionPredictor/ascentSquaredPredictor
    function:
      name: ???
      …
  1. Compute a bias-corrected airTemperature using ObsFunction/LinearCombination, store in DerivedObsValue/airTemperature via Variable Assignment
- filter: Variable Assignment
  assignments:
  - name: DerivedObsValue/airTemperature
    type: float
    function:
      name: ObsFunction/LinearCombination
      options:
        variables: [ObsValue/airTemperature, BiasCorrectionPredictor/constantPredictor, BiasCorrectionPredictor/ascentPredictor, BiasCorrection/ascentSquaredPredictor]
        coefs: [1.0, 1.0, 1.0, 1.0]
BrettHoover-NOAA commented 1 year ago

I have a workflow figured out for the constant predictor, that uses the following sequential steps:

  1. I can take the gdas.tHHz.abias_air file and decompose it into CSV files for each of the predictors using the following script (placed here for posterity):
    
    #!/usr/bin/python3

import csv

csv_write: fill a CSV file with coordinate-variable and draw-variable data for JEDI DrawValueFromFile

#

INPUTS

#

csvFileName: name of a write-allowable file for CSV output, will be generated or clobbered (string)

coefFileName: name of a read-allowable space-delimited GDAS bias correction file (string)

coordVariableName: / of coordinate variable (string)

coordVariableType: variable-type for coordinate variable (string)

coordVariableColumn: column-number for coordinate variable in coefFile (integer)

drawVariableName: / of draw variable (string)

drawVariableType: variable-type for draw variable (string)

drawVariableColumn: column-number for draw variable in coefFile (integer)

#

OUTPUTS

#

no returned variables, but CSV file is properly written and formatted for JEDI DrawValueFromFile

#

DEPENDENCIES

#

csv

# def csv_write(csvFileName, coefFileName, coordVariableName, coordVariableType, coordVariableColumn, drawVariableName, drawVariableType, drawVariableColumn): import csv csvFile = open(csvFileName, 'w') cwrite = csv.writer(csvFile)

write 1st-line header for coordVariableName, drawVariableName

cwrite.writerow([coordVariableName, drawVariableName])
# write 2nd-line header for coordVariableType, drawVariableType
cwrite.writerow([coordVariableType, drawVariableType])
# write first data line for missing value placeholder '_' to set draw for missing coordinates to a
# selected missing value. For aircraft bias correction, setting these coefficient values to 0. will
# effectively skip bias correction for these data.
cwrite.writerow(['_', '0.'])
# write remaining lines from coefFile columns specified by coordVariableColumn, drawVariableColumn
coefFile = open(coefFileName, 'r')
for line in coefFile:
    cwrite.writerow([line.split()[coordVariableColumn], line.split()[drawVariableColumn]])
return
# close files
csvFile.close()
coefFileName.close()

#

begin

# if name == "main":

define GDAS aircraft bias file name for read-only

coefFileName = 'gdas.t00z.abias_air'
# define CSV file names for constant, ascent, and ascent-squared predictor coefficients for write-only (will clobber)
constantFileName = 'constant.csv'
ascentFileName = 'ascent.csv'
ascentSquaredFileName = 'ascentSquared.csv'
# write constant predictor coefficients to c
csv_write(constantFileName, coefFileName,
          'MetaData/stationIdentification',         'string', 0,
          'BiasCoefficientValue/constantPredictor', 'float',  2)
# write ascent predictor coefficients to a
csv_write(ascentFileName, coefFileName,
          'MetaData/stationIdentification',         'string', 0,
          'BiasCoefficientValue/ascentPredictor',   'float',  3)
# write ascentSquared predictor coefficients to a2
csv_write(ascentSquaredFileName, coefFileName,
          'MetaData/stationIdentification',              'string', 0,
          'BiasCoefficientValue/ascentSquaredPredictor', 'float',  4) 

#

end

#

The resulting `constant.csv` file has this format:

MetaData/stationIdentification,BiasCoefficientValue/constantPredictor string,float _,0. 00000000,-0.000001 00000500,0.671394 00000506,0.000000 00000515,-0.315268 00000533,-0.048901 00000545,0.000000 00000551,0.000000 ...

The file is structured for a `DrawValueFromFile` operation to pick a column based on `MetaData/stationIdentification` (i.e. the tail-number) and select the appropriate value of `BiasCoefficientValue/constantPredictor`. The first line after the top 2 header lines contains a missing-value marker: `_,0.`, indicating that any observation with a tail-number not listed in the file should return a `0.` for the coefficient value. This will effectively turn off bias correction for observations with tail-numbers missing from the file.

2. We generate a new variable `BiasCoefficientValue/constantPredictor` for every aircraft temperature observation using the `DrawValueFromFile` function using the `Variable Assignment` "filter":
  1. The BiasCoefficientValue/constantPredictor is then fed to a new obsfunction AircraftBiasCorrectionConstant to produce the predictor's adjustment, which is stored in BiasCorrectionPredictor/constantPredictor using the same Variable Assignment "filter". The difference between BiasCoefficientValue/constantPredictor and BiasCorrectionPredictor/constantPredictor is academic, since the constant predictor coefficient is multiplied by 1. to produce the predictor's adjustment. However, I will also create a companion AircraftBiasCorrectionAscent and AircraftBiasCorrectionAscentSquared obsfunction to produce the predictor adjustment for those predictors, and for those the differences won't be academic. Each function essentially takes the vector of coefficients and vector of predictor values, multiplies them together, and returns the result:

    - filter: Variable Assignment
    filter variables:
    - name: airTemperature
    assignments:
    - name: BiasCorrectionPredictor/constantPredictor
      type: float
      function:
        name: ObsFunction/AircraftBiasCorrectionConstant
    defer to post: true
  2. Finally, these adjustments are added to HofX/airTemperature by using the Variable Assignment "filter" and the LinearCombination obsfunction:

    - filter: Variable Assignment
    filter variables:
    - name: airTemperature
    assignments:
    - name: HofXBc/airTemperature
      type: float
      function:
        name: ObsFunction/LinearCombination
        options:
          variables: [HofX/airTemperature, BiasCorrectionPredictor/constantPredictor]
          coefs: [1.0, 1.0]
    defer to post: true

The end-product is a HofXBc/airTemperature value that is corrected by the constant predictor term only. Since this variable doesn't exist in the diag file, I am spoofing the file by sending this instead to MetaData/windUpward, which does exist in the diag file and is an appropriate float-type, so I can examine it.

When I compare the raw UFO OmB values (with no bias correction) against the GSI's OmB values (with bias correction), the correlation is 0.94. When I substitute the UFO OmB values with the constant predictor's bias correction applied, the correlation rises to 0.96 – that seems pretty good considering that I am currently only applying 1 of 3 predictors and I'm using an out-of-date and incomplete bias correction coefficient file that is missing ~900 tail-numbers.

Once the aircraft ascent-rate is added to the obs file, I can produce the AircraftBiasCorrectionAscent and AircraftBiasCorrectionAscentSquared obsfunctions and pass everything along for full bias correction. I will also need the same gdas.t00z.abias_air file that the GSI uses for its bias correction to fully vet the system.

BrettHoover-NOAA commented 1 year ago

A few updates on this:

1) I discovered that trailing whitespace is important for the tail-numbers to be properly picked by DrawValueFromFile, so I had to modify this line of the CSV generator:

cwrite.writerow([line.split()[coordVariableColumn], line.split()[drawVariableColumn]])

to write the coordinate variable (tail-number string) as a left-justified 8-character string:

cwrite.writerow([line.split()[coordVariableColumn].ljust(8,' '), line.split()[drawVariableColumn]])

Which then allows all of the tail-numbers that are less than 8 characters to be properly assigned values from the CSV file.

2) We also discovered that the bias correction in GSI will dock the ascent rate back to a value of 30. if it is larger than 30., this is done in setupt.f90 Lines 630–642 here:

!          define predictors
           if (aircraft_t_bc) then
              pof_idx = one
              pred(1) = one
              if (abs(data(ivvlc,i))>=30.0_r_kind) then
                 pred(2) = 30.0_r_kind
                 pred(3) = pred(2)*pred(2)
                 data(ier,i) = 1.5_r_kind*data(ier,i)
              else
                 pred(2) = data(ivvlc,i)
                 pred(3) = data(ivvlc,i)*data(ivvlc,i)
              end if
           end if

Note that in this configuration, a negative ascent rate in excess of -30. would also be set to 30., which is probably an error. It is not clear if this ever happens in-practice since descent is typically slower than ascent for air traffic control reasons and a rate of 30. on ascent is already an extreme.

The proper change was made to AirCraftBiasCorrectionAscent and AirCraftBiasCorrectionAscentSquared to account for this change.

3) All three predictors are now operating in a test YAML and applied appropriately to HofX, but we still have outstanding differences between UFO and GSI that need to be explained. 10 of these disagreements are from observations with known tail-numbers, so they should be appropriately assigned predictor coefficients. Interestingly all 10 of the observations are type 130 (AIREP and PIREP as per prepBUFR Table 2 and all of them have an ascent rate of 0., with all 0. values for coefficients (confirmed in the gdas.t18z.abias_air file for the 2021071318 analysis, which is used to drive bias correction for the 2021080100 analysis and updated in-post, and further confirmed by looking at the gdas.t00z.abias_air file for 2021080100 just in-case, it's all zeros in both files). There are also 1232 disagreements among observations with missing tail-numbers, constituting the entire set of observations with missing tail numbers (i.e. any missing tail-number generates a disagreement).

Plotting the missing tail-number HofX differences against the ascent rate shows that there is definitely a correction going on in GSI that includes a squared ascent rate term: image

The observations with a zero ascent rate also have a disagreement that approaches but doesn't quite reach a value of 1., with values of 0.9930115, 0.99302673, or 0.993042 for all 422 obs with missing tail-numbers and an ascent rate of 0.. The 10 disagreements with existing tail-numbers all likewise show a difference of 0.99302673, so they are in-effect behaving as if they were obs with missing tail-numbers.

I think I have solved this mystery: It is mentioned in read_prepbufr.f90 Lines 1539–1546 here that type=130 aircraft reports only contain a flight number, no aircraft type, and are assigned a pseudo tail-number of KX130 instead:

!             Determine if the tail number is included in the taillist
!                special treatment since kx130 has only flight NO. info, no
!                aircraft type info
              if (kx==130) then
                 cc_station_id = 'KX130'
              else
                 cc_station_id = c_station_id
              end if

For whatever reason this is being kept internal to GSI and not propagating to the gdas.tHHz.abias_air file, which apparently still contains the flight-numbers. But there is an entry in the bias correction file for the KX130 pseudo tail-number.

If I include a Variable Assignment filter at the beginning of UFO bias correction to replace all type=130 MetaData/stationIdentification values with 'KX130 ' (including the whitespace), the bias correction appears to work correctly with an HofX value matching GSI's bias corrected value to within a tolerance of +/- 6e-05.

BrettHoover-NOAA commented 1 year ago

Successfully implemented in https://github.com/JCSDA-internal/ufo/pull/2879