Too many invalid lines for curuc screening

gerritholl commented 5 years ago

As reported by hirs_orbit_map, which (essentially) calls hirs_curuc_checker, a very large fraction of lines is reported as invalid in very many cases. It is always well over 50%, and in many cases over 90%:

/group_workspaces/cems2/fiduceo/Data/FCDR/HIRS/v0.8pre/easy/noaa15/2004/08/09/FIDUCEO_FCDR_L1C_HIRS3_noaa15_20040809113229_20040809131329_easy_v0.8pre_fv0.7.nc

WARNING  FCDR_HIRS.metrology 2018-11-06 22:24:50,524 metrology.calc_corr_scale_channel:1120: In correlation calculation, 11/12 lines invalid

or

/group_workspaces/cems2/fiduceo/Data/FCDR/HIRS/v0.8pre/easy/noaa17/2010/11/28/FIDUCEO_FCDR_L1C_HIRS3_noaa17_20101128074613_20101128092720_easy_v0.8pre_fv0.7.nc

WARNING  FCDR_HIRS.metrology 2018-11-06 22:23:31,811 metrology.calc_corr_scale_channel:1120: In correlation calculation, 70/88 lines invalid

or

/group_workspaces/cems2/fiduceo/Data/FCDR/HIRS/v0.8pre/easy/noaa18/2009/05/08/FIDUCEO_FCDR_L1C_HIRS4_noaa18_20090508095747_20090508113946_easy_v0.8pre_fv0.7.nc

WARNING  FCDR_HIRS.metrology 2018-11-06 22:29:17,519 metrology.calc_corr_scale_channel:1120: In correlation calculation, 11/12 lines invalid

or

/group_workspaces/cems2/fiduceo/Data/FCDR/HIRS/v0.8pre/easy/noaa10/1987/09/22/FIDUCEO_FCDR_L1C_HIRS2_noaa10_19870922130925_19870922145033_easy_v0.8pre_fv0.7.nc

WARNING  FCDR_HIRS.metrology 2018-11-06 22:30:41,813 metrology.calc_corr_scale_channel:1120: In correlation calculation, 10/11 lines invalid

/group_workspaces/cems2/fiduceo/Data/FCDR/HIRS/v0.8pre/easy/noaa14/1996/02/21/FIDUCEO_FCDR_L1C_HIRS2_noaa14_19960221075418_19960221093616_easy_v0.8pre_fv0.7.nc

INFO     FCDR_HIRS.metrology 2018-11-06 22:30:49,654 metrology.calc_corr_scale_channel:1120: In correlation calculation, 61/87 lines invalid

gerritholl commented 5 years ago

This happens due to the rather conservative selection in https://github.com/FIDUCEO/FCDR_HIRS/blob/master/FCDR_HIRS/metrology.py#L1073:

brokenline = bad.sel(n_c=~brokenchan).any("n_e").any("n_c")

i.e., I'm rejecting a line if any pixel on any non-broken channel is bad, where "bad" may simply mean flagged as OUTLIER_NOS. That means a single channel can spoil an entire orbit. But if I let those lines through, the outliers will corrupt the cross-channel (and other) covariance matrices calculated from them. So what is needed is a robust version of CURUC. That is difficult and I don't have time to implement it.
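
To illustrate how strict this rule is, here is a minimal NumPy sketch of the same logic on synthetic flags. The array sizes, the axis order (line, channel, element) and the synthetic data are assumptions for illustration only; the real code works on xarray objects with `n_c` and `n_e` dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, for illustration only
n_lines, n_chan, n_elem = 100, 19, 56

# bad[l, c, e] is True where a pixel is flagged (e.g. as an outlier)
bad = np.zeros((n_lines, n_chan, n_elem), dtype=bool)

# Flag one single pixel on one channel in 60 of the 100 lines
flagged_lines = rng.choice(n_lines, size=60, replace=False)
flagged_elems = rng.integers(0, n_elem, size=60)
bad[flagged_lines, 3, flagged_elems] = True

# Channels considered broken altogether are excluded from the test
brokenchan = np.zeros(n_chan, dtype=bool)

# NumPy analogue of bad.sel(n_c=~brokenchan).any("n_e").any("n_c")
brokenline = bad[:, ~brokenchan, :].any(axis=2).any(axis=1)

print(f"{bad.mean():.4%} of pixels flagged")
print(f"{brokenline.sum()}/{n_lines} lines invalid")
```

In this synthetic case well under 0.1% of the pixels are flagged, yet 60 of the 100 lines are rejected.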

gerritholl commented 5 years ago

This is also what is causing some FCDR processing to fail with

RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 400.

gerritholl commented 5 years ago

One way to handle this: instead of throwing out lines that contain at least some outliers, fill those outliers with interpolated or (even simpler) median values, only for the purposes of the CURUC calculations.
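
A minimal sketch of the median variant, assuming the data have been reduced to one value per line and channel. The function name `fill_outliers_with_median` and the `(n_lines, n_chan)` layout are hypothetical and not taken from FCDR_HIRS.

```python
import numpy as np

def fill_outliers_with_median(values, bad):
    """Replace flagged entries by the per-channel median of unflagged ones.

    values : float array, shape (n_lines, n_chan)  (assumed layout)
    bad    : bool array of the same shape, True where flagged
    Returns a new array; the input is not modified.
    """
    filled = np.array(values, dtype=float)       # work on a copy
    masked = np.where(bad, np.nan, filled)       # hide flagged values
    med = np.nanmedian(masked, axis=0)           # median per channel
    # Broadcast the per-channel medians and copy them into flagged slots
    filled[bad] = np.broadcast_to(med, filled.shape)[bad]
    return filled

values = np.array([[1.0, 10.0],
                   [2.0, 999.0],   # 999.0 is the flagged outlier
                   [3.0, 12.0]])
bad = np.array([[False, False],
                [False, True],
                [False, False]])

filled = fill_outliers_with_median(values, bad)
print(filled)
```

The filled array would be used only as input to the CURUC calculation, not written to the FCDR itself. A channel in which every value is flagged comes out as NaN and still needs separate handling.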

gerritholl commented 5 years ago

Moreover, when I process only a small segment, more lines get rejected than otherwise, because of the way the outlier rejection algorithm works. This is due to the MEDMAD filter in typhon: when some of the inputs are nearly constant, a small genuine variation is rejected as an outlier.
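
The effect can be reproduced with a simplified median/MAD filter. This is a stand-in written for illustration, not the typhon implementation; the function name `mad_outliers`, the cutoff value and the synthetic data are all assumptions.

```python
import numpy as np

def mad_outliers(x, cutoff=10.0):
    """Flag values further than `cutoff` MADs from the median.

    Simplified stand-in for a median/MAD filter (hypothetical helper).
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))   # median absolute deviation
    # If more than half of the values are identical, mad is 0 and
    # every value that differs from the median is flagged.
    return np.abs(x - med) > cutoff * mad

# Short, nearly constant segment with a small genuine variation
segment = np.array([100.0] * 8 + [100.1, 99.9])

# The same values embedded in a longer series that varies more
longer = np.concatenate([segment, 100.0 + 0.1 * np.sin(np.arange(90))])

print(mad_outliers(segment).sum(), "flagged in short segment")
print(mad_outliers(longer).sum(), "flagged in longer series")
```

In the short segment the MAD collapses to zero, so the two values that differ by only 0.1 are flagged, whereas the same values pass once the series contains enough variation for the MAD to be non-zero.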

gerritholl commented 5 years ago

Closed by https://github.com/atmtools/typhon/pull/254

FIDUCEO / FCDR_HIRS

Too many invalid lines for curuc screening #322