ORAC-CC / orac

Optimal Retrieval of Aerosol and Cloud
GNU General Public License v3.0
28 stars 19 forks source link

Use absolute values, rather than log, when setting X in the driver file #83

Closed adamcpovey closed 3 months ago

adamcpovey commented 1 year ago

In the text LUTs, the optical depth axis is logarithmically spaced and the values written in the tables are the log of the actual value (e.g. the first COT value is -20). As such, all references to optical depth in the state vector must also be in log space (e.g. the prior for aerosol is -1.5). The effective radius is logarithmically spaced for aerosol tables but linearly spaced for cloud. This is confusing.

The NCDF LUTs specify if axes are logarithmically or linearly spaced as an attribute. This commit adds a flag to the LUT Grid structure indicating if an axis is logarithmic (which is detected by negative values in the text tables). The set values are then log-ed as required.

I have a few questions for our active developers:

andyprata commented 1 year ago
  1. On the general point about using linear values (that are log10'd under the hood) - I think this is a nice intuitive change. In my code, I have comments reminding me that if I want to set apriori tau to 60, I need to use 1.77815 (10^1.77815 = 59.9998). This change will make reading and interpreting input values more intuitive. Obviously this could turn into a bit of a 'gotcha' if people miss this change and don't modify their input files appropriately.

  2. Seems OK to me to be consistent and make the same change for Sx. Side note: For cloud, we have Ctrl%Sx(IRe)=1e8 and Ctrl%Sx(ITau)=1e8 as the default setting in read_driver. Does that mean we are saying apriori uncertainty in COT is 10^1e8? Should these very large numbers be reviewed? Will they cause issues with matrix inversions?

  3. Yes I think it makes sense to put limit checking (and the log10 conversions) in set_limits.

garethethomas commented 1 year ago

Certainly for a new user, using the new LUTs, it makes sense to have all inputs in linear space. Applying the change to Sx is also sensible for consistency, but not for clarity due to the non-linearity. Not only will changing the a priori also change the actual value of Sx, but it is not obvious what a "linear value of Sx" when the retrieval is in log space actually means. For example, if we say we have an apriori of x=12.0 with a linear uncertainty (sigma) of 0.25, we might naively say: IDL> x = 12.0 IDL> lx = alog10(x) IDL> print, x, lx 12.0000 1.07918 IDL> sx = 0.25^2 & lsx = alog10(sx) IDL> print, sx, lsx 0.0625000 -1.20412 This clearly doesn't make any sense at all (the actual uncertainty ranges are completely different). I'd be interested in seeing how you'd suggest defining Sx in linear space before giving a thumbs up. Having the prior values specified in linear space, while the Sx values are specified in log10-space is even more confusing than having everything specified in log10-space, so it's a bit of a problem. As for the location of the changes, I think get_spixel would be fine for the conversion calculations, with set_limits just doing what its name suggests, especially if putting the conversion in set_limits would require a more major restructuring of the code.

adamcpovey commented 1 year ago

@garethethomas The question about how to pass the uncertainty is why I didn't commit anything. The maths of OE mean they errors have to be Gaussian. Hence, the options are:

  1. Use whatever value is in Ctrl%Sx regardless of underlying context. We all appear to agree that this is misleading.
  2. Use Ctrl%Sx for linear uncertainties and add Ctrl%Sx_log for logarithmic uncertainties. This makes the code easier to read but still requires the user to know the structure of the LUT.
  3. For log axes, put the value in Ctrl%Sx through $\sigma_{\log} = \frac{\sigma_x}{x \ln 10}$ to calculate the equivalent log error.

3 isn't perfect, because linear error bars aren't even in log space but they're fairly close for the values we're talking about. For example,

import math
x, sx = 12.0, 0.25
y = math.log10(x)
sy = sx / x / math.log(10.)
print(y, sy)
#>>> 1.07918 0.00904780
print(10.**(y+sy) - x, x - 10.**(y-sy))
#>>> 0.252622 0.247413

@andyprata Sx = 1e8 is our way of saying "unconstrained", with an equal prior for any values between the limits permitted by the retrieval. This isn't rigorous OE (there are other methods for bounded but unconstrained minimisation) but it works. I don't believe it's a problem for matrix conditioning because the inversion of Sx is fairly simple.

both I'll move things into get_spixel and commit that later. It's not hard. Just makes a lot of noise.

garethethomas commented 1 year ago

Ok, I think method 3 will work fine for Ctrl%Sx. I guess we should also put some thought into other methods of defining the a priori (and/or first guess), even though they aren't generally used at the moment - I guess most options (like AUX or whatever) would have to expect log values directly. Actually, this raises a point that I've missed about the new LUTs up till now. With the old LUTs there was a ORAC_Class_config.sad file, which allowed you to set a per-phase (or -class) a priori values/uncertainties for effective radius and optical depth, using the SAD method for defining the X0, XB etc. This was the standard method for aerosol retrievals - how is this done with the new LUTs? (Of course, the scaling method 3 would also probably be fine for dealing with Sx from such data as well)

adamcpovey commented 1 year ago

I have never encountered a ORAC_Class_config.sad file. Setting a priori values for aerosol types is what all the crap at the bottom of https://github.com/ORAC-CC/orac/blob/master/tools/pyorac/definitions.py#L658-L669 is for.

adamcpovey commented 5 months ago

@garethethomas @andyprata How's that?

There is slightly more urgency to this PR as I just noticed that, since Don changed which axes are log/linear in the netCDF tables, we'll be outputting the wrong values for those tables.

adamcpovey commented 4 months ago

I'm now reasonable certain that this PR doesn't break the old tables. Next time I shall check that it fixes the new ones...

adamcpovey commented 4 months ago

Plots comparing cloud retrieval for (left) current repository code using a netCDF LUT, (centre) this PR using a netCDF LUT, (right) current repository code using a text LUT for (rows) eight regression test granules [four day, four night, two Terra, two Aqua, two Sentinel-3 A then B]. Uses ann_phase to manually post-process; white is clear sky or did-not-attempt. PR83_cth

PR83_cot

adamcpovey commented 4 months ago

PR83_cer

This PR doesn't change the output for netCDF or text, but doesn't do much for the differences between them. Apologies for the worst colourbar - I swear everything else looked even more rubbish.