gher-uliege / DIVAnd.jl

DIVAnd performs an n-dimensional variational analysis of arbitrarily located observations
GNU General Public License v2.0
70 stars 11 forks source link

"gradient" effect on DIVAnd maps[🐞] #129

Open qcazenave opened 1 year ago

qcazenave commented 1 year ago

Describe the bug

image image image image

diva3d call `@time info = diva3d( (lonr,latr,depthr,TS), (obslon,obslat,obsdepth,obstime), obsval, len, epsilon2, # epsilon2 = 1 (profile) or 10 (timeseries) outputfile[s], varname, bathname = bathfile, mask = mask, background = bkgd, #seasonal profile

plotres = plotres,

    transform = Anam.loglin(maxval),
    ncvarattrib = ncvarattrib,
    ncglobalattrib = metadata[s],
    surfextend = true,
    coeff_derivative2 = param_d2,                #param_2d = [0,0,0]
    memtofit = 50

);`

jmbeckers commented 1 year ago

I do not think it is a bug. Both situations arise in regions where you basically extrapolate and where a gradient in the nearest data can lead the extrapolation into unrealistic values.

In theory the error field should allow to identify such regions, in particular the "exact" error field generally is higher in such regions than the approximations used in practice (but is expensive).

Did you try the logit transformation instead of the loglin ? The latter might be responsible to allowing larger values than expected.

Just a stupid question: the values you use in DIVAnd for 2 µmole is 2 or 2E-6 ? (that could play a role when data are transformed)

Related note or question to developers: in loglin is it implicitly assumed that the data have values around 1 so that we have anomalies around zero? Maybe we need to add a scaling parameter too ?

qcazenave commented 1 year ago

Hi @jmbeckers , I agree, it is not really a bug but I was not sure about the label to use... So, I haven't tried the logit transformation and the value I use for 2 umole is 2.

qcazenave commented 1 year ago

Update : logit instead of loglin = no change at all. Use of logit : transform = Anam.logit(min=minval,max=maxval)

jmbeckers commented 1 year ago

that is strange. What are the values of minval and maxval ?

qcazenave commented 1 year ago

minval = 0 maxval = 800 for DIN and 400 for silicate (in umole/l)

jmbeckers commented 1 year ago

Isn't that very large ? What are the obs values in the deep region ?

qcazenave commented 1 year ago

yes, it is very large, I believe such high values can only be found in coastal areas, near river mouths. In the deep region, obs values remain below 100 umol/l

qcazenave commented 1 year ago

Is it too large ? Should I keep the boundaries to the values found in the deep region ?

ctroupin commented 1 year ago

I would say that ideally one should not have to restrict the values of the input data, except if they are obviously wrong, which doesn't seem to be the case here.

What can be done is to use the residuals (diff. between observation and analysis at the location of the obs.) to discard some of the data points when the residual values is too large.

qcazenave commented 1 year ago

Yes, OK, but I don't understand how this can help with the effects presented above ?

ctroupin commented 1 year ago

Maybe by removing such data points you avoid the large gradients mentioned by @jmbeckers, so the extrapolation doesn't give so dramatic results.

jmbeckers commented 1 year ago

I would say that ideally one should not have to restrict the values of the input data, except if they are obviously wrong, which doesn't seem to be the case here.

What can be done is to use the residuals (diff. between observation and analysis at the location of the obs.) to discard some of the data points when the residual values is too large.

I did not mean to "cut" the input values. The logit transformation constrains the analysis to remain between two bounds. So if you know that there are never values larger than 100, logit with an upper value of 100 will make the analysis always fall below 100 (and also will probably reduce some gradients and hence avoid some larger extrapolations).

qcazenave commented 1 year ago

Thank you for your answer @jmbeckers. Indead, it was not clear to me but I understand now.

qcazenave commented 1 year ago

Hi, I would like to get back to the logit transformation constraint on the values with the "min" and "max" parameters : for the loglin transformation, there is also a "max" parameter. I assumed it referred to the maximum value that would be authorized in the retrived field but the result of the analysis does not confirm my assumption. So my question is : what does this "max" parameter in the loglin transformation refer to ?

jmbeckers commented 1 year ago

Loglin:

Provide the following transform log(x + epsilon) (for x < t) and its inverse. Beyond the threshold t (x ≥ t), the function is extended linearly in a continuous way.

So for loglin beyond the max value a linear transformation is used. (and not a cutting/clipping, for that Logit is needed).

qcazenave commented 1 year ago

OK, my mistake, I forgot to re-check that. Thank you for your answer.

qcazenave commented 1 year ago

With logit instead of loglin, I tried on chlorophyll-a data, using min=0mg/m2 and max=5mg/m3 and get the following error : ''' ERROR: LoadError: DomainError with -2.0224948157370686: log will only return a complex result if called with a complex argument. Try log(Complex(x)). ''' On the same data, with loglin (using 5mg/m3 as threshold for the linear extension), no error.

jmbeckers commented 1 year ago

It probably means you have data outside of that range (there is no sanity check in the function before doing the anamorphis). Can you check on the input data that they are in the range ?

qcazenave commented 1 year ago

Yes, I most likely have data outside of that range. I thought it would work nevertheless taking into account the data according to the range. I thought the logit transformation would only work on the analysis

jmbeckers commented 1 year ago

I think it is more sound you decide yourself which data you retain or what to do with the data out of your desired range (you can decide to discard the data or clip).

Logit works on the analysis, but to do so, all data are transformed into the new domain, so if data fall outside, the transformation is not valid.