gher-uliege / DIVAnd.jl

DIVAnd performs an n-dimensional variational analysis of arbitrarily located observations
GNU General Public License v2.0

How to leverage other data while gridding? (Enhancement) #96

Closed. Leon6j closed this issue 2 years ago.

Leon6j commented 2 years ago

As another thread (#90) points out, one of the major weaknesses of DIVA gridding is its poor handling of regions with no data: bullseyes are often created there.

I wonder if you could consider enhancing DIVAnd.jl so that it can automatically leverage values from another grid where there are no data and the associated gridding errors are too large?

Here is an example use case:

Many thanks for your consideration!

jmbeckers commented 2 years ago

So I assume your in situ data are near surface values also.

Technically you can use satellite data as if they were in situ data by adding them into the observational array using (lon, lat, val) of each pixel.

This will, however, give too much importance to the satellite data and will make cross-validation approaches more difficult (because of error correlations within the satellite data). To alleviate this and also address the relative errors between the two data types, you can a) subsample the satellite data to get a coverage that is more comparable to the in situ coverage, and b) use a different epsilon2 for each data set. To do this you need to create an array of epsilon2 in which each element is the epsilon2 value of the corresponding data point.
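As a minimal sketch of points a) and b), the snippet below builds the combined observation vectors and a per-observation epsilon2 vector in plain Julia. All coordinates, values, the subsampling stride and the two epsilon2 values are made up for illustration; only the idea of passing a vector epsilon2 to DIVAndrun comes from the discussion above.

```julia
# In situ observations (synthetic values, for illustration only)
lon_insitu = [2.0, 2.5, 3.1]
lat_insitu = [41.0, 41.2, 40.8]
val_insitu = [18.2, 18.4, 18.1]

# Satellite field on a regular 0.1-degree grid (synthetic stand-in)
lons = 0.0:0.1:5.0
lats = 39.0:0.1:43.0
sst_sat = [18.0 + 0.1 * (lat - 41.0) for lon in lons, lat in lats]

# a) Subsample: keep every 10th pixel in each direction (hypothetical stride)
stride = 10
isub = 1:stride:length(lons)
jsub = 1:stride:length(lats)
lon_sat = vec([lons[i] for i in isub, j in jsub])
lat_sat = vec([lats[j] for i in isub, j in jsub])
val_sat = vec(sst_sat[isub, jsub])

# b) One epsilon2 per observation: satellite pixels are trusted less here
eps2_insitu = 0.1  # hypothetical value
eps2_sat = 1.0     # hypothetical value

x = vcat(lon_insitu, lon_sat)
y = vcat(lat_insitu, lat_sat)
f = vcat(val_insitu, val_sat)
epsilon2 = vcat(fill(eps2_insitu, length(val_insitu)),
                fill(eps2_sat, length(val_sat)))

# (x, y), f and epsilon2 would then be passed to DIVAndrun in place of
# the in situ-only arrays and the scalar epsilon2.
```

The order of the entries in epsilon2 has to match the order of the observations in x, y and f, which is why all four are concatenated in the same way.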

Leon6j commented 2 years ago

Many thanks for the reply! Yes, my observational data is also surface data.

ctroupin commented 2 years ago

I guess you can also use the satellite observations to create a background field: as you know, DIVA analyses are performed on anomalies with respect to a background or reference field, which is, in simple cases, a uniform field with a value equal to the average of all the observations. Here you can do this:

  1. Create a background field using the satellite data, with a long correlation length and a large noise-to-signal ratio.
  2. Extract the values of this background field at the locations of the in situ observations and subtract them from the observations to obtain anomalies.
  3. Perform the interpolation on the newly computed anomalies, with smaller values of L and epsilon2, then add the background field back to the result.

Doing so, you ensure that the solution, in regions where no in situ observations are available, takes the value of the background field. And you don't need to subsample the satellite data.
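Why the solution falls back to the background can be sketched in the optimal-interpolation form that is commonly used as an equivalent of this kind of variational analysis. The notation below is standard OI notation assumed here for illustration, not something taken from this thread:

$$
\varphi(\mathbf{x}) = \varphi_b(\mathbf{x}) + \mathbf{c}(\mathbf{x})^{T} \left( \mathbf{B} + \mathbf{R} \right)^{-1} \left( \mathbf{d} - \mathbf{H}\varphi_b \right)
$$

Here $\varphi_b$ is the background field (step 1), $\mathbf{d}$ the in situ observations, $\mathbf{H}\varphi_b$ the background extracted at the observation locations (step 2), $\mathbf{B}$ the background error covariance between observation locations, $\mathbf{R}$ the observation error covariance (controlled by epsilon2), and $\mathbf{c}(\mathbf{x})$ the covariance between grid point $\mathbf{x}$ and the observation locations. Since $\mathbf{c}(\mathbf{x})$ decays to zero once $\mathbf{x}$ is several correlation lengths L away from every observation, the correction term vanishes there and $\varphi(\mathbf{x}) \to \varphi_b(\mathbf{x})$.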

Leon6j commented 2 years ago

Many thanks for chiming in!

Not sure I fully understood steps 2 and 3. So you assume that when the anomalies are gridded, there won't be bullseyes in regions where observational data are not available? I doubt that is the case.

I'm thinking maybe I should do this instead: Is there a way I can find the index of the grid points where the gridding errors are too high due to a lack of observational data? May I do so reliably based on the CPME estimates? Once that index is figured out, I can use that info to enhance my gridded results, i.e., replacing the gridded values at those grid points with satellite based values.
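A minimal sketch of that replacement idea in plain Julia follows. The three small arrays are made-up stand-ins for the gridded analysis, an error field on the same grid (for instance one returned by DIVAnd_cpme, assuming it is a relative error between 0 and 1), and the satellite field on that grid; the threshold of 0.5 is an arbitrary assumption.

```julia
# Stand-ins for real fields on the same 2x2 grid (hypothetical values)
fi    = [18.0 18.5; 25.0 18.2]   # in situ analysis, bullseye at [2, 1]
cpme  = [0.10 0.20; 0.90 0.15]   # relative error field
fisat = [18.1 18.4; 18.3 18.2]   # satellite values

threshold = 0.5  # assumption: above this, the analysis is not trusted

# Boolean mask and explicit indices of the poorly constrained grid points
poorly_constrained = cpme .> threshold
idx = findall(poorly_constrained)

# Keep the analysis where the error is low, use satellite values elsewhere
fmerged = ifelse.(poorly_constrained, fisat, fi)
```

A hard switch like this can leave a visible step along the boundary of the replaced region, which is one reason the background-field approach discussed in this thread may be preferable.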

ctroupin commented 2 years ago

Yes, I think that using a background field obtained by gridding the satellite observations, then performing the analysis of the in situ observations with respect to that background field, can help avoid the bullseyes.

I'm now checking the doc and the examples to see if there is an example of how to do it.

Concerning the use of CPME, that can be a possibility, though I'm not sure what is the best way to extract the error fields at the locations of the observations.

jmbeckers commented 2 years ago

If your satellite data are already on the same grid as the analysis you want to do, just work with anomalies with respect to your satellite data. That way you automatically will have satellite data in regions where you do not have observations.

Technically, to calculate the anomalies easily, you can first do a dummy analysis with your in situ data and recover the structure s from it,

fi, s = DIVAndrun(...)

and then use

newobs = DIVAnd_residualobs(s, fisat);

where fisat is the gridded satellite data (it needs to be exactly on the same grid as your analysis).

Then you do an analysis of the anomalies newobs and, at the end, add your fisat to that analysis.
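Put together, the workflow could look roughly like the sketch below. The grid, the synthetic satellite field, the observation positions and the values of len and epsilon2 are all invented for illustration; the DIVAndrun argument order follows the usual (mask, metric, grid, observation coordinates, values, len, epsilon2) convention and should be checked against the installed version of DIVAnd.

```julia
using DIVAnd

# Analysis grid: 31x31 points on the unit square (synthetic example)
mask, (pm, pn), (xi, yi) = DIVAnd_rectdom(
    range(0, stop = 1, length = 31),
    range(0, stop = 1, length = 31),
)

# Stand-in for satellite data already gridded on (xi, yi)
fisat = sin.(2π .* xi) .* cos.(2π .* yi)

# Synthetic in situ observations, confined to one corner of the domain
# and offset by +0.3 with respect to the satellite field
x = [0.05, 0.10, 0.20, 0.25, 0.30, 0.15]
y = [0.10, 0.30, 0.05, 0.25, 0.15, 0.20]
f = sin.(2π .* x) .* cos.(2π .* y) .+ 0.3

len = 0.1       # correlation length (hypothetical value)
epsilon2 = 0.1  # noise-to-signal ratio (hypothetical value)

# Dummy analysis, needed only to obtain the structure s
_, s = DIVAndrun(mask, (pm, pn), (xi, yi), (x, y), f, len, epsilon2)

# Anomalies: in situ values minus the satellite field at the same locations
newobs = DIVAnd_residualobs(s, fisat)

# Analysis of the anomalies
fanom, _ = DIVAndrun(mask, (pm, pn), (xi, yi), (x, y), newobs, len, epsilon2)

# Final field: anomaly analysis plus the satellite field
fi = fanom .+ fisat
```

Far from the in situ observations fanom should be close to zero, so fi stays close to fisat there instead of drifting toward an unconstrained value.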

Hope it makes sense