How to leverage other data while gridding? (Enhancement)

Leon6j commented 2 years ago

As another thread (#90) points out, one of the major weaknesses of the DIVA gridding is the poor handling of regions with no data. Bullseyes are often created.

I wonder if you could consider enhancing DIVAnd.jl so that it can automatically use values from another grid where there are no data and the associated gridding errors are too large?

Here is an example use case:

I have a 3-column observational dataset for Variable 1 (Longitude, Latitude, Variable 1). It does not have good coverage everywhere in the global ocean, and there will certainly be some data-sparse regions. My goal is to grid the data onto a global grid.
I have a good satellite-based algorithm that allows me to calculate a Variable 1 value anywhere on the same global grid. It has larger uncertainty than a real measurement, but it is far better than the values DIVAnd.jl extrapolates in data-sparse regions.
Is it possible to have DIVAnd.jl grid Variable 1 from the 3-column observational data, but fill in satellite-algorithm-derived values at grid points where there are no real data and the errors are therefore known to be large?

Many thanks for your consideration!

jmbeckers commented 2 years ago

So I assume your in situ data are near surface values also.

Technically you can use satellite data as if they were in situ data by adding them into the observational array using (lon, lat, val) of each pixel.

This will however give too much importance to the satellite data and will make cross-validation approaches more difficult (error correlations within the satellite data). To alleviate this and also address the relative errors between the two data types you can a) subsample the satellite data so that their coverage is more comparable to the in situ coverage and b) use a different epsilon2 for each data set. To do this you need to create an array of epsilon2 in which each element holds the epsilon2 value of one data point.
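As a rough illustration of points a) and b), here is a minimal sketch in plain Julia. All coordinates, values, the subsampling step and the two epsilon2 values are made-up assumptions, chosen only to show the shape of the arrays.

```julia
# Hypothetical in situ observations (lon, lat, value); values are made up.
lon_obs = [10.0, 12.5, 15.0]
lat_obs = [40.0, 41.0, 42.5]
val_obs = [1.2, 1.4, 1.1]

# Hypothetical satellite pixels, flattened to vectors (one entry per pixel).
lon_sat = collect(0.0:1.0:19.0)
lat_sat = fill(45.0, 20)
val_sat = fill(1.3, 20)

# a) Subsample: keep every 5th satellite pixel (the step is an assumption).
subsample_step = 5
keep = 1:subsample_step:length(val_sat)

# Merge both data sets into single observation vectors.
lon = vcat(lon_obs, lon_sat[keep])
lat = vcat(lat_obs, lat_sat[keep])
val = vcat(val_obs, val_sat[keep])

# b) One epsilon2 per data point: smaller for in situ data, larger for
#    satellite-derived values (both numbers are assumptions).
eps2_obs = 0.1
eps2_sat = 1.0
epsilon2 = vcat(fill(eps2_obs, length(val_obs)),
                fill(eps2_sat, length(keep)))
```

The vector `epsilon2` has one entry per element of `val` and would be passed to the analysis in place of a scalar epsilon2.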

Leon6j commented 2 years ago

Many thanks for the reply! Yes, my observational data is also surface data.

First of all, the satellite algorithm based data are already on the global grid, so there is no need to worry about gridding the satellite data.
Secondly, let's not worry about whether the satellite data are good enough for my research purpose. My satellite-based values are actually really good. I just want to use observational data to enhance them where I can.
My question basically comes down to this: how do I subsample the satellite data and fill in the areas where DIVAnd does not have enough observational data and would create bullseyes during the gridding?

ctroupin commented 2 years ago

I guess you can also use the satellite observations to create a background field: as you know, DIVA analyses are performed on anomalies with respect to a background or reference field, which is, in simple cases, a uniform field with a value equal to the average of all the observations. Here you can do this:

1. Create a background field using the satellite data, with a long correlation length and a large noise-to-signal ratio.
2. Extract the values of this background field at the locations of the in situ observations and subtract them from the observations to obtain anomalies.
3. Perform the interpolation on the newly computed anomalies, with a smaller value of L and epsilon2.

Doing so, you ensure that the solution, in regions where no in situ obs. are available, takes the value of the background field. And you don't need to sub-sample the satellite data.

Leon6j commented 2 years ago

Many thanks for chiming in!

Not sure I fully understood steps 2 and 3. So you assume that when the anomalies are gridded, there won't be bullseyes in regions where obs. data are not available? I doubt that is the case.

I'm thinking maybe I should do this instead: Is there a way I can find the index of the grid points where the gridding errors are too high due to a lack of observational data? May I do so reliably based on the CPME estimates? Once that index is figured out, I can use that info to enhance my gridded results, i.e., replacing the gridded values at those grid points with satellite based values.
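A minimal sketch of that idea in plain Julia, assuming the analysis `fi`, the satellite field `fisat` and a relative error field `cpme` are already available as arrays on the same grid. The numbers and the 0.5 cutoff are made up for illustration.

```julia
# Hypothetical fields on the same 2x3 grid; all values are made up.
fi    = [1.0 1.1 5.0; 0.9 1.2 4.0]   # gridded in situ analysis
fisat = [1.0 1.0 1.3; 1.0 1.1 1.2]   # satellite-derived values
cpme  = [0.1 0.2 0.9; 0.1 0.3 0.8]   # relative error field (0 to 1)

# Assumed cutoff above which the analysis is considered unreliable.
threshold = 0.5

# Boolean mask and indices of the poorly constrained grid points.
poorly_constrained = cpme .> threshold
idx = findall(poorly_constrained)

# Use the satellite value where the error is too large, the analysis elsewhere.
merged = ifelse.(poorly_constrained, fisat, fi)
```

Note that a hard replacement like this can leave a visible jump along the boundary of the mask, since nothing forces the two fields to agree there.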

ctroupin commented 2 years ago

Yes, I think that using a background field obtained by gridding the satellite observations, then performing the analysis of the in situ observations with that background field can help avoid the bulls eyes.

I'm now checking the doc and the examples to see if there is an example of how to do it.

Concerning the use of CPME, that is a possibility, though I'm not sure what the best way is to extract the error fields at the locations of the observations.

jmbeckers commented 2 years ago

If your satellite data are already on the same grid as the analysis you want to do, just work with anomalies with respect to your satellite data. That way you automatically will have satellite data in regions where you do not have observations.

Technically, to easily calculate anomalies, you can first do a dummy analysis with your in situ data and recover the structure s from it: fi, s = DIVAndrun

and then use

newobs=DIVAnd_residualobs(s, fisat);

where fisat is the gridded satellite data (it needs to be exactly on the same grid as your analysis).

Then you do an analysis with the anomalies newobs and, at the end, add that analysis to your fisat.
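To show why this recovers the satellite values away from the observations, here is a toy 1-D sketch in plain Julia. It does not call DIVAnd: `toy_analysis` is a hypothetical stand-in for the anomaly analysis, the subtraction on grid nodes is a stand-in for DIVAnd_residualobs, and all numbers are made up.

```julia
# 1-D grid and a hypothetical satellite-derived field on it (made-up values).
xi = collect(0.0:1.0:10.0)
fisat = 1.0 .+ 0.1 .* xi

# Hypothetical in situ observations, clustered at the left of the domain.
x_obs = [1.0, 2.0, 3.0]
val_obs = [1.6, 1.7, 1.8]

# Stand-in for DIVAnd_residualobs: observation minus the satellite field at
# the observation location (here the observations sit exactly on grid nodes).
node(x) = findfirst(==(x), xi)
newobs = [val_obs[j] - fisat[node(x_obs[j])] for j in eachindex(x_obs)]

# Stand-in for the anomaly analysis (NOT DIVAnd): a Gaussian-weighted average
# damped by epsilon2, so the result decays to zero far from the data.
function toy_analysis(xi, x_obs, a, len, epsilon2)
    out = zeros(length(xi))
    for i in eachindex(xi)
        w = exp.(-((xi[i] .- x_obs) ./ len) .^ 2)
        out[i] = sum(w .* a) / (sum(w) + epsilon2)
    end
    return out
end

len = 1.5        # assumed correlation length
epsilon2 = 0.1   # assumed noise-to-signal ratio
fianom = toy_analysis(xi, x_obs, newobs, len, epsilon2)

# Final field: anomaly analysis plus the satellite background.
fi = fianom .+ fisat
```

Near the observations `fi` is pulled towards the in situ values, while at the far end of the grid the anomaly is essentially zero and `fi` falls back to `fisat`.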

Hope it makes sense

gher-uliege / DIVAnd.jl

How to leverage other data while gridding? (Enhancement) #96