JuliaEarth / GeoStats.jl

An extensible framework for geospatial data science and geostatistical modeling fully written in Julia
https://juliaearth.github.io/GeoStatsDocs/stable/
MIT License
506 stars 60 forks source link

An improved implementation of Nested Variograms #127

Closed exepulveda closed 3 years ago

exepulveda commented 3 years ago

I'm raising this issue for discussion on Nested Variogram.

The motivation is for having a better design of variograms that helps us to incorporate turning bands (co-) simulations and to implement a linear model of coregionalization (LMC)

Currently we have unary structures that have a property nugget, although nugget effect is a valid structure. We already have a PR for [Nugget effect] (https://github.com/JuliaEarth/Variography.jl/pull/31) which is the first step. Also there are two mechanisms to make nested structures:

g = SphericalVariogram() + ExponentialVariogram() which is a SumVariogram. This is OK and it can be composed as follow to include more than two structures: g = NuggetEffect() + SphericalVariogram() + ExponentialVariogram() which is a SumVariogram that contains NuggetEffect and a nested SumVariogram.

My proposal is to have something like: g = 0.2*NuggetEffect() + 0.8*SphericalVariogram(range=10.0) + 0.5*ExponentialVariogram(range=100.0)

The scale factors could be stored separately to facilitate internal calculations and be ready for a LMC, for example for two variables: lcm = [0.2 0.1; 0.1 0.2] *NuggetEffect() + [0.8 0.4; 0.4 0.8]*SphericalVariogram(range=10.0) + [0.5 0.3; 0.3 05]*ExponentialVariogram(range=100.0)

NestedVariogram would be just a Array of SimpleStructures.

For a smooth transition we can introduce SphericalStructure, ExponentialStructure, etc. and redefine SphericalVariogram (as an example) as follows: SphericalVariogram(nugget=n, sill=s, range=r) = NestedVariogram(scales=[n, (s-n)], structures=[NuggetEffect(), SphericalStructure(range=r)])

Internally NestedVariogram would look like:

struct NestedVariogram{T,D} <: Variogram{T,D}
  scales :: T[]
  structures :: VariogramStructures[]
end

I cannot wait for your comments :-)

juliohm commented 3 years ago

Thanks for raising this issue. I like the idea of providing better support for LMC. Currently, we can combine the different variograms with scalars and sums, but perhaps we need to introduce an algebraic structure with array coefficients, similar to what you suggested.

My proposal is to have something like: g = 0.2*NuggetEffect() + 0.8*SphericalVariogram(range=10.0) + 0.5*ExponentialVariogram(range=100.0)

We can currently write this code just fine, but I think you meant to say that it is hard to extract the coefficients with an API? Notice that 0.5*ExponentialVariogram() will return a ScaledVariogram which stores the coefficient c=0.5 as a scalar. Currently, we have no API to query this scalar, or more generally to decompose a compositive additive model into its constituent parts.

Perhaps we can address this issue in two steps. First, we need to figure out how we can generalize the scalar variogram models to LMC models. We need to make sure that we have a good design here to avoid over-complicating user code downstream. Second, we need to have methods that can extract the structures from an additive model with the corresponding coefficients. This should be easy after we have a good design for the first part.

struct NestedVariogram{T,D} <: Variogram{T,D} scales :: T[] structures :: VariogramStrucutres[] end

This seems like a good alternative approach to our current combination of SumVariogram and ScaledVariogram. Instead of modeling the specific sums and scalings, this type you wrote above would contain a list of the coefficients and a list of simple variogram models. I will digest the idea with more time. I cannot find a use case right now where it would be important to know that we have a SumVariogram or ScaledVariogram specifically. It seems that a simple NestedVariogram as you suggested would be fine. @rmcaixeta what do you think about this?

Excited to discuss this further :+1:

exepulveda commented 3 years ago

My point is that g = C_0*Nugget() + C_1*Structure_1(range=r_1) + ... + C_n*Structure_n(range=r_n) is very clear from user and conceptual perspectives. C_i can be a scalar for the univariate case, or a matrix for multivariate case. The internal implementation should take care of a good design and backward compatibility.

Indeed it is straightforward to translate from a combined SumVariogram or/and ScaledVariogram into a NestedVariogram.

The problem with what we have is that we can have this unusual variogram: 0.5*SphericalVariogram(nugget=0.2, sill=0.4, range=10.0) + 0.6*ExponentialVariogram(nugget=0.25, sill=0.35, range=100.0)

I think our API should prevent users to come with something like that :-)

The above example is equivalent to a NestedVariogram with:

scales = [ 0.5*0.2 + 0.6*0.25, 0.5*(0.4-0.2), 0.6*(0.35-0.25)]
structures = [Nugget(), SphericalStructure(range=10.0), ExponentialStructure(range=100.0)]
juliohm commented 3 years ago

I like the design pretty much. Let's just think about it more carefully before implementing the changes. Regarding the possibility of specifying the sill inside each structure, I think we should keep it and assume that users will know what they are doing. Our code can also do the right thing always and internally convert a 0.5*ExponentialVariogram(sill=2) to a 1.0*ExponentialVariogram(). We could call this a canonical form, and have a function that does that for us internally:

canonicalform(::NestedVariogram) = # do the rescaling
rmcaixeta commented 3 years ago
struct NestedVariogram{T,D} <: Variogram{T,D}
 scales :: T[]
 structures :: VariogramStructures[]
end

I really like that structure. I'm drafting an extra solver that changes variograms locally and with that struct would be much more easy and intuitive to iterate over the structures

canonicalform(::NestedVariogram) = # do the rescaling

Nice. Considering all the kwargs, this would be a good solution to organize the final variogram. Do you think the nugget could be adjusted too? 0.8*SphericalVariogram(nugget=0.5,range=10.0) = 0.4*NuggetEffect() + 0.4*SphericalVariogram(range=10.0)

g = 0.2*NuggetEffect() + 0.8*SphericalVariogram(range=10.0) + 0.5*ExponentialVariogram(range=100.0)

Thinking as an user (and I might be somewhat biased with mining softwares), it'd be more clear to me if the kwargs are only range and distance. But as they are optional, should be ok to keep them all and deal with them internally. The good thing of this new structure is that we could avoid eventual confusions (I had in the beggining) of using nugget as keyword. Informing it as a separate structure would not raise questions if it is part or not of the structure sill

juliohm commented 3 years ago

I also thought about the sill and nugget options some time ago, and realized that most users (which do not have a geostatistics background) will pick a single structure and adjust the options with a fitting method. So it is convenient to just write GaussianVariogram(sill=2.5, range=1.6, nugget=0.1) as opposed to NuggetEffect(0.1) + 2.5*GaussianVariogram(range=1.6). Perhaps we change minds in the future, but given that we want to strike a user-friendly API and at the same time cover advanced use cases, I think keeping the two approaches is reasonable.

juliohm commented 3 years ago

Regarding the canonical form transformation, it should take the nugget into account as well:

Variogram(range=r, sill=s, nugget=n) ===> NuggetEffect(n) + s * Variogram(range=r)

We need to wait for this issue to be fixed in order to implement it with the Parameters.jl approach.

exepulveda commented 3 years ago

I will create a branch for this and we can sketch possible solutions. The idea is to define this new NestedVariogram and implement the transformation from the structs we have with the canonical form. Users are free to use both approaches. When we are happy with the design we can think in merging. What do you think?

juliohm commented 3 years ago

Thanks @exepulveda for taking the lead on this. Please consider drafting the changes in steps. I suggest to first prepare a PR for the NestedVariogram that replaces both SumVariogram and ScaledVariogram. This PR would also refactor the + and * to return the NestedVariogram instead of the former types.

It would be a good idea to use tuples for the fields of the NestedVariogram:

struct NestedVariogram{N,C,V} <: Variogram
  cs::NTuple{N,C}
  γs::NTuple{N,V}
end

but if the code becomes too ugly with tuples, we can use normal vectors. With tuples we would know at compile time how many structures are there in the nested model, and that may be useful for matrix optimizations.

The issue with the canonical form, we can postpone to a second PR. :+1:

exepulveda commented 3 years ago

I think it is a good strategy. Let me draft something. This time it could be good if @juliohm and @rmcaixeta can be involved in this PR and not just reviewers.

exepulveda commented 3 years ago

I just pushed to the branch "canonical_variograms". Please, have a look and any comment, improvement is welcome. Note this is a draft only :-)

juliohm commented 3 years ago

Thank you. Please create a PR so that we can review the code as we go.

On Tue, Oct 20, 2020, 21:12 exepulveda notifications@github.com wrote:

I just pushed to the branch "canonical_variograms". Please, have a look and any comment, improvement is welcome. Note this is a draft only :-)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaEarth/GeoStats.jl/issues/127#issuecomment-713191484, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3K6LL3EIKOGRDC3FATSLYKM5ANCNFSM4SRH3UNA .

exepulveda commented 3 years ago

My original idea is to first check the draft to see if we agree. Also, anyone can contribute to the changes. Let's do the PR when the code is ready.

juliohm commented 3 years ago

When you create the PR nothing changes about the branch. That is the idea. You can keep working on the branch and the PR will highlight the differences and will allow us to comment on specific parts. Makes sense?

On Tue, Oct 20, 2020, 21:52 exepulveda notifications@github.com wrote:

My original idea is to first check the draft to see if we agree. Also, anyone can contribute to the changes. Let's do the PR when the code is ready.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaEarth/GeoStats.jl/issues/127#issuecomment-713203753, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3LX6E4D6A3QWFAHPGTSLYPCVANCNFSM4SRH3UNA .

exepulveda commented 3 years ago

Let's do it.