SugiharaLab / rEDM

Applications of Empirical Dynamic Modeling from Time Series
Other
117 stars 43 forks source link

Inverted name for the output generated by block_lnlp function #60

Closed rafalopespx closed 2 years ago

rafalopespx commented 2 years ago

Hi there,

So I am using block_lnlp function to make some estimation on interaction strength coefficients, the thing is the output by the function gives the coefficients with inverted nomenclature for the interaction strengths names, I think. Like, if I put as the target column X and Y to predict X, the interaction strengths should be the Jacobian of \frac{\partialX}{\partialY}, but the output name the interaction strength as \frac{\partialY}{\partialX}, this is a problem only on the names not on the estimation, right?

Can anyone confirm this to me? If not I have to inverted the coefficient?

Thanks!

SoftwareLiteracy commented 2 years ago

The current code uses the SMap API for S-map. Which is what you are referring to?

Example:

> library(rEDM)
> df = circle # x = sin(t); y = cos(t)
> head(df)
  Time      x      y
1    1 0.0000 1.0000
2    2 0.0631 0.9980
3    3 0.1260 0.9920

> sm = SMap( dataFrame = df, lib = "1 100", pred = "101 190", columns = "x y", target = "x", embedded = TRUE, theta = 3 )

> apply( sm $ coefficients[,c("C0", "∂x/∂x", "∂y/∂x")], 2, mean, na.rm = TRUE )
        C0      ∂x/∂x      ∂y/∂x 
-9.110e-07  9.980e-01  6.311e-02 

Are not these results analytically correct? (Presuming the circle data have a 200 points (199 intervals) from 0 to 4π)

rafalopespx commented 2 years ago

Good to remember the circle as example, so here the target column is 'x', so we are trying to predict 'x' through a model that is a linear combinations of Jacobian elements determined at each point of time by the time series of 'x' and 'y', right?

Our model it is something like: x(t) = c0 + partial'x'/partial'x' + partial'x'/partial'y', because the Jacobian element it is the derivative of the predictee time series, 'x', with respect to the predicting time series, 'x' & 'y'. The thing is the the names (I hope) of the outputed coefficients they are inverted, the derivative should always be with respect to the predictee time series or the target time series, here 'x'.

I have the following paper to make the case, https://onlinelibrary.wiley.com/doi/full/10.1111/ele.13652, Nova et al make use of SMap function and from it they derive the coefficients from the SMap and at figure 5 they write the following, when applying SMap to predict cases:

image

https://onlinelibrary.wiley.com/cms/asset/1b52ad29-0211-4e69-b266-5e1effba89f6/ele13652-fig-0005-m.jpg

SoftwareLiteracy commented 2 years ago

Thanks for the example and feedback. Please note I am not being critical, but trying to understand the problem and solution ; )

Since the article is not open access, I cannot see the context of the figure. My first impression is that these discrete ratios all are represented by thick lines (red & blue) that amount to zero. I wonder if this is a clear example,

You mention: "the derivative should always be with respect to the predictee time series or the target time series" -- I believe this is what is demonstrated by the circle example.

Another perspective is that SMap coefficients are Jacobians, but not as I understand, local gradients with respect to a global independent basis. Rather with respect to a local projection onto the linearized trajectory of the two variables. A directional derivative. At least this is what I understand from the circle case as explored here: SmapCoeff-Short-2019-11-25.pdf

My interpretation of the analytical values of 1 and 0 is they represent constant curvature (∇uF = 1 ~ dx/dx) and orthogonality (∇uF = 0 ~ dy/dx) between x and y.

duriah commented 2 years ago

I have noticed this as well. As far as I can tell (and I'm fairly certain) the name of coefficients from the output of SMap are inverted but their values are correct.

This is why I am relatively sure of this: the interpretation of the SMap coefficients as interaction time series was introduced with this publication: https://royalsocietypublishing.org/doi/full/10.1098/rspb.2015.2258 ("Tracking and forecasting ecosystem interactions in real time" Deyle et al.) and in there they have the coefficients named correctly, i.e. e.g. dx/dx, dx/dy, dx/dz, etc. Thus, I used their code (provided in their supplementary info) to estimate the interactions of some data and compared it to the output of SMap() and indeed the output (i.e. the coefficients) were identical but SMap named them dx/dx, dy/dx, dz/dx, etc. while the code from Deyle et al. named them dx/dx, dx/dy, dx/dz.

Further confirmation came to me when I saw that Cenci et al. ("Regularized S-map for inference and forecasting with noisy ecological time series"), which extended this method, used the same nomenclature as Deyle et al. did.

Summarizing, I agree that the names in the output from Smap are inverted, but the coefficient are correct.

rafalopespx commented 2 years ago

Thanks for this confirmation @duriah, I was asking myself for a long time if the coefficients itself were inverted. As far as I tested, and this is why I came up with this issue, it is only a matter of wrong nomenclature output. This good!

I was reading by this days too the Deyle et al. paper and I had the same conclusion of you, basically SMap and block_lnlp functions invert the nomenclature for the derivative fraction, only. At the present version of the rEDM package, the matter it is only to correct the SMap function, due to block_lnlp function calls SMap internally.

@SoftwareLiteracy more on the issue, the time series generated by the SMap coefficients, or the whole series of locally weighted Jacobian elements, can be interpreted as a global estimation of the Jacobian coefficients all over the SSR.

Thanks for the reply again!

SoftwareLiteracy commented 2 years ago

Thanks for the feedback. Apologies for the delay...

It certainly is advisable to cite published results (+1). For a little more clarity, perhaps we can agree on a purely logical/analytical perspective.

To that end: Consider F(t,x,y) to predict target x, and target y using SMap on the 2-D dynamic state-space defined by F(t,x,y).

I think the nomenclature of the current code is:

x = C0 + ∂x/∂x x + ∂y/∂x y y = C0 + ∂x/∂y x + ∂y/∂y y

and what is suggested is it should be:

x = C0 + ∂x/∂x x + ∂x/∂y y y = C0 + ∂y/∂x x + ∂y/∂y y

Is that right?

Either way, the nomenclature is consfusing. ∂x/∂x makes little sense to me.

Perhaps:

x = C0 + ∂/∂x x + ∂/∂y y y = C0 + ∂/∂x x + ∂/∂y y

which seem degenerate, but, one can only predict one target (x or y) at a time so the coefficients are numerically distinct from different SMap invocations for x and y.

rafalopespx commented 2 years ago

Thanks for the prompt response and attention, is good to know that are people working hard to better the rEDM package.

I think I got something now, the nomenclature were in that way to emphasize that the right-hand side of the SMap equation has only variables that are being used to predict the left-hand side, okay on light of this is understandable the actual nomenclature.

But the problem seems that SMap it is a SSR that uses the target series on it, although the coefficients aren't derived from the target time series itself, the procedure is used to derive them and they are always interpreted as like, i.e. as the partial derivative of the target series with respect to the series which wants to estimate the interaction force on the target time series.

@SoftwareLiteracy I think this maybe a solution, although to me seems more naturally to write the coefficients as point Jacobian estimates for the interaction between the target time with respect to each component time series of the SSR inputed to the SMap

SoftwareLiteracy commented 2 years ago

Hmm... this is not entirely clear to me.

The "target" is used as the RHS of the linear system. The overdetermined columns of the linear system design matrix correspond to "columns" (either embedded to E if one column, or explicitly E columns if multiple columns and embedded = TRUE).

I interpreted this to mean the solution (the resolved coefficients) are fitting "columns" to "target", thus the target is the independent, reference (with respect to) variable. Maybe this is where the confusion arises?

The code that creates coefficient labels is:

            for ( auto colName : embedding.ColumnNames() ) {
                std::stringstream coefName;
                coefName << "∂" << colName << "/∂" << parameters.targetName;
                coefNames.push_back( coefName.str() );
            }

It seems useful (necessary) to have the column and target both represented in the coefficient label. But, as noted, ∂x/∂x seems notationally misleading, and, it seems perhaps the use of target is wrong?

Thanks for help guiding this to a code improvement!

duriah commented 2 years ago

∂x/∂x does make sense, but it is understandable that the notation does not make it easy because it is a simplified version. In fact, ∂x/∂x is actually ∂x(t + tau)/∂x(t), where t is the time and tau is the smallest time step available (i.e. depending on the time series data).

The forecasting is then:

image

This equation is taken from Cenci et al (eq. 11) but also appears with a different notation in Deyle et al (section 3) (see above). J is the interaction Jacobian containing the coffecients estimated with S-map. Also note that in the equation they put tau=1.

Note also that it has to be x = C0 + ∂x/∂x x + ∂x/∂y y (like in the equation) because otherwise the units do not match up.

I hope this makes it clear.

SoftwareLiteracy commented 2 years ago

Thanks. This is explicit and sensible.

One might quibble whether the estimate of ∂x/∂x is ∂x(t + Tp)/∂x(Δt), where Tp is the "time to prediction" interval, but the example is clear and the dimensional argument stellar.

The agreement is then: x = C0 + ∂x/∂x x + ∂x/∂y y; right?

In terms of the SMap columns and target parameters: ∂ target / ∂ column

Regarding ∂x/∂x, absent hand-waving and context it entails some head scratching at first sight. At least to my limited & pedantic mind. Suggestion for succinctly making it explicit? Perhaps x = C0 + ∂x(t)/∂x x + ∂x/∂y y; adding (t) in cases where column = target? Seems klunky and a source of confusion if not also done fo all terms, which then seems overly explicit.

Or leave it, adding a note in the docs?

duriah commented 2 years ago

Yes, correct.

I'd suggest to either adding a note in the docs or adding the time everywhere

rafalopespx commented 2 years ago

I go with @duriah.

An intemediary solution could be just to put the fraction at correct form, like a minor correction, this meets the example and the dimensional argument

Thanks again for the attention and patience!

SoftwareLiteracy commented 2 years ago

Thank you for the guidance and correction. The next release (1.12.2) will have the form: x = C0 + ∂x/∂x x + ∂x/∂y y