Robinlovelace / simodels

https://robinlovelace.github.io/simodels
GNU Affero General Public License v3.0

Explore compatibility and integration with spflow #15

Open Robinlovelace opened 2 years ago

Robinlovelace commented 2 years ago

Building on #14, how does this package link with the spflow package?

Heads-up @LukeCe: I'm thinking that models fitted with functions in your package could be an input into si_predict(). Does that sound reasonable? Any input is welcome, and it could go both ways: if any code or ideas in here could help your work, e.g. use of the od package that does the OD data processing, let me know.

LukeCe commented 2 years ago

Hi @Robinlovelace, I am very open to the idea of integrating {si}/{od} and {spflow}.

The main goal of {spflow} is to implement efficient estimators for spatial econometric interaction models. These make it possible to account for spatial autocorrelation in gravity models and should be computationally feasible even for large-sample applications. The geographic aspects are outside the scope of {spflow} and should be handled by other packages.

Since you raised the issue of modeling a situation where the origin and destination characteristics are distinct, I would like to point to an article I am working on with Christine Thomas: https://www.tse-fr.eu/sites/default/files/TSE/documents/doc/wp/2022/wp_tse_1312.pdf. In it, we develop the matrix-form estimation of a spatial econometric interaction model for the case where the set of origins can be distinct from the set of destinations, and also for the case where the OD matrix can be sparse. I have already implemented much of this work in the {spflow} package and plan to release an update in mid-May.

For in-sample predictions (fitted values), {spflow} already provides several methods, but for out-of-sample predictions, there are still some hurdles to overcome. In the out-of-sample case, we have to distinguish between "simple predictions" and extrapolations, i.e. predictions for flows that come from new origins or go to new destinations. For simple predictions, which are related to a change in the explanatory variables, the theory is clear and we should be able to implement them in the near future. Predictors that allow extrapolation to new sites are on our research agenda, but so far we do not have a clear methodology that could be implemented quickly.
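To make the three cases concrete, here is a minimal sketch in plain Python. It assumes an ordinary log-linear gravity model with made-up coefficients and toy data (the names `COEFS`, `predict_flow` and `predict_all` are hypothetical and not part of {spflow} or {si}). It deliberately ignores spatial autocorrelation, so it only illustrates which inputs change in each case, not how {spflow} computes predictions.

```python
import math

# Hypothetical coefficients of a log-linear gravity model, assumed already estimated.
COEFS = {"intercept": -2.0, "origin_pop": 0.8, "dest_pop": 0.7, "distance": -1.2}


def predict_flow(origin_pop, dest_pop, distance_km, coefs=COEFS):
    """Predicted flow for one OD pair: exp(a + b*log(O) + c*log(D) + d*log(dist))."""
    log_flow = (
        coefs["intercept"]
        + coefs["origin_pop"] * math.log(origin_pop)
        + coefs["dest_pop"] * math.log(dest_pop)
        + coefs["distance"] * math.log(distance_km)
    )
    return math.exp(log_flow)


def predict_all(population, distance_km, coefs=COEFS):
    """Predict flows for every ordered OD pair listed in distance_km."""
    return {
        (o, d): predict_flow(population[o], population[d], dist, coefs)
        for (o, d), dist in distance_km.items()
    }


# Toy data: three sites that are both origins and destinations.
population = {"A": 10_000, "B": 20_000, "C": 5_000}
distance_km = {
    ("A", "B"): 10.0, ("B", "A"): 10.0,
    ("A", "C"): 25.0, ("C", "A"): 25.0,
    ("B", "C"): 18.0, ("C", "B"): 18.0,
}

# 1. In-sample prediction (fitted values): same sites, same explanatory variables.
fitted = predict_all(population, distance_km)

# 2. "Simple" out-of-sample prediction: same sites, changed explanatory variable.
scenario_population = dict(population, C=10_000)  # population of C doubles
simple = predict_all(scenario_population, distance_km)

# 3. Extrapolation: a new site D that was not part of the estimation sample.
new_population = dict(population, D=8_000)
new_distance_km = dict(distance_km)
new_distance_km[("A", "D")] = 12.0
extrapolated = predict_all(new_population, new_distance_km)
```

In this non-spatial sketch, doubling the population of C only changes flows to and from C, and adding site D is trivial. In a spatial econometric interaction model a change at one site can also spill over to flows between neighbouring sites, and a new site changes the neighbourhood structure itself, which is why the extrapolation case is the harder one.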

An integration with si::si_predict() (I couldn't find od::od_predict()) might look like this

In order for {od}'s data structures to be directly usable by {spflow}, they would have to provide the following information:

I don't know if this is something you are considering.

Robinlovelace commented 2 years ago

Hi Lukas, quickfire follow-up: many thanks for your detailed and positive response. It sounds like {si} and {spflow} could work well together, and I look forward to trying to use models generated by your package as an argument in si_predict() or some variant of it. I think these packages could be mutually supportive, with {spflow} outstanding on modelling and {si} having the potential to support geographic data processing. On that note, I'm planning to show how the representation of OD datasets as geographic desire lines can support disaggregation and diversification of start and end locations using the 'jittering' approach outlined in this recently published paper and implemented in the Rust crate odjitter by Dustin Carlino, which has simple R bindings. I mention these additional links because you clearly have plenty of experience modelling OD data, and I'd be interested in your thoughts on disaggregation and other things building on these (hopefully eventually sturdy) foundations.
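As a rough illustration of the disaggregation + jittering idea, here is a minimal Python sketch. It is not the odjitter implementation: zones are simplified to rectangular bounding boxes, points are sampled uniformly, and the names `jitter_od_pair`, `random_point` and `threshold` are hypothetical choices made for this example.

```python
import math
import random


def random_point(bbox, rng):
    """Sample a uniform random point inside a bounding box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


def jitter_od_pair(count, origin_bbox, dest_bbox, threshold, rng):
    """Split one OD pair into several desire lines with jittered end points.

    The flow is split into the smallest number of lines such that no line
    carries more than `threshold` trips; each line gets an equal share.
    """
    n_lines = max(1, math.ceil(count / threshold))
    trips_per_line = count / n_lines
    return [
        {
            "start": random_point(origin_bbox, rng),
            "end": random_point(dest_bbox, rng),
            "trips": trips_per_line,
        }
        for _ in range(n_lines)
    ]


# Toy example: 120 trips between two square zones, at most 50 trips per line.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
lines = jitter_od_pair(
    count=120,
    origin_bbox=(0.0, 0.0, 1.0, 1.0),
    dest_bbox=(5.0, 5.0, 6.0, 6.0),
    threshold=50,
    rng=rng,
)
```

The total number of trips is preserved, but the single zone-to-zone line is replaced by several lines with different start and end points.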

LukeCe commented 2 years ago

Hi Robin, I also think {si} and {spflow} have great potential to complement each other.

The disaggregation + jittering approach presented in your article is a great solution to the problem of representing OD flows in road networks. Whether such disaggregation can increase the statistical accuracy of interaction models is a question that deserves further consideration. Since the tools we provide in {spflow} are clearly designed for high efficiency with large samples, they might help to find an answer.

If you want to test the package, you should know that the current version of {spflow} only allows modeling of "textbook data", where origins are equal to destinations and all potential flows are observed. I am working on an update that will remove these limitations and plan to make it available in May.