Robinlovelace / simodels

https://robinlovelace.github.io/simodels
GNU Affero General Public License v3.0
15 stars 4 forks source link

tidy framework and data.table operations #13

Closed rafapereirabr closed 2 years ago

rafapereirabr commented 2 years ago

Hi Robin. The package is shaping really nicely and it's aligned with an idea I've been thinking about related to spatial interaction models and accessibility (perhaps a topic for a chat in the future). I just wanted to ask whether package will be exclusively built on top of a tidy framework, or whether it could also use some data.table operations under the hood to be more efficient.

Robinlovelace commented 2 years ago

Good question. In my experience OD data processing and modelling, including SIMs, are seldom computationally intensive relative to other parts of transport modelling and analysis workflows including routing and visualisation, so no plans at present. The priority currently is to get it somewhere close to a minimum viable product for a user friendly SIM package, as per #5 but happy to revisit this at some point later down the line. Also, because data.table objects extend the data frame, the approach should be compatible. Just tested in the example below:

remotes::install_github("robinlovelace/si")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'si' from a github remote, the SHA1 (b1e2c250) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(si)
# ?si_predict
od = si_to_od(si_zones, si_zones, max_dist = 4000)
#> 1695 OD pairs remaining after removing those with a distance greater than 4000 meters:
#> 15% of all possible OD pairs
m = lm(od$origin_all ~ od$origin_bicycle)
od_updated = si_predict(od, m)
class(od_updated)
#> [1] "sf"         "data.frame"
od_dt = data.table::data.table(od)
class(od_dt)
#> [1] "data.table" "data.frame"
od_updated_dt = si_predict(od_dt, m)
class(od_updated_dt)
#> [1] "data.table" "data.frame"
identical(od_updated$interaction, od_updated_dt$interaction)
#> [1] TRUE

Created on 2022-04-25 by the reprex package (v2.0.1)

Do you have a use case, or ideally a reproducible example, where better support for data.table would be beneficial? PRs and other ideas welcome, especially at this nascent stage.

rafapereirabr commented 2 years ago

Thanks for the clarification, Robin. I believe data.tablewould not make much difference in si_predict() but it could possibly improve the efficiency of other functions such as si_to_od(), specially when dealing with large data sets. I totally understand the priority of getting to a minimum viable product first. Please let me know if you need a hand with this issue in the future and I'd be glad to help.

Robinlovelace commented 2 years ago

Agreed, low priority area. If it does come in useful would be more likely in the od package which is the driver of the data frame to sf conversion, which in turn uses sfheaders. Overall it's small beer in the grand scheme of things, good to think about and will happily revisit this if there are concrete use cases where it could be useful. Closing for now.