Robinlovelace / simodels

https://robinlovelace.github.io/simodels
GNU Affero General Public License v3.0
15 stars 4 forks source link

Tidyeval #10

Closed Robinlovelace closed 2 years ago

Nowosad commented 2 years ago

@Robinlovelace:

  1. Why the first plot in README is different in this version? Should not the result stay the same?
  2. I am not very familiar with dplyr-NSE, but your solution looks good (except the point above...)
Robinlovelace commented 2 years ago

@Robinlovelace:

1. Why the first plot in README is different in this version? Should not the result stay the same?

2. I am not very familiar with dplyr-NSE, but your solution looks good (except the point above...)
  1. It's fine because the plot is new, it does not replace the previous one, there are now 2 plots (maybe that's an issue).

We still have this plot:

plot(od_res["interaction"], logz = TRUE)

There is no logz in the new plot, maybe I should add it but key thing: results are the same (pretty sure).

Nowosad commented 2 years ago

@Robinlovelace I am taking about this difference:

image

Robinlovelace commented 2 years ago

@Robinlovelace I am taking about this difference:

You're right Jakub, just double checked. Looking into it...

image

Robinlovelace commented 2 years ago

Result on new branch:

# Aim: test difference between SE and NSE results in si
remotes::install_github("robinlovelace/si", ref = "tidyeval")
#> Skipping install of 'si' from a github remote, the SHA1 (6a9eac73) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

gravity_model = function(beta, d, m, n) {
  m * n * exp(-beta * d / 1000)
} 
od_res = si::si_to_od(si::si_centroids, destinations = si::si_centroids) %>% 
  si::si_calculate(fun = gravity_model, 
               m = origin_all,
               n = destination_all,
               d = distance_euclidean,
               beta = 0.3)

summary(od_res$interaction)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       2     491    1534    5340    4447 3207681

Created on 2022-04-22 by the reprex package (v2.0.1)

Result on old branch...

Robinlovelace commented 2 years ago

Which is very different to the NSE result:

# Aim: test difference between SE and NSE results in si
remotes::install_github("robinlovelace/si")
#> Skipping install of 'si' from a github remote, the SHA1 (d9ae80e6) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

gravity_model = function(od, beta) {
  od[["origin_all"]] * od[["destination_all"]] *
    exp(-beta * od[["distance_euclidean"]] / 1000)
}
od = si::si_to_od(si::si_centroids, destinations = si::si_centroids) 
od_res = si::si_calculate(od, fun = gravity_model, beta = 0.3, constraint_p = origin_all)
nrow(od_res)
#> [1] 11449
summary(od_res$interaction)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#>   0.0009   0.1928   0.5465   1.7676   1.4767 641.5744

Created on 2022-04-22 by the reprex package (v2.0.1)

Robinlovelace commented 2 years ago

Mystery solved: the original sensibly was production constrained. Removing that leads to the same result:

# Aim: test difference between SE and NSE results in si
remotes::install_github("robinlovelace/si")
#> Skipping install of 'si' from a github remote, the SHA1 (d9ae80e6) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

gravity_model = function(od, beta) {
  od[["origin_all"]] * od[["destination_all"]] *
    exp(-beta * od[["distance_euclidean"]] / 1000)
}
od = si::si_to_od(si::si_centroids, destinations = si::si_centroids) 
od_res = si::si_calculate(od, fun = gravity_model, beta = 0.3)
nrow(od_res)
#> [1] 11449
summary(od_res$interaction)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       2     491    1534    5340    4447 3207681

Created on 2022-04-22 by the reprex package (v2.0.1)

Robinlovelace commented 2 years ago

Merging now, thanks for the review @Nowosad and the general thumbs up on Twitter @JanCaha. Basically nobody has said it's a silly approach and the fact it's now a 'tidy' implementation of SIMs, enabling pipelines and autocompletion of variable names is enough of a review for now. Progress!

Robinlovelace commented 2 years ago

Many thanks @paleolimbot and @gadenbuie it seems my first attempt at NSE in a package function was not crazy. That means a lot, and it's great being about to autocomplete variable names when creating SIMs, something many {tidyverse} detractors don't appreciate. Any other comments welcome!