Robinlovelace / simodels

https://robinlovelace.github.io/simodels
GNU Affero General Public License v3.0

Performance enhancements #33

Closed: mem48 closed this 2 months ago

mem48 commented 8 months ago

@Robinlovelace some work in progress

The goal is faster performance on large datasets.

Changes are:

1) New function points_to_od_maxdist that uses nngeo to get the nearest neighbours rather than creating the full matrix. Also adds support for projected coordinates and the ability to look for the nearest X regardless of distance. This could be useful when mixing rural and urban areas: 5000 m is a long way to travel to a shop in an urban area but a short distance in a rural one. It should be much faster for very large numbers of origins and destinations, and less likely to run out of memory.

10,000 * 10,000 LSOAs with max_dist = 5000 took 29 seconds
35,672 * 35,672 LSOAs with max_dist = 5000 took 4.9 minutes
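
To make the memory argument concrete, here is a minimal base R sketch of the general idea: keep only the origin-destination pairs within a maximum distance instead of materialising the full matrix. It is not the points_to_od_maxdist implementation and does not use nngeo. The helper name pairs_within_maxdist and the toy data are hypothetical, and the coordinates are assumed to be projected, in metres.

```r
# Hypothetical helper, base R only: NOT the points_to_od_maxdist implementation.
# Assumes projected coordinates in metres, one row per point (columns x, y).
pairs_within_maxdist = function(origins, destinations, max_dist) {
  res = lapply(seq_len(nrow(origins)), function(i) {
    # Distances from origin i to every destination (a vector, never a matrix)
    dists = sqrt((destinations[, 1] - origins[i, 1])^2 +
                 (destinations[, 2] - origins[i, 2])^2)
    keep = which(dists <= max_dist)
    data.frame(O = rep(i, length(keep)), D = keep,
               distance_euclidean = dists[keep])
  })
  do.call(rbind, res)
}

# Size of a dense 35,672 x 35,672 distance matrix stored as doubles, in GB
full_matrix_gb = 35672^2 * 8 / 1e9

# Toy data: 200 random points in a 50 km x 50 km square
set.seed(1)
origins = cbind(x = runif(200, 0, 50000), y = runif(200, 0, 50000))
od_near = pairs_within_maxdist(origins, origins, max_dist = 5000)
nrow(od_near)  # only the pairs within 5000 m, not all 200 * 200
```

A dense matrix for all 35,672 zones would need about 10 GB for the distances alone, which is why keeping only nearby pairs matters. The loop above still computes every distance once, so it saves memory but not time; a spatial index or nearest-neighbour search is what would give the speed-up reported here.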

2) Tweaked si_calculate and si_predict to use data.table and avoid copying data where possible. This is a slight breaking change: constraint_production now needs to be a quoted character string. I couldn't figure out the dplyr syntax, so I'd welcome suggestions on a fix.

Example:

nrow(od)
# [1] 3627616
t1 = Sys.time()
od_res = si_calculate(
  od,
  fun = gravity_model,
  constraint_production = "origin_all",
  d = distance_euclidean,
  m = origin_all,
  n = destination_all,
  beta = 0.9
)
t2 = Sys.time()
difftime(t2, t1)
# Time difference of 0.576057 secs
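
On the quoting issue, here is a minimal base R sketch of how a function could accept the constraint column either bare or quoted. It is not the package's code and not an answer to the dplyr syntax question. Both gravity_model and constrain_by_origin are hypothetical stand-ins, and the sketch assumes that a production constraint rescales each origin's flows so they sum to that origin's constraint value.

```r
# Hypothetical stand-ins; neither function is the package's implementation.
gravity_model = function(beta, d, m, n) m * n * exp(-beta * d / 1000)

constrain_by_origin = function(od, flow, constraint_production) {
  # Accept a bare column name (origin_all) or a quoted one ("origin_all")
  col = substitute(constraint_production)
  if (!is.character(col)) col = deparse(col)
  # Assumption: rescale so each origin's flows sum to its constraint value.
  # The first column of od is taken to be the origin ID.
  origin_total = ave(flow, od[[1]], FUN = sum)
  flow / origin_total * od[[col]]
}

od = data.frame(
  O = c("a", "a", "b", "b"),
  D = c("x", "y", "x", "y"),
  distance_euclidean = c(1000, 2000, 3000, 500),
  origin_all = c(100, 100, 50, 50),
  destination_all = c(30, 70, 30, 70)
)
raw = gravity_model(0.9, od$distance_euclidean, od$origin_all, od$destination_all)
od$interaction = constrain_by_origin(od, raw, origin_all)   # bare name
same = constrain_by_origin(od, raw, "origin_all")           # quoted name
tapply(od$interaction, od$O, sum)  # per-origin totals match origin_all
```

One limitation of the substitute() approach: passing a variable that holds the column name (for example col_name = "origin_all") would be read as a column called col_name, so a real fix would need to handle that case explicitly.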

Robinlovelace commented 8 months ago

> New function points_to_od_maxdist that uses nngeo to get the nearest neighbours

Big :+1: to use of nngeo.

Robinlovelace commented 8 months ago

> Also adds support for projected coordinates and the ability to look for the nearest X regardless of distance.

Shouldn't that functionality be in the {od} package, easy to be upstreamed?

Robinlovelace commented 8 months ago

> 10,000 * 10,000 LSOAs with max_dist = 5000 took 29 seconds
> 35,672 * 35,672 LSOAs with max_dist = 5000 took 4.9 minutes

:rocket:

mem48 commented 8 months ago

> > Also adds support for projected coordinates and the ability to look for the nearest X regardless of distance.
>
> Shouldn't that functionality be in the {od} package, easy to be upstreamed?

Possibly, yes. I was working here to get things working, but the file could easily be moved.

Robinlovelace commented 2 months ago

This belongs upstream: https://github.com/ITSLeeds/od/issues/18