UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
127 stars 16 forks source link

Duplicated deduplication message #236

Closed Robinlovelace closed 3 weeks ago

Robinlovelace commented 1 month ago

In pair programming session with Juan, we were asked to deduplicate graph edges in a helpful message. What we did not expect was to receive the same message after deduplication... Reproducible example that may make more sense that this short report, any questions welcome and heads-up @juanfonsecals1 is making great use of this awesome package!

#| label: setup
#| include: false
if (!require("rem#| otes")) install.packages("remotes")
pkgs = c(
    "sf",
    "tidyverse",
    "osmextract",
    "zonebuilder",
    "dodgr",
    "tmap",
    "webshot2",
    "geodist"
)
remotes::install_cran(pkgs)
sapply(pkgs, require, character.only = TRUE)
tmap_mode("view")
#| label: case-study-area
case_study_area = zonebuilder::zb_zone("Cape Town", n_circles = 4)
m = qtm(case_study_area)
#| eval: false
streetnet = dodgr::dodgr_streetnet(
    sf::st_bbox(case_study_area)
)
# To overcome this issue:
# Error: Overpass query unavailable without internet
assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
if (!file.exists("streetnet.rds")) {
    streetnet = dodgr::dodgr_streetnet(
        sf::st_bbox(case_study_area)
    )
    saveRDS(streetnet, "streetnet.rds")
}
streetnet = readRDS("streetnet.rds")
#| label: convert-to-graph
graph = dodgr::weight_streetnet(streetnet)
#| label: post-processing
graph = dodgr::dodgr_deduplicate_graph(graph)
graph_contracted = dodgr::dodgr_contract_graph(graph)
nrow(graph_contracted) / nrow(graph) # 1/3rd size
mpadge commented 1 month ago

That does seem odd. The following reproduces what I see, using the internal dodgr:::duplicated_edge_check() function, which is where the message is issued, and which returns TRUE if any edges are duplicated:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.4.1'

if (!file.exists("streetnet.rds")) {
    case_study_area = zonebuilder::zb_zone("Cape Town", n_circles = 4)
    #> Loading required namespace: tmaptools
    m = tmap::qtm(case_study_area)

    streetnet = dodgr::dodgr_streetnet(
        sf::st_bbox(case_study_area)
    )
    saveRDS(streetnet, "streetnet.rds")
}
streetnet = readRDS("streetnet.rds")
graph = dodgr::weight_streetnet(streetnet)
#> The following highway types are present in data yet lack corresponding weight_profile values: busway, construction, NA, raceway, corridor, bus_stop, elevator,
nrow (graph)
#> [1] 443712
(dodgr:::duplicated_edge_check (graph))
#> Warning in dodgr:::duplicated_edge_check(graph): Graph has duplicated edges. Only the first will be used here,
#> but it is better to remove them first with the 'dodgr_deduplicate_graph() function.
#> [1] TRUE
graph = dodgr::dodgr_deduplicate_graph(graph)
nrow (graph)
#> [1] 443194
(dodgr:::duplicated_edge_check (graph))
#> [1] FALSE

Created on 2024-07-17 with reprex v2.1.1

I'd suspect either:

  1. An older version of dodgr, as there have been a couple of recent improvements in weight_streetnet? Or,
  2. An issue with dodgr caching, which should be fixable either by restarting R session, or clear_dodgr_cache(), or even dodgr_cache_off().

Let me know if that helps

Robinlovelace commented 1 month ago

Thanks for the quick response @mpadge suspect that will fix it, will give it a go!

Robinlovelace commented 1 month ago

Before removing the cache:

> graph_centrality = dodgr::dodgr_centrality(graph_contracted)
Graph has duplicated edges. Only the first will be used here,
but it is better to remove them first with the 'dodgr_deduplicate_graph() function.
Do you want to proceed (y/n)? n
Error: Okay, we'll stop there
> #| label: post-processing
> graph = dodgr::dodgr_deduplicate_graph(graph)
> #| label: centrality-calc
> graph_centrality = dodgr::dodgr_centrality(graph_contracted)
Graph has duplicated edges. Only the first will be used here,
but it is better to remove them first with the 'dodgr_deduplicate_graph() function.
Do you want to proceed (y/n)? n
Error: Okay, we'll stop there
Robinlovelace commented 1 month ago

After removing the cache as suggested, the message persists:

> clear_dodgr_cache()
> graph = dodgr::dodgr_deduplicate_graph(graph)
> graph_contracted = dodgr::dodgr_contract_graph(graph)
> #| label: centrality-calc
> graph_centrality = dodgr::dodgr_centrality(graph_contracted)
Graph has duplicated edges. Only the first will be used here,
but it is better to remove them first with the 'dodgr_deduplicate_graph() function.
Do you want to proceed (y/n)?
Robinlovelace commented 1 month ago

Note: I'm getting this message despite the following:

> (dodgr:::duplicated_edge_check (graph))
[1] FALSE
Robinlovelace commented 1 month ago

Update: it seems that the duplicate edges are being added during the contraction stage:

dodgr:::duplicated_edge_check(graph_contracted)

Which returns TRUE

if that makes sense, or am I missing something?

mpadge commented 1 month ago

Not quite fixed yet ...

mpadge commented 1 month ago

@Robinlovelace @juanfonsecaLS1 Thanks so much for the very useful bug reporting! You should now see this:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.4.1.17'

path <- "/<path>/<to>/streetnet.rds"
if (!file.exists(path)) {
    case_study_area = zonebuilder::zb_zone("Cape Town", n_circles = 4)
    streetnet = dodgr::dodgr_streetnet(
        sf::st_bbox(case_study_area)
    )
    saveRDS(streetnet, path)
}
streetnet = readRDS(path)

graph = dodgr::weight_streetnet(streetnet)
#> The following highway types are present in data yet lack corresponding weight_profile values: busway, construction, NA, raceway, corridor, bus_stop, elevator,
nrow (graph)
#> [1] 443712
(dodgr:::duplicated_edge_check (graph))
#> Warning in dodgr:::duplicated_edge_check(graph): Graph has duplicated edges. Only the first will be used here,
#> but it is better to remove them first with the 'dodgr_deduplicate_graph() function.
#> [1] TRUE
graph = dodgr::dodgr_deduplicate_graph(graph)
nrow (graph)
#> [1] 443194
(dodgr:::duplicated_edge_check (graph))
#> [1] FALSE

graph_contracted <- dodgr_contract_graph (graph)
(dodgr:::duplicated_edge_check (graph_contracted))
#> [1] FALSE

Created on 2024-08-02 with reprex v2.1.1

mpadge commented 3 weeks ago

Re-opening because of bug arising in https://github.com/ITSLeeds/routingday/issues/14, which appears to be because the re-filling of edge data mapped from full on to contracted graph isn't being done correctly.