UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
128 stars 16 forks source link

Unexpected behavior in dodgr_flows #243

Open jucardwell opened 2 months ago

jucardwell commented 2 months ago

Hi Mark,

I am trying to figure out an issue I am having in dodgr_flows. I am using dodgr_flows to calculate a "redundancy index" for each block group in North Carolina. I have a flow matrix set up for each block group, which tracks travel from each origin block group to its habitual destinations and it includes a flow approximation to each destination based on population data.

To track redundancy, I have set up a process that iteratively increases the graph$weighted_distance (which I have set to weighted_time) once that path has been utilized in a shortest path.

matrix_data_scenario <- readRDS(file_path)

from <- rownames(matrix_data_scenario)
to <- colnames(matrix_data_scenario)

####Read in Road Data#####
#read in osm data
nc_roads <- dodgr_load_streetnet("/work/users/j/m/jmcard/dodgr_analysis/data/network_data/dodgr_weighted_network_08022023.Rds")
nc_roads <- dodgr_contract_graph(nc_roads)
#get largest component
largest_component <- nc_roads[nc_roads$component == 1, ]
largest_component$variable_time <- largest_component$time_weighted
largest_component$d_weighted <- largest_component$variable_time
#######################

num_scenarios <- 30
  weight_increase_factor <- c(1.25, 1.5, 1.75, 2)

  # Iterate through each weight increase factor
  for (j in 1:length(weight_increase_factor)) {
    baseline_road_set <- vector("list", num_scenarios)
    weight_factor <- weight_increase_factor[j]
    largest_component_scenario <- largest_component

    penalized_roads <- vector() # Store all penalized roads here

    # Iterate through each scenario
    for (k in 1:num_scenarios) {
      # weight adjustment after the first iteration
      if (k > 1) {
        used_roads <- baseline_road_set %>%
          bind_rows(baseline_road_set) %>%
          pull(edge_id) %>%
          unique() %>%
          as.character()

        # Add the newly used roads to the penalized roads, but ensure no road is penalized more than once
        new_penalized_roads <- setdiff(used_roads, penalized_roads)
        penalized_roads <- unique(c(penalized_roads, new_penalized_roads))

        # Adjust the weights only for the newly penalized roads
        largest_component_scenario$variable_time <- ifelse(largest_component_scenario$edge_id %in% new_penalized_roads,
                                                           largest_component_scenario$variable_time * weight_factor,
                                                           largest_component_scenario$variable_time)
        largest_component_scenario$d_weighted <- largest_component_scenario$variable_time
      }

      # Perform dodgr flows aggregate for the scenario
      graph_baseline <- dodgr_flows_aggregate(largest_component_scenario, from = from, to = to, flows = matrix_data_scenario, norm_sums = FALSE, contract = FALSE)

      # Filter and select data
      filtered_data <- graph_baseline %>%
        as_tibble() %>%
        filter(flow > 0) %>%
        select(edge_id, flow, d_weighted) %>%
        mutate(iteration = k, time_weighted_flow = flow * d_weighted)

      # Store Used Roads in the baseline_road_set
      baseline_road_set[[k]] <- filtered_data
    }

    # Combine the roads used across all iterations
    complete_used_roads <- baseline_road_set %>%
      bind_rows()

    # Summarize the time weighted flow by iteration
    summarized_iterations <- complete_used_roads %>%
      group_by(iteration) %>%
      summarise(sum_time_weighted = sum(time_weighted_flow))

    # Save the result to a CSV file
    write_csv(summarized_iterations, paste0("/work/users/j/m/jmcard/redundancy/output/", file_base, "_output_weight_", weight_factor, ".csv"))
  }

In most situations, as I would have expected, the sum of weighted time "levels out" at a maximum of (beginning weighted flow * weight factor), meaning that the flow has been re-directed onto the original shortest path set at some iteration. I am tracking the number of that iteration to compare across block groups.

However, a non-trivial number of block groups (maybe 20%) are exhibiting a "leveling out" below the expected maximum. This seems to indicate to me that there is a shortest path set that is not selected as the original shortest path, or that something else is going on.

For instance, in the output below the sum_time_weighted of the first iteration is 483517.619, so you would expect that at some iteration, it would return to that value * weight value, which would be 604397.024. As you can see below, it is leveling out, but at a value lower than that max. I have occasionally encountered rounding issues with dodgr outputs. I wonder if that could be the issue here (given that any rounding issue is multiplied by the flow on that route), but the magnitude seems a bit large for that.

iteration sum_time_weighted o_fid weight exceedence exceed
1 483517.619 1172 1.25 604397.024 FALSE
2 597567.78 1172 1.25 604397.024 FALSE
3 604030.195 1172 1.25 604397.024 FALSE
4 603065.429 1172 1.25 604397.024 FALSE
5 603145.836 1172 1.25 604397.024 FALSE
6 603145.836 1172 1.25 604397.024 FALSE
7 603145.836 1172 1.25 604397.024 FALSE
8 603145.836 1172 1.25 604397.024 FALSE
9 603145.836 1172 1.25 604397.024 FALSE
10 603145.836 1172 1.25 604397.024 FALSE
11 603145.836 1172 1.25 604397.024 FALSE
12 603145.836 1172 1.25 604397.024 FALSE
13 603145.836 1172 1.25 604397.024 FALSE
14 603145.836 1172 1.25 604397.024 FALSE
15 603145.836 1172 1.25 604397.024 FALSE
16 603145.836 1172 1.25 604397.024 FALSE
17 603145.836 1172 1.25 604397.024 FALSE
18 603145.836 1172 1.25 604397.024 FALSE
19 603145.836 1172 1.25 604397.024 FALSE
20 603145.836 1172 1.25 604397.024 FALSE
21 603145.836 1172 1.25 604397.024 FALSE
22 603145.836 1172 1.25 604397.024 FALSE
23 603145.836 1172 1.25 604397.024 FALSE
24 603145.836 1172 1.25 604397.024 FALSE
25 603145.836 1172 1.25 604397.024 FALSE
26 603145.836 1172 1.25 604397.024 FALSE
27 603145.836 1172 1.25 604397.024 FALSE
28 603145.836 1172 1.25 604397.024 FALSE
29 603145.836 1172 1.25 604397.024 FALSE
30 603145.836 1172 1.25 604397.024 FALSE

Looking for any thoughts/opinions. I realize this use case is a bit convoluted, so if it would be helpful for me to come up with reproducible code, please let me know. Thanks.

jucardwell commented 2 months ago

Planning to also update dodgr and re-run the script. I am still on 0.2.21

jucardwell commented 2 months ago

Looks like I am having the same issue with the updated version as well

mpadge commented 2 months ago

Thanks @jucardwell for the very interesting use case. I must admit, I don't quite understand what you're trying to do, even though it seems like a pretty obviously valuable analysis. I've re-run the code myself on a test network to try to get some insight. The values of sum_time_weighted converge, but to different values for each iteration. I think I'm going to need a bit more help understanding what the expectations are here, and why ...?

One thing that does jump out from your code there is your modification of weightings. I can't really foresee the effect this might have? I guess as the iteration proceeds, the applied increases in weighting will penalize increasingly large portions of the entire network, and re-direct routing onto alternative paths. But then at some stage, this penalty scheme will likely saturate, and thus revert towards the original, un-penalized scheme? I added a line to dump proportions of penalized_roads, which increase with increase in weight_increase_factor from under 30% to 34%. But as said, I'm going to need a bit more info to understand what all this means.

mpadge commented 2 months ago

One thing I did just notice: if you examine diff(summarized_iterations$sum_time_weighted), some values do increase fairly quickly to near your expected values, but then decrease again.

jucardwell commented 2 months ago

Hi Mark, thanks for the response. Essentially, I am trying to test the availability of reasonable alternative routes. I penalize utilized roads during each iteration so that I can try to "force" alternative paths. Once a road has been used in one iteration, it is penalized by a weight factor. My expectation is that, once reasonable alternative routes have been exhausted, the flows will revert back to the initial shortest paths. Tracking the iteration in which this happens gives some insight into the redundancy of the network. Most of the origins revert back to their initial paths within 5-20 iterations and once they have converged, there is no change in later iterations. At that point of reversion, I would expect that the sum_time_weighted would be the sum_time_weighted of the first iteration * weight factor. This would mean that the paths were the same as the original paths, but since they have been penalized, their weighted time is increased. However, in the example I provided, there are instances where the sum_time_weighted converges, but not at the expected value. For instance, in the example provided, the first iteration sum_time_weighted was 483517.619. If the weighting factor was 1.25 we would expect it to converge at some iteration at 604397.024. However, it is converging at a smaller value than that, which is unexpected. Because it is converging, that means that all those paths have already been re-weighted, which would imply that the base sum_time_weighted should have been lower than the selected shortest path.

When simply looking at the iteration in which the values converge, the pattern is exactly what I would think it would look like (more iterations to convergence in more urban areas), but I am worried that the convergence is not occurring at the expected value.

Thanks again

mpadge commented 2 months ago

Don't forget that you've got (generally highly) non-linear relationships with the rest of the network. I mentioned above that my trial covered only about 30% of the network as penalized ways. Even when these are fully penalized, the actual routes taken depend on relative weighted distances/times with respect to all other (non-penalized) ways. So I wouldn't really know what to actually expect as penalties increase for only a small portion of the network.

jucardwell commented 2 months ago

Hi Mark, thanks for the response. I'm not sure I totally understand what you mean about non-linear relationships with the rest of the network. I am running a separate trial for about 2000 discrete origins. Each origin has between 1-20 destinations that are all within a relatively short geographical distance from the origin. My hypothesis is that "reasonable" routes would be exhausted in the beginning iterations after the re-weighting scheme and it would land back on the paths in the first set (meaning that even with the penalty, that path is still the relative shortest). Sorry for taking up so much time, but I think I'm not quite there.

jucardwell commented 2 months ago

I will say that the differences are quite small (for instance, the example is .2%. I've encountered small issues with floating point rounding in dodgr

mpadge commented 2 months ago

I'm still not entirely clear what that code would really do, but maybe this tangential thought might help: Your penalty adjustments are very coarse, and penalize all ways traversed in any prior scenario. More usual approaches to finding alternative paths involve simply switching off single edges. You could also do that by finding the edge with the largest flow in your filtered_data, and then on the next iteration set d_weighted for that to some arbitrarily high value like 1e6 or so? Or select some smaller set of n highest-flow edges to switch off?

jucardwell commented 2 months ago

Hi Mark, basically, I am attempting to apply the methodology in this analysis https://dl.acm.org/doi/abs/10.1145/3357000.3366137 I am specifically looking for diverse paths, which is why I penalize all roads. Basically-- how much path diversity exists before returning to the shortest path. I'm open to trying a different approach, but am specifically trying to model a path penalization approach to create routing diversity

mpadge commented 2 months ago

Cross-linking to related issue #224. @jucardwell The idea of "diverse paths" has always been a primary motivation here (https://github.com/UrbanAnalyst/dodgr/issues/224#issuecomment-2068832839), and something I definitely want to implement and improve. I'll get back to you asap