Closed Siequnu closed 3 years ago
Woops. Yes will take a look.
There's also an issue for Allerton Bywater:
Doing a sanity test just now on data from Allerton Bywater and it seems OK:
> base_mode_totals = desire_lines_final %>%
+ sf::st_drop_geometry() %>%
+ select(matches("base")) %>%
+ select(-matches("all|tri")) %>%
+ colSums()
> dutch_mode_totals = desire_lines_final %>%
+ sf::st_drop_geometry() %>%
+ select(matches("dutch")) %>%
+ select(-matches("all|tri")) %>%
+ colSums()
>
> (shift_totals = dutch_mode_totals - base_mode_totals)
 walk_godutch cycle_godutch drive_godutch
           13            68           -81
> sum(shift_totals)
[1] 0
A reproducible script suggests that the data in the desire-lines-many.geojson file is OK:
# Aim: check scenario results for https://github.com/cyipt/actdev/issues/129
library(tidyverse)
f = "https://github.com/cyipt/actdev/raw/main/data-small/allerton-bywater/desire-lines-many.geojson"
desire_lines_final = sf::read_sf(f)
desire_lines_final
#> Simple feature collection with 30 features and 12 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -1.55257 ymin: 53.72459 xmax: -1.349795 ymax: 53.802
#> geographic CRS: WGS 84
#> # A tibble: 30 x 13
#>    geo_code1        geo_code2 purpose all_base trimode_base walk_base cycle_base
#>    <chr>            <chr>     <chr>      <dbl>        <dbl>     <dbl>      <dbl>
#>  1 allerton-bywater E02002398 commute        2            2         0          0
#>  2 allerton-bywater E02002399 commute        7            6         0          0
#>  3 allerton-bywater E02002402 commute        6            4         0          0
#>  4 allerton-bywater E02002402 commute        6            5         0          0
#>  5 allerton-bywater E02002403 commute        8            5         0          0
#>  6 allerton-bywater E02002403 commute        8            7         1          0
#>  7 allerton-bywater E02002404 commute        4            2         0          0
#>  8 allerton-bywater E02002416 commute        7            5         0          0
#>  9 allerton-bywater E02002416 commute        8            6         1          0
#> 10 allerton-bywater E02002417 commute        6            5         1          0
#> # … with 20 more rows, and 6 more variables: drive_base <dbl>, length <dbl>,
#> # walk_godutch <dbl>, cycle_godutch <dbl>, drive_godutch <dbl>,
#> # geometry <LINESTRING [°]>
base_mode_totals = desire_lines_final %>%
sf::st_drop_geometry() %>%
select(matches("base")) %>%
select(-matches("all|tri")) %>%
colSums()
dutch_mode_totals = desire_lines_final %>%
sf::st_drop_geometry() %>%
select(matches("dutch")) %>%
select(-matches("all|tri")) %>%
colSums()
base_mode_totals
#>  walk_base cycle_base drive_base
#>         32          4        268
dutch_mode_totals
#>  walk_godutch cycle_godutch drive_godutch
#>            45            72           187
(shift_totals = dutch_mode_totals - base_mode_totals)
#>  walk_godutch cycle_godutch drive_godutch
#>            13            68           -81
if(sum(shift_totals) != 0) stop("Mode totals do not add up.")
min_vals = sapply(desire_lines_final %>% sf::st_drop_geometry() %>% select_if(is.numeric), min)
if(any(min_vals < 0)) stop("Negative values detected")
Created on 2021-03-07 by the reprex package (v1.0.0)
Raising the question: how are those graphs calculated, @Siequnu? It would help fix the data if you could point to the data in this repo that is spurious, or at least the code in the UI repo that reads it in. Clearly something is amiss.
My guess: the graph can best be fixed with code in the UI codebase. Here is what I think caused it.
The attributes were as follows:
Now the columns appear in different positions, with the unwanted 'pdrive' variable gone:
Source: https://github.com/cyipt/actdev/blob/main/data-small/allerton-bywater/desire-lines-many.geojson
This could be fixed on the data side, by adding in the previous column. But I suggest it's worth fixing it on the UI side, using variable names not positions as the basis of the values.
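To illustrate the names-versus-positions point, here is a minimal sketch in base R. It assumes a made-up attribute table: the column names echo the desire-lines file and the dropped 'pdrive' variable mentioned above, but the values and the `get_col()` helper are hypothetical and are not taken from the UI code.

```r
# Hypothetical attribute table; values are invented for illustration only
old = data.frame(
  walk_base = 1, cycle_base = 0, drive_base = 5,
  pdrive = 0.8, walk_godutch = 2
)
# The same table after the 'pdrive' column is dropped upstream
new = old[, setdiff(names(old), "pdrive")]

# Position-based access silently returns a different variable after the change
old[[4]] # this is pdrive
new[[4]] # this is now walk_godutch

# Name-based access (hypothetical helper) returns the same variable,
# or fails loudly if the column has gone missing
get_col = function(d, nm) {
  if (!nm %in% names(d)) stop("Missing column: ", nm)
  d[[nm]]
}
get_col(old, "walk_godutch")
get_col(new, "walk_godutch")
```

The same idea applies in any language: looking values up by key means a dropped or reordered column produces an explicit error rather than a plausible-looking wrong number.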
Caveat: I'm not 100% sure that is the cause; it's just a hunch based on quick diagnostic checks that can help identify the cause of issues like this. @mvl22 and @Siequnu, let me know: have I missed something?
Just assigned everyone as it may be a team effort to resolve this one. Priority issue.
The chart is simply generated by parsing the mode-split.csv for each region.
In this case, Great-Kneighton (among others) gained some really odd values after this commit yesterday.
This could be fixed on the data side, by adding in the previous column. But I suggest it's worth fixing it on the UI side, using variable names not positions as the basis of the values.
Taking a close look at the example above, it appears that — among others — the Great Kneighton 6-10 band gained several spurious numbers. Here's how they match up — I'm comparing the pre- and post-commit 6-10 band data for Great Kneighton. Here's the link to the .csv file.
6-10, 6-10 <— the band
10, 10
212.23671, 212
161.16725300000002, 161
4.642678, 5
37.804664, 38
118.719909, 119
51.06945699999997, 51
1.326479, 5 <— !? This is where things start to go a bit 🍐-shaped!
67, 2470 <— 2470!? the corresponding key for this is cycle_goactive
89.326479, 3026 *
53, 1153 *
2, 2
18, 18
56, 56
24, 24
1, 1
32, 1165 *
42, 1427 *
25, 544 *
I've added * to places where massive numbers have suddenly appeared.
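The comparison above can also be done mechanically. The following is a rough sketch in base R that uses the 19 pre- and post-commit values listed for the 6-10 band and flags any value that changed by more than rounding alone could explain. The 0.5 threshold is my assumption (rounding to a whole number moves a value by at most 0.5), and positions refer to the order of the list above, not to column names.

```r
# Pre- and post-commit values for the Great Kneighton 6-10 band, as listed above
before = c(10, 212.23671, 161.16725300000002, 4.642678, 37.804664, 118.719909,
           51.06945699999997, 1.326479, 67, 89.326479, 53, 2, 18, 56, 24, 1,
           32, 42, 25)
after = c(10, 212, 161, 5, 38, 119, 51, 5, 2470, 3026, 1153, 2, 18, 56, 24, 1,
          1165, 1427, 544)

# Rounding to the nearest whole number changes a value by at most 0.5,
# so anything beyond that is a change in the data itself
changed_beyond_rounding = abs(after - before) > 0.5
which(changed_beyond_rounding)
#> [1]  8  9 10 11 17 18 19

# Show the suspicious pairs side by side
data.frame(position = which(changed_beyond_rounding),
           before = before[changed_beyond_rounding],
           after = after[changed_beyond_rounding])
```

The seven flagged positions are the two lines marked "!?" plus the five starred lines; position 9 is the cycle_goactive value that went from 67 to 2470.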
As there are no numbers in the original that are as large as the new spurious data, it wouldn't follow that this is a change in order — in any case a change in order would have no effect, as I avoid using implied indexes by default.
It appears that in some cases, a lot more than rounding has happened? But in the CSV file the header keys remain the same and in the same order.
So parsing the file and querying the key cycle_goactive for this band (6-10), as I do, will return 2470 as opposed to 67 which is what it used to be before this commit.
@Robinlovelace I've shut the actdev-ui issue you opened, as this definitely seems to be a data issue and can't be resolved by UI code.
But I suggest it's worth fixing it on the UI side, using variable names not positions as the basis of the values.
Perhaps norms are different in the R world, but frankly I would regard use of index positions, which are inherently unstable, when names exist, as pretty sloppy work in UI development. UIs need to be defensively coded, so we avoid that kind of thing.
Since we are on the topic, what do the percent fields mean? We aren't currently using these for data visualisation, but in the screenshot attached, almost none of the percentages for base and goactive actually add up to 100%. Data from the great-kneighton mode-split.csv file, but many inconsistencies of this kind throughout the mode-split sites. Some percentage blocks like the 3-6 band on go_active actually add up to 3138%?
Agreed, this is an issue with the mode split csv file and it looks like a bug introduced by @joeytalbot here: https://github.com/cyipt/actdev/commit/7d84d4cfc2359aea6097f3b75c820ba2129430f5
Good detective work. Will chat with Joey and fix it ASAP.
Discussed now with @joeytalbot who is looking into it.
We've discovered the problem. It wasn't the rounding. But in the same commit I regenerated the mode split files, based on the latest updates to all the other scripts.
The recently introduced disaggregation code caused many of the origin and destination geo_codes to change, and this is what was causing the problem.
I think we've fixed this now, I'm just checking through to make sure.
Since we are on the topic, what do the percent fields mean? We aren't currently using these for data visualisation, but in the screenshot attached, almost none of the percentages for base and goactive actually add up to 100%. Data from the great-kneighton mode-split.csv file, but many inconsistencies of this kind throughout the mode-split sites. Some percentage blocks like the 3-6 band on go_active actually add up to 3138%?
The percentages should add up to something close to 100%. But some still seem to be a few percent off.
The new data produces better results, but it still looks to me like there are issues, so re-opening.
Can you double check @joeytalbot ?
The GIF below looks better than the GIF above for the same place (Allerton Bywater) but still looks off to me.
Heads-up @joeytalbot: I've just double checked the input desire line data and cannot see any issue with it, so I guess it must be your mode split code. Let me know, happy to chat about solutions.
Reproducible example showing it's an issue with the mode-split.csv file:
I'm having a look
Cheers Joey.
This is now fixed in a pull request. https://github.com/cyipt/actdev/pull/139 The desire line disaggregation resulted in changes to the lengths of desire lines, and this is what led to the problem with the mode split charts.
Rounding errors will still mean some percentages add up to 101% or 99%, but I think that's fine.
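As a quick illustration of the rounding point, here is a small base R sketch using the Allerton Bywater mode totals from the reprex earlier in this thread. The `pct()` and `pct_ok()` helpers and the tolerance of half a percentage point per mode are my assumptions, not part of the actdev code.

```r
# Mode totals for Allerton Bywater, copied from the reprex output in this thread
base_mode_totals = c(walk = 32, cycle = 4, drive = 268)
dutch_mode_totals = c(walk = 45, cycle = 72, drive = 187)

# Percentages rounded to whole numbers (hypothetical helper)
pct = function(x) round(100 * x / sum(x))

sum(pct(base_mode_totals))  # 11 + 1 + 88 = 100
sum(pct(dutch_mode_totals)) # 15 + 24 + 62 = 101

# Hypothetical check: each rounded value can be off by up to 0.5,
# so allow the total to drift by half a point per mode
pct_ok = function(p, tol = length(p) / 2) abs(sum(p) - 100) <= tol

pct_ok(pct(dutch_mode_totals)) # TRUE: 101% is within rounding tolerance
# Made-up values summing to 3138, the total reported for one band earlier
pct_ok(c(1000, 2000, 138))     # FALSE: far outside rounding tolerance
```

A check along these lines would let 99% and 101% through while still catching the kind of totals reported before the fix.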
Looking good, this is fixed. But should walking be going down in Go Active? This may be a problem with our uptake model code.
The development appears to be gaining several thousand new commuters in Go Active mode?