Siequnu closed this issue 3 years ago
This is mentioned in #91 as well.
Might be a good time for a quick check to make sure we aren't duplicating data unnecessarily.
Good point. Assigning Joey; this is a known issue at this point, while the data side is in flux.
The latest metrics are in all-sites.geojson. They can be added to the site.geojson files as well, but if you don't need them there they can be omitted to reduce duplication. Robin suggested creating all-sites.csv so that users can access the data more easily.
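In case it helps, here is a minimal sketch of how all-sites.csv could be derived directly from all-sites.geojson, so that the CSV is always a copy of the GeoJSON attributes rather than a separately maintained file. The helper name `sites_to_csv` and the output path are assumptions for illustration, not something already in the repo.

```r
# Hypothetical helper (not part of the repo): write the attribute table of a
# sites object to CSV, dropping the geometry column if it is an sf object.
sites_to_csv = function(sites, path) {
  if (inherits(sites, "sf")) {
    # st_drop_geometry() returns the attributes as a plain data frame
    sites = sf::st_drop_geometry(sites)
  }
  write.csv(as.data.frame(sites), path, row.names = FALSE)
  invisible(path)
}

# Intended use (input path as used in this thread, output path assumed):
# all_sites = sf::read_sf("data-small/all-sites.geojson")
# sites_to_csv(all_sites, "data-small/all-sites.csv")
```

Generating the CSV from the GeoJSON in one step means there is only one file to update by hand when a metric is added.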
As with #102, do check and let us know, Patrick. Thanks.
Heads-up @joeytalbot and @Siequnu: I think all the data needed is now in the site-level data. It has all the columns in the all-sites data plus a few more, as shown below:
> site_original = sf::read_sf("data-small/great-kneighton/site.geojson")
> (n1 = names(site_original))
[1] "site_name" "full_name" "main_local_authority" "is_complete" "dwellings_when_complete"
[6] "planning_url" "percent_commute_active_base" "percent_drive_convertable" "percent_mapped_drive_convertable" "percent_commute_active_scenario"
[11] "median_commute_distance" "distance_to_town" "crossing_points" "percent_commute_walk_base" "percent_commute_cycle_base"
[16] "percent_commute_drive_base" "percent_commute_bus_base" "percent_commute_rail_base" "percent_commute_other_base" "circuity_fast_cycle"
[21] "circuity_walk" "busyness_fast_cycle" "in_site_walk_circuity" "in_site_cycle_circuity" "in_site_drive_circuity"
[26] "weightedJobsPTt" "weightedJobsCyct" "weightedJobsCart" "PSPTt" "PSCyct"
[31] "PSCart" "SSPTt" "SSCyct" "SSCart" "FEPTt"
[36] "FECyct" "FECart" "GPPTt" "GPCyct" "GPCart"
[41] "HospPTt" "HospCyct" "HospCart" "FoodPTt" "FoodCyct"
[46] "FoodCart" "TownPTt" "TownCyct" "TownCart" "geometry"
> all_sites = sf::read_sf("data-small/all-sites.geojson")
> (n2 = names(all_sites))
[1] "site_name" "full_name" "main_local_authority" "is_complete" "dwellings_when_complete"
[6] "planning_url" "percent_commute_active_base" "percent_drive_convertable" "percent_mapped_drive_convertable" "percent_commute_active_scenario"
[11] "median_commute_distance" "distance_to_town" "crossing_points" "percent_commute_walk_base" "percent_commute_cycle_base"
[16] "percent_commute_drive_base" "percent_commute_bus_base" "percent_commute_rail_base" "percent_commute_other_base" "circuity_fast_cycle"
[21] "circuity_walk" "busyness_fast_cycle" "in_site_walk_circuity" "in_site_cycle_circuity" "in_site_drive_circuity"
[26] "geometry"
> setdiff(n2, n1)
character(0)
> n1[!n1 %in% n2]
[1] "weightedJobsPTt" "weightedJobsCyct" "weightedJobsCart" "PSPTt" "PSCyct" "PSCart" "SSPTt" "SSCyct" "SSCart" "FEPTt"
[11] "FECyct" "FECart" "GPPTt" "GPCyct" "GPCart" "HospPTt" "HospCyct" "HospCart" "FoodPTt" "FoodCyct"
[21] "FoodCart" "TownPTt" "TownCyct" "TownCart"
> n2[!n2 %in% n1]
character(0)
Also checked for Allerton Bywater. Can we close this issue or are there outstanding discrepancies?
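For repeating this check on other sites, the comparison above could be wrapped in a small helper. This is only a sketch: `compare_names` is a hypothetical name, and the toy vectors are a made-up subset of the real column names.

```r
# Hypothetical helper: summarise how two sets of column names differ.
compare_names = function(a, b) {
  list(
    only_in_a = setdiff(a, b),   # columns present in a but missing from b
    only_in_b = setdiff(b, a),   # columns present in b but missing from a
    shared = intersect(a, b)     # columns present in both
  )
}

# Toy example using a small subset of the column names in this thread
n1 = c("site_name", "full_name", "PSPTt", "geometry")
n2 = c("site_name", "full_name", "geometry")
str(compare_names(n1, n2))
```

With the real data, `compare_names(names(site_original), names(all_sites))` would give both directions of the difference in one call.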
I have now added one extra column to all-sites.geojson: percent_commute_drive_scenario, as mentioned in the meeting, to include in the key metrics. So sites.geojson will need updating to add this column too.
I've also created a data dictionary for all-sites.geojson
https://github.com/cyipt/actdev/blob/main/data-small/all-sites-geojson-data-dictionary.csv
Good work Joey, but that additional variable is not showing up for me. Can you reproduce this?
> all_sites = sf::read_sf("data-small/all-sites.geojson")
> (n2 = names(all_sites))
[1] "site_name" "full_name" "main_local_authority" "is_complete" "dwellings_when_complete"
[6] "planning_url" "percent_commute_active_base" "percent_drive_convertable" "percent_mapped_drive_convertable" "percent_commute_active_scenario"
[11] "median_commute_distance" "distance_to_town" "crossing_points" "percent_commute_walk_base" "percent_commute_cycle_base"
[16] "percent_commute_drive_base" "percent_commute_bus_base" "percent_commute_rail_base" "percent_commute_other_base" "circuity_fast_cycle"
[21] "circuity_walk" "busyness_fast_cycle" "in_site_walk_circuity" "in_site_cycle_circuity" "in_site_drive_circuity"
[26] "geometry"
> setdiff(n2, n1)
character(0)
> (in_original_not_all = n1[!n1 %in% n2])
[1] "weightedJobsPTt" "weightedJobsCyct" "weightedJobsCart" "PSPTt" "PSCyct" "PSCart" "SSPTt" "SSCyct" "SSCart" "FEPTt"
[11] "FECyct" "FECart" "GPPTt" "GPCyct" "GPCart" "HospPTt" "HospCyct" "HospCart" "FoodPTt" "FoodCyct"
[21] "FoodCart" "TownPTt" "TownCyct" "TownCart"
> (extra_vars = n2[!n2 %in% n1])
Sorry, I hadn't updated every file yet. GitHub isn't accepting the git push for some reason.
OK, let me know when it does and I can pull it down. I've been pushing some files, so maybe do
git checkout .
or create a branch.
I've done it as a pull request now instead.
You should see the new column now.
Confirmed, thanks @joeytalbot
> (n2 = names(all_sites))
[1] "site_name" "full_name" "main_local_authority" "is_complete" "dwellings_when_complete"
[6] "planning_url" "percent_commute_active_base" "percent_drive_convertable" "percent_mapped_drive_convertable" "percent_commute_active_scenario"
[11] "median_commute_distance" "distance_to_town" "crossing_points" "percent_commute_walk_base" "percent_commute_cycle_base"
[16] "percent_commute_drive_base" "percent_commute_bus_base" "percent_commute_rail_base" "percent_commute_other_base" "circuity_fast_cycle"
[21] "circuity_walk" "busyness_fast_cycle" "in_site_walk_circuity" "in_site_cycle_circuity" "in_site_drive_circuity"
[26] "percent_commute_drive_scenario" "geometry"
> setdiff(n2, n1)
[1] "percent_commute_drive_scenario"
There are a lot of discrepancies between all-sites.geojson (which appears to be the most up-to-date source), the corresponding all-sites.csv, and the various .csv files in the individual site folders.
Furthermore, Issue #19 in the UI repo appears to be heading towards generating a lot of duplicated data.
It might be worth deciding where the data for these sites is stored, and whether that is done centrally or at site level.
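To get a sense of how far the files have drifted, something like the sketch below could list which values disagree between two tables. It assumes both inputs are plain data frames (geometry already dropped) that share a `site_name` key; the function name `find_mismatches` and the tolerance are hypothetical choices, not existing project code.

```r
# Hypothetical helper: for columns present in both tables, list the
# (site, column) pairs whose values disagree. Assumes plain data frames.
find_mismatches = function(x, y, key = "site_name", tol = 1e-8) {
  shared_cols = setdiff(intersect(names(x), names(y)), key)
  shared_keys = intersect(x[[key]], y[[key]])
  # Put both tables in the same row order, keeping only sites found in both
  x = x[match(shared_keys, x[[key]]), , drop = FALSE]
  y = y[match(shared_keys, y[[key]]), , drop = FALSE]
  out = list()
  for (col in shared_cols) {
    xv = x[[col]]
    yv = y[[col]]
    if (is.numeric(xv) && is.numeric(yv)) {
      differs = abs(xv - yv) > tol
    } else {
      differs = as.character(xv) != as.character(yv)
    }
    # A value missing on one side only counts as a mismatch
    both_na = is.na(xv) & is.na(yv)
    differs[is.na(differs)] = TRUE
    differs[both_na] = FALSE
    if (any(differs)) {
      out[[col]] = data.frame(site = shared_keys[differs], column = col,
                              stringsAsFactors = FALSE)
    }
  }
  if (length(out) == 0) {
    return(data.frame(site = character(0), column = character(0)))
  }
  do.call(rbind, unname(out))
}

# Toy example with made-up values
geo = data.frame(site_name = c("s1", "s2"), circuity_walk = c(1.2, 1.4))
csv = data.frame(site_name = c("s2", "s1"), circuity_walk = c(1.5, 1.2))
find_mismatches(geo, csv)
```

Whichever storage option is chosen, a check like this could be run before each release of the data to confirm the derived files still match the source.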