cyipt / actdev

ActDev - Active travel provision and potential in planned and proposed development sites
https://actdev.cyipt.bike

All sites vs individual site data #107

Closed: Siequnu closed this issue 3 years ago

Siequnu commented 3 years ago

There are many discrepancies between the data in all-sites.geojson (which appears to be the most up-to-date source), the corresponding all-sites.csv, and the various .csv files in the individual site folders.

Furthermore, issue #19 in the UI repo appears to be heading towards generating a lot of duplicated data:

Site-level metrics, to be available in site.geojson: the most up-to-date version of the column names and contents can currently be found in all-sites.geojson, but this will also be made available in the site.geojson and all-sites.csv files.

Within-site metrics: tbc, but most likely also in all-sites.geojson, all-sites.csv and site.geojson.

It might be worth deciding where the data for these sites is stored, and whether it is stored centrally or at site level.

Siequnu commented 3 years ago

This is mentioned in #91 as well.

This might be a good time for a quick check to make sure we aren't duplicating data unnecessarily.

Robinlovelace commented 3 years ago

Good point. Assigning Joey; this is a known issue while the data side is in flux.

joeytalbot commented 3 years ago

The latest metrics are in all-sites.geojson. They can be added to the site.geojson files as well, but if you don't need them there, they can be omitted to reduce duplication. Robin suggested creating all-sites.csv so that users can access the data more easily.
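If the metrics are kept only in the central file, per-site rows can be derived on demand instead of being stored twice. A minimal sketch of that idea, assuming a plain data frame stands in for all-sites.geojson; `get_site_metrics` is a hypothetical helper, and the numbers are placeholders, not real ActDev data:

```r
# Hypothetical sketch: one central table (standing in for all-sites.geojson)
# is the single source of truth; a site's row is derived on demand.
# Values are placeholders, not real ActDev data.
all_sites <- data.frame(
  site_name = c("great-kneighton", "other-site"),
  percent_commute_active_base = c(10, 20),
  stringsAsFactors = FALSE
)

# Return the row for one site, or stop if the site is not in the table.
get_site_metrics <- function(all_sites, name) {
  rows <- all_sites[all_sites$site_name == name, , drop = FALSE]
  if (nrow(rows) == 0) stop("site not found: ", name)
  rows
}

get_site_metrics(all_sites, "great-kneighton")
```

The trade-off is that a site page would need to read the central file (or a subset of it) rather than a self-contained site.geojson.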

Robinlovelace commented 3 years ago

As with #102, please check and let us know, Patrick. Thanks.

Robinlovelace commented 3 years ago

Heads-up @joeytalbot and @Siequnu: I think all the data needed is now in the site-level data. It has all the columns of the all-sites data plus a few more, as shown below:

> site_original = sf::read_sf("data-small/great-kneighton/site.geojson")
> (n1 = names(site_original))
 [1] "site_name"                        "full_name"                        "main_local_authority"             "is_complete"                      "dwellings_when_complete"         
 [6] "planning_url"                     "percent_commute_active_base"      "percent_drive_convertable"        "percent_mapped_drive_convertable" "percent_commute_active_scenario" 
[11] "median_commute_distance"          "distance_to_town"                 "crossing_points"                  "percent_commute_walk_base"        "percent_commute_cycle_base"      
[16] "percent_commute_drive_base"       "percent_commute_bus_base"         "percent_commute_rail_base"        "percent_commute_other_base"       "circuity_fast_cycle"             
[21] "circuity_walk"                    "busyness_fast_cycle"              "in_site_walk_circuity"            "in_site_cycle_circuity"           "in_site_drive_circuity"          
[26] "weightedJobsPTt"                  "weightedJobsCyct"                 "weightedJobsCart"                 "PSPTt"                            "PSCyct"                          
[31] "PSCart"                           "SSPTt"                            "SSCyct"                           "SSCart"                           "FEPTt"                           
[36] "FECyct"                           "FECart"                           "GPPTt"                            "GPCyct"                           "GPCart"                          
[41] "HospPTt"                          "HospCyct"                         "HospCart"                         "FoodPTt"                          "FoodCyct"                        
[46] "FoodCart"                         "TownPTt"                          "TownCyct"                         "TownCart"                         "geometry"                        
> all_sites = sf::read_sf("data-small/all-sites.geojson")
> (n2 = names(all_sites))
 [1] "site_name"                        "full_name"                        "main_local_authority"             "is_complete"                      "dwellings_when_complete"         
 [6] "planning_url"                     "percent_commute_active_base"      "percent_drive_convertable"        "percent_mapped_drive_convertable" "percent_commute_active_scenario" 
[11] "median_commute_distance"          "distance_to_town"                 "crossing_points"                  "percent_commute_walk_base"        "percent_commute_cycle_base"      
[16] "percent_commute_drive_base"       "percent_commute_bus_base"         "percent_commute_rail_base"        "percent_commute_other_base"       "circuity_fast_cycle"             
[21] "circuity_walk"                    "busyness_fast_cycle"              "in_site_walk_circuity"            "in_site_cycle_circuity"           "in_site_drive_circuity"          
[26] "geometry"                        
> setdiff(n2, n1)
character(0)
> n1[!n1 %in% n2]
 [1] "weightedJobsPTt"  "weightedJobsCyct" "weightedJobsCart" "PSPTt"            "PSCyct"           "PSCart"           "SSPTt"            "SSCyct"           "SSCart"           "FEPTt"           
[11] "FECyct"           "FECart"           "GPPTt"            "GPCyct"           "GPCart"           "HospPTt"          "HospCyct"         "HospCart"         "FoodPTt"          "FoodCyct"        
[21] "FoodCart"         "TownPTt"          "TownCyct"         "TownCart"        
> n2[!n2 %in% n1]
character(0)
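The two-way check above can be wrapped in a small function so that both directions are reported in one call. This is a hypothetical helper, not part of the ActDev repo; the example column lists are shortened versions of the ones printed above:

```r
# Hypothetical helper (not part of the ActDev repo): report columns present
# in one dataset but not the other, in both directions.
compare_columns <- function(site_cols, all_sites_cols) {
  list(
    only_in_site = setdiff(site_cols, all_sites_cols),
    only_in_all_sites = setdiff(all_sites_cols, site_cols)
  )
}

# With the objects read in above (requires sf and the repo data):
# compare_columns(names(site_original), names(all_sites))

# Self-contained example with shortened column lists:
compare_columns(
  c("site_name", "full_name", "weightedJobsPTt", "geometry"),
  c("site_name", "full_name", "geometry")
)
```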
Robinlovelace commented 3 years ago

I've also checked Allerton Bywater. Can we close this issue, or are there outstanding discrepancies?

joeytalbot commented 3 years ago

I have now added one extra column to all-sites.geojson, percent_commute_drive_scenario, as mentioned in the meeting, to include in the key metrics. The site.geojson files will need updating to add this column too.
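One way to propagate a new central column to the site-level files is to match rows on site_name. A minimal sketch on plain data frames; `add_column_from_all_sites` is hypothetical, the values are placeholders, and although sf objects are also data frames, this has not been tried on the real site.geojson files:

```r
# Hypothetical sketch: copy a column from the central all-sites table into a
# site-level table, matching rows on site_name.
add_column_from_all_sites <- function(site_df, all_sites_df, column) {
  if (!column %in% names(all_sites_df)) {
    stop("column not found in all-sites data: ", column)
  }
  idx <- match(site_df$site_name, all_sites_df$site_name)
  if (anyNA(idx)) stop("site_name missing from all-sites data")
  site_df[[column]] <- all_sites_df[[column]][idx]
  site_df
}

# Placeholder values, not real ActDev data:
all_sites <- data.frame(
  site_name = c("great-kneighton", "other-site"),
  percent_commute_drive_scenario = c(40, 55),
  stringsAsFactors = FALSE
)
site <- data.frame(site_name = "great-kneighton", stringsAsFactors = FALSE)
site <- add_column_from_all_sites(site, all_sites, "percent_commute_drive_scenario")
```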

I've also created a data dictionary for all-sites.geojson: https://github.com/cyipt/actdev/blob/main/data-small/all-sites-geojson-data-dictionary.csv
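With a data dictionary in place, it is easy to check that every column has an entry. A minimal sketch, assuming the dictionary CSV has a column listing variable names; that column is assumed here to be called `variable`, so adjust `name_col` to the real header. `undocumented_columns` is a hypothetical helper:

```r
# Hypothetical sketch: list data columns with no entry in the data dictionary.
# The dictionary column holding variable names is ASSUMED to be "variable".
undocumented_columns <- function(data_cols, dictionary, name_col = "variable") {
  # geometry is the spatial column rather than a metric, so it is skipped
  setdiff(setdiff(data_cols, "geometry"), dictionary[[name_col]])
}

# Against the repo files this might look like (not run here):
# dict <- read.csv("data-small/all-sites-geojson-data-dictionary.csv")
# undocumented_columns(names(sf::read_sf("data-small/all-sites.geojson")), dict)

dict <- data.frame(variable = c("site_name", "full_name"), stringsAsFactors = FALSE)
undocumented_columns(c("site_name", "full_name", "planning_url", "geometry"), dict)
```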

Robinlovelace commented 3 years ago

percent_commute_drive_scenario, as mentioned in the meeting, to include in the key metrics.

Good work, Joey, but that additional variable is not showing up for me. Can you reproduce this?

> all_sites = sf::read_sf("data-small/all-sites.geojson")
> (n2 = names(all_sites))
 [1] "site_name"                        "full_name"                        "main_local_authority"             "is_complete"                      "dwellings_when_complete"         
 [6] "planning_url"                     "percent_commute_active_base"      "percent_drive_convertable"        "percent_mapped_drive_convertable" "percent_commute_active_scenario" 
[11] "median_commute_distance"          "distance_to_town"                 "crossing_points"                  "percent_commute_walk_base"        "percent_commute_cycle_base"      
[16] "percent_commute_drive_base"       "percent_commute_bus_base"         "percent_commute_rail_base"        "percent_commute_other_base"       "circuity_fast_cycle"             
[21] "circuity_walk"                    "busyness_fast_cycle"              "in_site_walk_circuity"            "in_site_cycle_circuity"           "in_site_drive_circuity"          
[26] "geometry"                        
> setdiff(n2, n1)
character(0)
> (in_original_not_all = n1[!n1 %in% n2])
 [1] "weightedJobsPTt"  "weightedJobsCyct" "weightedJobsCart" "PSPTt"            "PSCyct"           "PSCart"           "SSPTt"            "SSCyct"           "SSCart"           "FEPTt"           
[11] "FECyct"           "FECart"           "GPPTt"            "GPCyct"           "GPCart"           "HospPTt"          "HospCyct"         "HospCart"         "FoodPTt"          "FoodCyct"        
[21] "FoodCart"         "TownPTt"          "TownCyct"         "TownCart"        
> (extra_vars = n2[!n2 %in% n1])
joeytalbot commented 3 years ago

Sorry, I hadn't updated every file yet. GitHub isn't accepting the git push for some reason.

Robinlovelace commented 3 years ago

OK, let me know when it does and I can pull it down. I've been pushing some files, so maybe run

git checkout .

or create a branch.

joeytalbot commented 3 years ago

I've done it as a pull request now instead.

joeytalbot commented 3 years ago

You should now see the new column.

Robinlovelace commented 3 years ago

Confirmed, thanks @joeytalbot

>   (n2 = names(all_sites))
 [1] "site_name"                        "full_name"                        "main_local_authority"             "is_complete"                      "dwellings_when_complete"         
 [6] "planning_url"                     "percent_commute_active_base"      "percent_drive_convertable"        "percent_mapped_drive_convertable" "percent_commute_active_scenario" 
[11] "median_commute_distance"          "distance_to_town"                 "crossing_points"                  "percent_commute_walk_base"        "percent_commute_cycle_base"      
[16] "percent_commute_drive_base"       "percent_commute_bus_base"         "percent_commute_rail_base"        "percent_commute_other_base"       "circuity_fast_cycle"             
[21] "circuity_walk"                    "busyness_fast_cycle"              "in_site_walk_circuity"            "in_site_cycle_circuity"           "in_site_drive_circuity"          
[26] "percent_commute_drive_scenario"   "geometry"                        
>   setdiff(n2, n1)
[1] "percent_commute_drive_scenario"
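To catch a missing column like this before files are published, a guard can stop with a clear message. A minimal sketch; `check_required_columns` is a hypothetical helper, not part of the ActDev repo:

```r
# Hypothetical guard (not part of the ActDev repo): stop with a clear message
# if any expected column is missing.
check_required_columns <- function(cols, required) {
  missing <- setdiff(required, cols)
  if (length(missing) > 0) {
    stop("missing columns: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# With the all_sites object above (requires sf and the repo data):
# check_required_columns(names(all_sites), "percent_commute_drive_scenario")
```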