ONSdigital / SDG_11.2.1

Analysis for the UN Sustainable Development Goal 11.2.1
https://onsdigital.github.io/SDG_11.2.1/
Apache License 2.0

Filter stops df on each local authority #138

Open james-westwood opened 2 years ago

james-westwood commented 2 years ago

The task is to filter stops df on each local authority before calculating the next steps.

Currently the code uses LSOA to filter down to a Birmingham stops data set by making an LSOA dataframe (with bham_LSOA_df = uk_LSOA_df[uk_LSOA_df.LSOA11NM.str.contains("Birmingham")] ) and then merging.

This was a workaround. Since we now know the output data has to be reported at Local Authority (LA) level, we should not go via LSOA. Also, LSOA data is not needed in the calculations, so it would be cleaner to filter on the LA names.

Due to the work in #130, the LA name should be in the population dataframe already. (I do not know what the column name is, but can check later.) For the analysis involving stops (buffering them into service areas and calculating the served and unserved populations) we will need to filter by local authority.

Filtering and further work can be done in two ways, either:

I have no strong preference. I would guess that creating lots of smaller dataframes is going to be slower and much more memory intensive, so I think creating the "view" of the dataframe using .loc might be the leaner option.
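
The two approaches being weighed here can be sketched roughly as follows. This is only an illustration on a toy dataframe: the column name `LAD20NM` is taken from later comments in this thread, while `toy_df`, `rows_for_la` and the sample rows are made up for the example. Note that a boolean mask with `.loc` generally returns a new, smaller object rather than a true view, so the memory saving comes from holding one subset at a time rather than all of them at once.

```python
import pandas as pd

# Toy stand-in for a dataframe that carries an LA-name column.
# The rows and the helper name below are hypothetical.
toy_df = pd.DataFrame(
    {
        "row_id": ["a", "b", "c", "d"],
        "LAD20NM": ["Birmingham", "Croydon", "Birmingham", "North Devon"],
    }
)


def rows_for_la(df, local_auth, la_col="LAD20NM"):
    """Select one local authority at a time with a boolean mask."""
    return df.loc[df[la_col] == local_auth]


# Approach 1: filter on demand inside the loop over local authorities.
for local_auth in toy_df["LAD20NM"].unique():
    la_subset = rows_for_la(toy_df, local_auth)
    # ... per-LA calculations would go here ...

# Approach 2: build every per-LA dataframe up front and keep them all.
subsets_per_la = {la: group for la, group in toy_df.groupby("LAD20NM")}
```

Approach 1 only ever holds one subset in memory, which matches the "leaner" option described above; approach 2 is convenient if the subsets are reused many times.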

Note: the gs.buffer_points() operation could be done in one go on the whole stops_geo_df. If it turns out to be computationally intensive, then use the same pattern as in other steps: check for an existing feather file and, if there is none, write the result out to feather.

Antonio-John commented 2 years ago

Currently, on line 51, the uk_LSOA_df is created:

uk_LSOA_df=di.geo_df_from_geospatialfile(path_to_file=file_path)

- This contains LSOA11CD, LSOA11NM, LSOA11NMW (Welsh?) and geometry coordinates. No sign of where LA is yet; it must be merged on further along the line. I will carry on debugging to investigate.
- The LSOA11NM values seem to be in the format <LA Name - Code>, e.g. Croydon 044B.
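
To illustrate the name format noted above, here is a small sketch of pulling an LA-style name out of LSOA11NM and matching on it exactly, rather than with `str.contains`. It assumes the trailing code is always three digits followed by a capital letter, which is inferred only from the "Croydon 044B" example; the other sample names and the `la_name` column are made up for the illustration.

```python
import pandas as pd

# Only "Croydon 044B" comes from the thread; the other rows are invented
# to exercise the pattern, including a multi-word name.
lsoa_df = pd.DataFrame(
    {
        "LSOA11NM": [
            "Croydon 044B",
            "Birmingham 001A",
            "North Devon 012C",
            "Newcastle upon Tyne 003D",
        ]
    }
)

# Assumption: the code part is whitespace + 3 digits + 1 capital letter
# at the end of the string. Strip it to leave the name part.
lsoa_df["la_name"] = lsoa_df["LSOA11NM"].str.replace(
    r"\s+\d{3}[A-Z]$", "", regex=True
)

# Exact match on the derived name instead of a substring search.
bham_rows = lsoa_df[lsoa_df["la_name"] == "Birmingham"]
```

An exact match avoids picking up any other area whose name merely contains the search string, which a substring search could do.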

Antonio-John commented 2 years ago

- la_poly gets the polygon of the LA, based on LSOA11NM.
- la_stops_geo_df is a dataframe created by subsetting stops_geo_df to the points that fall within la_poly.

Antonio-John commented 2 years ago

We need the LSOA dataframe as it has the coordinates in it. So the script now does the following:

  1. Needs ```uk_LSOA_df``` to create the ```la_poly```, which is a big polygon around a local authority
  2. Creates ```la_stops_geo_df```, which is a geo dataframe of only the stops within the local authority
  3. ```la_pop_df``` is created, which is a right merge of ```uk_LSOA_df``` and the ```whole_nation_pop```, so it only has LSOAs which are present in the local authority you are currently iterating through
  4. This ```la_pop_df``` is then subset with ``` la_pop_df=la_pop_df.loc[la_pop_df["LAD20NM"]==local_auth] ```

Antonio-John commented 2 years ago

Summary of discussion with JW:

- Investigate whether LA coordinates can be brought in using this file here.
- Not sure if these are points, lines or polygons. Will have to read in the file and investigate.
- If they are polygons, then look to use these coordinates instead of LSOA. If they are not, then keep the code and merge the changes in as is.

Antonio-John commented 2 years ago

- Have switched from using LSOA coordinates to LA coordinates. The code works fine with these. The figures for North Devon are the same with LSOA and with LA coordinates.
- The only problem I see for the future is a slight mismatch between the whole nation LAD20NM (obtained by merging on OA with the 2020 lookup file) and the LA coordinates shp file, which uses LAD21NM.
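
One way to see how big the LAD20NM / LAD21NM mismatch is would be an outer merge with an indicator column, sketched below. The two small dataframes and the placeholder names "Old Name District" and "New Name District" are invented for the example; in practice the inputs would be the unique LA names from the population data and from the boundary file.

```python
import pandas as pd

# Hypothetical unique LA names from each source (placeholders, not real data).
pop_names = pd.DataFrame(
    {"LAD20NM": ["North Devon", "Croydon", "Old Name District"]}
)
boundary_names = pd.DataFrame(
    {"LAD21NM": ["North Devon", "Croydon", "New Name District"]}
)

# Outer merge keeps names from both sides; indicator=True adds a "_merge"
# column saying whether each row matched ("both") or came from one side only.
comparison = pop_names.merge(
    boundary_names,
    how="outer",
    left_on="LAD20NM",
    right_on="LAD21NM",
    indicator=True,
)

only_in_pop = comparison.loc[
    comparison["_merge"] == "left_only", "LAD20NM"
].tolist()
only_in_boundaries = comparison.loc[
    comparison["_merge"] == "right_only", "LAD21NM"
].tolist()
```

Any names that end up in `only_in_pop` or `only_in_boundaries` are the ones that would silently drop out of a filter on LA name and so need a manual mapping or a matching lookup year.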