Open james-westwood opened 2 years ago
Currently on line 51 the uk_lsoa is created:
uk_LSOA_df=di.geo_df_from_geospatialfile(path_to_file=file_path)
- This contains LSOA11CD, LSOA11NM, LSOA11NMW (Welsh?) and geometry coordinates. There is no sign of where the LA is yet, so it must be merged on further along the line. I will carry on debugging to investigate.
- The LSOA11NM values seem to be in the format <LA Name - Code>, e.g. Croydon 044B.
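As a rough illustration of the name format noted above, the LA name could be split out of LSOA11NM with a regular expression. This is only a sketch on toy rows: the column names `la_name` and `lsoa_suffix` are made up for the example, and the pattern (three digits plus one letter at the end) is an assumption based on the single observed value "Croydon 044B".

```python
import pandas as pd

# Toy rows only; the real values would come from the LSOA boundary file.
lsoa_df = pd.DataFrame(
    {"LSOA11NM": ["Croydon 044B", "Birmingham 001A", "North Devon 012C"]}
)

# Assumed pattern: LA name, one space, then three digits and a capital letter.
# The named groups become the column names of the extracted dataframe.
parts = lsoa_df["LSOA11NM"].str.extract(
    r"^(?P<la_name>.+) (?P<lsoa_suffix>\d{3}[A-Z])$"
)
lsoa_df = lsoa_df.join(parts)

# Exact comparison on the extracted name, rather than a substring match.
croydon_lsoas = lsoa_df.loc[lsoa_df["la_name"] == "Croydon"]
print(lsoa_df)
```

Comparing the extracted name exactly avoids accidental matches that a substring test could pick up, though rows that do not fit the assumed pattern would come back as missing values and would need checking.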
The LSOA dataframe is needed because it holds the coordinates, so the script now does the following:
- la_poly gets the polygon of the LA based on LSOA11NM.
- la_stops_geo_df creates a dataframe by subsetting the points in stops_geo_df that fall within la_poly.
Summary of Discussion with JW:
- Investigate if LA coordinates can be brought in using this file here.
- Not sure if these are points, lines or polygons. Will have to read in the file and investigate.
- If they are polygons, then look to include these coordinates instead of LSOA. If they are not, then keep the code and merge the changes in as is.
- Have switched from using LSOA coordinates to LA coordinates. The code works fine with these, and the figures for North Devon are the same whether LSOAs or LAs are used.
- The only problem I see for the future is a slight mismatch between the whole-nation LAD20NM (obtained by merging on OA with the 2020 lookup file) and the LA coordinates shp file, which uses LAD21NM.
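One way to see how large the LAD20NM / LAD21NM mismatch is would be an outer merge with an indicator column. This is a minimal sketch on toy values: `lookup_df` and `boundaries_df` are hypothetical stand-ins for the 2020 lookup and the 2021 shp file, and the names in them are illustrative only.

```python
import pandas as pd

# Toy stand-ins for the two sources of LA names (values are illustrative).
lookup_df = pd.DataFrame({"LAD20NM": ["North Devon", "Corby", "Birmingham"]})
boundaries_df = pd.DataFrame(
    {"LAD21NM": ["North Devon", "Birmingham", "North Northamptonshire"]}
)

# The outer merge keeps every name from both sides; the `_merge` column
# records whether a row matched ("both") or came from one side only.
merged = lookup_df.merge(
    boundaries_df,
    left_on="LAD20NM",
    right_on="LAD21NM",
    how="outer",
    indicator=True,
)

only_2020 = sorted(merged.loc[merged["_merge"] == "left_only", "LAD20NM"])
only_2021 = sorted(merged.loc[merged["_merge"] == "right_only", "LAD21NM"])

print("In 2020 lookup only:", only_2020)
print("In 2021 boundaries only:", only_2021)
```

Any names that appear in only one list would need a manual mapping (or a matching lookup year) before filtering on LA name.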
The task is to filter stops df on each local authority before calculating the next steps.
Currently the code uses LSOA to filter to a Birmingham stops data set by making an LSOA dataframe (with `bham_LSOA_df = uk_LSOA_df[uk_LSOA_df.LSOA11NM.str.contains("Birmingham")]`) and then merging. This was a workaround and, since we now know the output data has to be reported at Local Authority (LA) level, we should not go via LSOA. Also, LSOA data is not needed in the calculations and it would be cleaner to filter on the LA names.
Due to the work in #130, the LA name should be in the population dataframe already. (I do not know what the column name is, but can check later.) For the analysis involving stops (buffering them into service areas and calculating the served and unserved populations) we will need to filter by local authority.
Filtering and further work can be done in two ways, either:
- creating a separate dataframe for each LA, such as `birmingham_stops_geo_df`, or
- filtering `stops_geo_df` with something like `stops_geo_df.loc[stops_geo_df["LA"]==region]`, where `region` is the name of the LA that we are processing at the time.
I have no strong preference. I would guess that creating lots of smaller dataframes is going to be slower and much more memory intensive, so I think creating the "view" of the dataframe using `.loc` might be the leaner option.
Note: the `gs.buffer_points()` operation could be done in one go on the whole `stops_geo_df`. If it turns out to be computationally intensive then do the check for feather / write out to feather thing that we do on other steps.
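The `.loc` option could look roughly like the sketch below. It uses a plain pandas dataframe with toy rows in place of the real `stops_geo_df`. The helper `stops_for_region` is hypothetical, and the column name `"LA"` is an assumption, since the actual column name has not been checked yet.

```python
import pandas as pd

# Toy stand-in for stops_geo_df; the real frame would also carry geometry.
stops_df = pd.DataFrame(
    {
        "stop_id": ["s1", "s2", "s3", "s4"],
        "LA": ["Birmingham", "North Devon", "Birmingham", "Croydon"],
    }
)


def stops_for_region(
    stops: pd.DataFrame, region: str, la_col: str = "LA"
) -> pd.DataFrame:
    """Return the rows of `stops` whose LA column equals `region`."""
    return stops.loc[stops[la_col] == region]


# Process one LA at a time instead of keeping a named dataframe per LA.
counts = {}
for region in stops_df["LA"].unique():
    region_stops = stops_for_region(stops_df, region)
    counts[region] = len(region_stops)

print(counts)
```

One caveat: boolean-mask selection with `.loc` returns a new object rather than a true view, so the saving comes mainly from holding only one LA's subset in memory at a time, not from avoiding the copy altogether.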