ONSdigital / SDG_11.2.1

Analysis for the UN Sustainable Development Goal 11.2.1
https://onsdigital.github.io/SDG_11.2.1/
Apache License 2.0

Scotland Output Showing 100% Availability BUG #374

Closed Antonio-John closed 1 year ago

Antonio-John commented 1 year ago

This is related to #352, looking at Scotland.

Problem: Scotland showing 100% availability.

Caused by: The line buffd_la_stops_geo_df = gs.buffer_points(la_stops_geo_df) puts a 500 metre buffer round all stops to deduce whether people have access to convenient public transport stops.

Within the buffer_points function, a buffer of 500 metres is put round bus stops and 1000 metres round train stops. However, the high or low capacity column function, which adds a column called capacity_type if a stop is bus or tram, is currently commented out. I think this is waiting for the timetable data to be read in so that Scotland has a highly served stops dataframe.

Antonio-John commented 1 year ago

The filtered dataframe is read in to the Scottish script, but for some reason when the stop is a bus the capacity_type is NaN. For tram_metro it is also NaN. For train it correctly says high.

Antonio-John commented 1 year ago

I suspect the above isn't contributing, as it works fine in main.py and the logic in the function is: if the capacity type is low then 500 metres, otherwise 1000 metres (which would include the NaN values).

Antonio-John commented 1 year ago

This might be an error in all scripts. Here is the stop_geo_df: image There are NaNs in capacity type. When the capacity type is NaN, the buffer_points function gives a buffer of 1000m. (It defaults to 1000m if the capacity type isn't low.)
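To make the default concrete, here is a minimal sketch of the branching described above. It is not the actual buffer_points code; the function names and the strict variant are hypothetical and only illustrate why a NaN capacity_type ends up with the 1000m buffer.

```python
import math

LOW_CAPACITY_BUFFER_M = 500
HIGH_CAPACITY_BUFFER_M = 1000


def buffer_radius_default_high(capacity_type):
    """Assumed logic: anything that is not 'low' (NaN included) gets 1000 m."""
    if capacity_type == "low":
        return LOW_CAPACITY_BUFFER_M
    return HIGH_CAPACITY_BUFFER_M


def buffer_radius_strict(capacity_type):
    """Hypothetical stricter variant: refuse to guess for missing values."""
    if capacity_type == "low":
        return LOW_CAPACITY_BUFFER_M
    if capacity_type == "high":
        return HIGH_CAPACITY_BUFFER_M
    raise ValueError(f"Unexpected capacity_type: {capacity_type!r}")


# Capacity types as reported in this thread: NaN for bus and tram_metro.
stops = [("bus", math.nan), ("tram_metro", math.nan), ("train", "high")]
radii = {mode: buffer_radius_default_high(cap) for mode, cap in stops}
print(radii)
```

With the default-high logic every stop in this example gets 1000m, including the bus stop that should get 500m. The strict variant would surface the missing values instead of silently widening the buffer.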

image

I think this is affecting the main results unless I am mistaken. Happy to talk through this next week in case I have misunderstood how stops_geo works.

nkshaw23 commented 1 year ago

The issue is with line 200, where we intersect the PWC and buffered stops. This intersection captures everything (all 3,170 records), whereas in QGIS we can see that it should be 2,877. This is why we are seeing 100% coverage.
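As a sanity check on what the intersection should do, here is a rough pure-Python sketch of a point-in-buffer test. The coordinates are made up (not real PWC or stop data), the function name is hypothetical, and it counts records only, without any population weighting.

```python
import math


def served_points(points, stops):
    """Return the points lying within the buffer radius of at least one stop.

    points: list of (easting, northing) in metres
    stops:  list of (easting, northing, radius_m)
    """
    served = []
    for px, py in points:
        if any(math.hypot(px - sx, py - sy) <= radius for sx, sy, radius in stops):
            served.append((px, py))
    return served


# Hypothetical data: one 500 m (low capacity) and one 1000 m (high capacity) stop.
stops = [(0, 0, 500), (5000, 0, 1000)]
pwc = [(100, 100), (600, 0), (5900, 0), (3000, 3000)]

served = served_points(pwc, stops)
coverage = len(served) / len(pwc)
print(len(served), coverage)
```

A correct intersection returns only the points inside a buffer, so the count should normally be lower than the total number of records. If every record comes back, the join is matching on something other than the buffer geometry.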

image

nkshaw23 commented 1 year ago

For some reason we are using a different function for Scotland (points_in_polygons) compared to main and NI (find_points_in_poly).

nkshaw23 commented 1 year ago

When trying find_points_in_poly, because easting and northing are in both datasets, columns such as easting_left and easting_right get created. This causes the function to fail.

image

Assuming we would want methods to match between countries when we merge code, it makes sense to remove easting and northing from one of the datasets.
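The clash can be reproduced without any geospatial libraries. The sketch below is an assumption about the mechanism, not the real find_points_in_poly: join_records is a hypothetical helper that mimics the common join behaviour of suffixing column names found on both sides (GeoPandas' sjoin, for instance, uses _left and _right by default), and pwc_id and stop_id are invented column names.

```python
def join_records(left, right, lsuffix="left", rsuffix="right"):
    """Combine two records, suffixing any column names present in both."""
    clashes = set(left) & set(right)
    joined = {}
    for key, value in left.items():
        joined[f"{key}_{lsuffix}" if key in clashes else key] = value
    for key, value in right.items():
        joined[f"{key}_{rsuffix}" if key in clashes else key] = value
    return joined


# Hypothetical rows; both carry easting and northing, as in the Scotland script.
pwc_row = {"pwc_id": "A", "easting": 325000, "northing": 673000}
stop_row = {"stop_id": "X", "easting": 325100, "northing": 673050}

clashing = join_records(pwc_row, stop_row)
# 'easting' no longer exists here, only 'easting_left' and 'easting_right'.

# Dropping the coordinate columns from one side before joining avoids the clash.
pwc_trimmed = {k: v for k, v in pwc_row.items() if k not in ("easting", "northing")}
clean = join_records(pwc_trimmed, stop_row)
print(sorted(clashing), sorted(clean))
```

Any downstream code that looks up a plain easting column fails on the first result and works on the second. Which dataset to trim is a design choice; the PWC side is used here purely for illustration.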

nkshaw23 commented 1 year ago

For reference, in main they only exist in la_stops_geo_df and they don't get carried through. We just keep the geometry of the PWC dataset, which contains the points as coordinates.

I think we should do the same for Scotland.

image

nkshaw23 commented 1 year ago

Tested for Fife and the outputs are roughly the same between QGIS and the script; see #352 for further details. Will make this into an MR next week.

nkshaw23 commented 1 year ago

Code is ready but needs some updates from #377. Might as well wait until that is merged.