ONSdigital / SDG_11.2.1

Analysis for the UN Sustainable Development Goal 11.2.1
https://onsdigital.github.io/SDG_11.2.1/
Apache License 2.0

Investigate inactive stops in stops.csv #178

Closed james-westwood closed 2 years ago

james-westwood commented 2 years ago

Having re-looked at the stops.csv data recently, I noticed there is a column called Status which indicates if the stop is in use or not.


The possible entries are "" (i.e. blank), "active", "inactive", "new" and "pending".

Our current stops_df (22nd Feb, 2022, running from the master branch) does not contain this column, nor has it been filtered on this basis.


So far it has proven difficult to find a definition of Status values.

Overall the task is to remove stops that are not in use from the analysis. This probably means filtering out "inactive" and "pending" but for thoroughness we need to get definitions on the others.
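Before deciding what to filter, it may help to see how many stops fall under each Status entry. The following is only a minimal sketch: the sample rows are invented for illustration, `count_status_values` is a hypothetical helper (not part of the repo), and `ATCOCode` is assumed as the identifier column name.

```python
import io

import pandas as pd

# Tiny stand-in for stops.csv; these rows are made up for illustration only.
SAMPLE_CSV = """ATCOCode,Status
A1,active
A2,inactive
A3,
A4,pending
A5,new
A6,active
"""


def count_status_values(stops: pd.DataFrame) -> dict:
    """Count each distinct Status entry, reporting blank cells as ''."""
    # Blank cells are read in as NaN, so fill them before counting.
    status = stops["Status"].fillna("").str.strip().str.lower()
    return status.value_counts().to_dict()


# dtype=str keeps every column as text so nothing is silently converted.
stops = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
status_counts = count_status_values(stops)
print(status_counts)
```

Running the same helper on the real stops.csv would show whether any entries exist beyond the five listed above.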

Steps for the developer:

Bear in mind our stops data must have the longitude data and the latitude data.
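A rough sketch of how the coordinate requirement could be checked, assuming the columns are named `Longitude` and `Latitude` (an assumption, as is the helper name `drop_stops_without_coordinates`; neither is taken from the repo):

```python
import pandas as pd


def drop_stops_without_coordinates(
    stops: pd.DataFrame,
    lon_col: str = "Longitude",  # assumed column name
    lat_col: str = "Latitude",   # assumed column name
) -> pd.DataFrame:
    """Keep only rows where both coordinates are present and numeric."""
    # Coerce to numbers so that text or blank entries become NaN.
    coords = stops[[lon_col, lat_col]].apply(pd.to_numeric, errors="coerce")
    has_coords = coords.notna().all(axis=1)
    return stops.loc[has_coords].copy()


# Made-up rows: only A1 has both a longitude and a latitude.
sample = pd.DataFrame(
    {
        "ATCOCode": ["A1", "A2", "A3"],
        "Longitude": [-1.5, None, -0.1],
        "Latitude": [53.8, 51.5, None],
    }
)
with_coords = drop_stops_without_coordinates(sample)
print(with_coords["ATCOCode"].tolist())
```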

Antonio-John commented 2 years ago

Hi,

I work within the Office for National Statistics and am using NaPTAN data to conduct some analysis. I had a couple of queries about the data which I was hoping you could answer:

1) The data we currently use is from the NaPTAN website. Is this the most recent data or should we be using the NaPTAN API?

2) If we should be using the API, is there any user guide or documentation that would be helpful for this?

3) There is a column called Status within the stops data. I have looked at the NaPTAN guide for data managers and couldn’t find a definition of the Status field. The unique entries in the Status field appear to be:

Possible values: "" (i.e. it's blank), "active", "inactive", "new", "pending"

Do you have a definition of what the values mean please? This will help us remove stops that are not in use from our analysis. 

Many thanks,

Antonio

Antonio-John commented 2 years ago

Draft email written. Would you mind taking a look to see if it's okay please, @james-westwood @Mark-Simons-ONS? I can send it after that.

james-westwood commented 2 years ago

Except for the formatting, it all looks good. You could say who the data analysis is for (the SDG team).

Antonio-John commented 2 years ago

great, have sent email :)

Antonio-John commented 2 years ago

Summary of Email:

Couple of thoughts I had from this email.

Antonio-John commented 2 years ago

This link (https://naptan.app.dft.gov.uk/Reports/frmStopsSummaryReport) shows the last time each LA's data was updated.

jwestw commented 2 years ago

I suggest going back and asking for a couple of points of clarification:

1) How often is the data published? (I see this might be answered by your last comment.)

2) How can we tell when the data was updated? Is there a time stamp or publication date? (Again, this might be answered by your last comment.)

3) Could they explain what "exported as if active" means? Can transport users catch transport from pending stops or not?

Antonio-John commented 2 years ago

- Yesterday morning I went back and asked how frequently the data is published. The answer is that there is no consistent publication schedule: each LA uploads data when and if it needs to. The link above shows when an LA last uploaded or changed something in the data. They say a lot of users upload every week, but there is the potential that nothing has changed.
- I will go back and ask about "exported as if active".

Antonio-John commented 2 years ago

Conversation between @Antonio-John & @james-westwood 25/02/2022

Antonio-John commented 2 years ago

- Treat stops without a status (i.e. blank) as active.
- Treat stops that are pending as active as well. There are only 11 such stops in the live dataset as of 1st March, 15:00.
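The two rules above could be sketched roughly as follows. This is an illustration only, not the code that was merged: `filter_in_use_stops` and `IN_USE_STATUSES` are hypothetical names, the sample rows are made up, and because "new" is not covered by the rules above it is left out of the default set here as an assumption (pass a different `keep` set to include it).

```python
import pandas as pd

# Statuses kept by default: "active", plus blank and "pending", which are
# treated as active. "new" is NOT covered by the rules above, so excluding
# it here is an assumption of this sketch.
IN_USE_STATUSES = frozenset({"active", "pending", ""})


def filter_in_use_stops(stops: pd.DataFrame, keep=IN_USE_STATUSES) -> pd.DataFrame:
    """Return only the rows whose Status is in the `keep` set."""
    # Normalise first: blank cells (NaN) become "" and case/whitespace is ignored.
    status = stops["Status"].fillna("").astype(str).str.strip().str.lower()
    return stops.loc[status.isin(list(keep))].copy()


# Made-up rows covering each of the five possible Status entries.
sample = pd.DataFrame(
    {
        "ATCOCode": ["A1", "A2", "A3", "A4", "A5"],
        "Status": ["active", "inactive", None, "pending", "new"],
    }
)
in_use = filter_in_use_stops(sample)
print(in_use["ATCOCode"].tolist())
```

Keeping the allowed statuses in a single set means the rule can be changed in one place if the definitions from NaPTAN turn out to differ.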

jwestw commented 2 years ago

This has been implemented by #186