department-for-transport-BODS / bods-data-extractor

A python client for downloading and extracting data from the UK Bus Open Data Service

Incorrect Parsing of timetables (6) #45

Open spencer-b-318 opened 1 year ago

spencer-b-318 commented 1 year ago

Describe the bug: Some vehicle journeys appear to overlap, with duplicate sequence numbers at different stops and times. This could be due to inbound and outbound journeys being displayed in the same column.

Expected behaviour: Timetables should be split by inbound/outbound and take days of operation into account. Vehicle journeys and journey patterns are both relevant to timetable calculation and should both be displayed in the final output to aid analysis and troubleshooting.
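
As a rough illustration of the split described above, the sketch below groups flat stop-level rows by direction and keeps journey pattern, vehicle journey and days of operation as the identity of each column. The record layout and field names (`direction`, `journey_pattern`, `vehicle_journey`, `days`, `sequence`, `stop`, `time`) are hypothetical and are not the package's actual output schema.

```python
from collections import defaultdict

# Hypothetical flat stop-level records; the field names are illustrative only.
ROWS = [
    {"direction": "outbound", "journey_pattern": "JP_9", "vehicle_journey": "vj_1",
     "days": "Mon - Fri", "sequence": 2, "stop": "University", "time": "08:15"},
    {"direction": "outbound", "journey_pattern": "JP_9", "vehicle_journey": "vj_1",
     "days": "Mon - Fri", "sequence": 1, "stop": "Market", "time": "08:10"},
    {"direction": "inbound", "journey_pattern": "JP_9", "vehicle_journey": "vj_1",
     "days": "Mon - Fri", "sequence": 1, "stop": "Hospital", "time": "07:10"},
    {"direction": "inbound", "journey_pattern": "JP_9", "vehicle_journey": "vj_1",
     "days": "Mon - Fri", "sequence": 2, "stop": "Bus Station", "time": "07:15"},
]


def split_by_direction(rows):
    """Return {direction: {(journey_pattern, vehicle_journey, days): [(sequence, stop, time), ...]}}."""
    timetables = defaultdict(lambda: defaultdict(list))
    for row in rows:
        column = (row["journey_pattern"], row["vehicle_journey"], row["days"])
        timetables[row["direction"]][column].append(
            (row["sequence"], row["stop"], row["time"])
        )
    # Sort each column by sequence number so stops appear in running order.
    return {
        direction: {column: sorted(stops) for column, stops in columns.items()}
        for direction, columns in timetables.items()
    }


timetables = split_by_direction(ROWS)
```

Because each direction gets its own table, the same sequence number can appear in both without the rows colliding in one column.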

Additional context: Additional features to the current iteration

spencer-b-318 commented 1 year ago

Example Timetable

Outbound

Journey Pattern ID   JP_9        JP_10
Vehicle Journey      vj_1        vj_2
Route ID             Rt_1        Rt_2
Days                 Mon - Fri   Sun
Bus Station          -           09:00
Hospital             -           09:05
Market               08:10       09:10
University           08:15       09:15

Inbound

Journey Pattern ID   JP_9        JP_10
Vehicle Journey      vj_1        vj_2
Route ID             Rt_1        Rt_2
Days                 Mon - Fri   Sun
University           -           08:00
Market               -           08:05
Hospital             07:10       08:10
Bus Station          07:15       08:15

spencer-b-318 commented 1 year ago

@adamakram1

Extract JSON data from the XML

Define Python dataclasses to hold the data from the JSON elements
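
A minimal sketch of those two steps, using only the standard library. The XML fragment is a simplified, hand-written snippet in the style of TransXChange, not real BODS data, and the helper `element_to_dict` and the `VehicleJourney` dataclass are hypothetical names chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Simplified, hand-written fragment in the style of TransXChange; not real BODS data.
XML = """
<TransXChange xmlns="http://www.transxchange.org.uk/">
  <VehicleJourneys>
    <VehicleJourney>
      <VehicleJourneyCode>vj_1</VehicleJourneyCode>
      <JourneyPatternRef>JP_9</JourneyPatternRef>
      <DepartureTime>08:10:00</DepartureTime>
    </VehicleJourney>
  </VehicleJourneys>
</TransXChange>
"""


def element_to_dict(element):
    """Recursively convert an element into dicts, lists and strings, dropping namespaces."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        tag = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        value = element_to_dict(child)
        if tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[tag], list):
                result[tag] = [result[tag]]
            result[tag].append(value)
        else:
            result[tag] = value
    return result


@dataclass
class VehicleJourney:
    VehicleJourneyCode: str
    JourneyPatternRef: str
    DepartureTime: str


data = element_to_dict(ET.fromstring(XML.strip()))
as_json = json.dumps(data, indent=2)  # JSON view of the XML
journey = VehicleJourney(**data["VehicleJourneys"]["VehicleJourney"])
```

Note that with this helper a single `VehicleJourney` element becomes a dict while several become a list, so real code would need to normalise that before building dataclasses.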

I suggest working in this order:

spencer-b-318 commented 1 year ago

https://jsonformatter.org/xml-viewer

adamakram1 commented 1 year ago

Describe the bug: Currently, because specific PTI logic is not accounted for, vehicle journeys that span midnight (i.e. start at 23:50 and end at 01:00) are incorrectly represented in the final timetables output.

To Reproduce: Run the code below, then find a vehicle journey that overlaps 00:00.

from BODSDataExtractor.extractor import TimetableExtractor

api = "YOUR_API_KEY"  # Placeholder: replace with your own BODS API key

# Initiate an object instance called my_bus_data_object with the desired parameters
my_bus_data_object = TimetableExtractor(
    api_key=api,  # Your API key here
    limit=1,  # How many datasets to view
    status="published",  # Only view published datasets
    service_line_level=True,  # True if you require service line data
    stop_level=True,  # True if you require stop level data
)

# Save the extracted stop level data to the stop_level variable
stop_level = my_bus_data_object.timetable_dict

# Note that downloading the stop level data also downloads the dataset and
# service line level data. These can be accessed as below:
dataset_level = my_bus_data_object.metadata
service_line_level = my_bus_data_object.service_line_extract

# Save metadata and service line level data to CSV files in your downloads directory
my_bus_data_object.save_metadata_to_csv()
my_bus_data_object.save_service_line_extract_to_csv()

# stop_level is a dictionary of dataframes, which can be saved to CSV as follows
# (saves in your downloads directory)
my_bus_data_object.save_all_timetables_to_csv()

Expected behavior: Vehicle journeys that overlap midnight should be represented accurately.

Additional context: This bug needs investigating more thoroughly before it can be fixed. Suggest reading the PTI 1.1 document on midnight timetables first.
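
One way to picture the problem: if stop times are built by adding run times to the departure time and only the time of day is kept, a journey leaving at 23:50 wraps back to 00:xx with nothing to show that the date has changed. The sketch below keeps a day offset alongside each time. It is a minimal illustration and not the package's implementation; the function name `stop_times` and the run times are made up, and it does not cover any PTI-specific rules.

```python
from datetime import datetime, timedelta


def stop_times(departure, run_times_minutes):
    """Return (HH:MM, day_offset) for each stop, starting from the departure time."""
    start = datetime.strptime(departure, "%H:%M")
    current = start
    times = [(current.strftime("%H:%M"), 0)]
    for minutes in run_times_minutes:
        current += timedelta(minutes=minutes)
        # Number of calendar days elapsed since the journey started.
        day_offset = (current.date() - start.date()).days
        times.append((current.strftime("%H:%M"), day_offset))
    return times


# A journey that starts at 23:50 and ends at 01:00, as in the bug description.
times = stop_times("23:50", [20, 50])
```

With the day offset retained, the stops after midnight can be sorted and displayed after the 23:50 departure rather than before it.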

adamakram1 commented 1 year ago

Create an additional ticket for the next sprint with fewer story points. Inform simplifyai: take an extra sprint to get it into a better place.