dystonse / dystonse-gtfs-data

Read gtfs and gtfs-realtime data, write them into a database, perform and export analyses and make predictions based on them.

Different precision of the same prediction when viewed as stop page / trip page #6

Open lenaschimmel opened 4 years ago

lenaschimmel commented 4 years ago

For some trips, a high-quality prediction (e.g. "E/S") is shown on the stop page, but the matching trip page shows a prediction with lower precision (e.g. "P/S-") for the same departure.

High precision:

Screenshot 1

Low precision:

Screenshot 2

We don't know yet when or why this happens.

lenaschimmel commented 4 years ago

We probably found the reason for this bug:

Normally, the primary key of the predictions table should prevent multiple db rows for the same departure event. Our "insert or replace" is based on this assumption. However, the primary key contains both the route_id and the trip_id columns, so two rows that differ only in trip_id do not collide.

For the VBN data source, we often get schedule updates where the same actual trip keeps its route_id but gets a different trip_id. We make a schedule-based prediction with the old trip_id, and a week later, we make a real-time prediction with the new trip_id. Both rows end up in the table.
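To illustrate the effect, here is a minimal sketch using Python's built-in sqlite3 module. The table layout is an assumption made up for this example (only the column names route_id, trip_id, trip_start_date, trip_start_time, stop_sequence and event_type appear in this thread; the `origin` column and all values are hypothetical), and the real project does not necessarily use SQLite. It only shows that "insert or replace" replaces a row when the whole key matches, and adds a second row as soon as trip_id differs.

```python
import sqlite3


def count_rows_for_departure(rows):
    """Insert rows with INSERT OR REPLACE and return how many rows remain."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema: the primary key includes trip_id.
    conn.execute(
        """
        CREATE TABLE predictions (
            route_id        TEXT    NOT NULL,
            trip_id         TEXT    NOT NULL,
            trip_start_date TEXT    NOT NULL,
            trip_start_time TEXT    NOT NULL,
            stop_sequence   INTEGER NOT NULL,
            event_type      TEXT    NOT NULL,
            origin          TEXT    NOT NULL,  -- hypothetical: 'schedule' or 'realtime'
            PRIMARY KEY (route_id, trip_id, trip_start_date,
                         trip_start_time, stop_sequence, event_type)
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()
    conn.close()
    return count


# Same trip_id: the real-time row replaces the schedule-based row.
same_trip_id = [
    ("R1", "T-old", "2020-06-01", "08:00:00", 5, "departure", "schedule"),
    ("R1", "T-old", "2020-06-01", "08:00:00", 5, "departure", "realtime"),
]

# trip_id changed by a schedule update: both rows survive.
changed_trip_id = [
    ("R1", "T-old", "2020-06-01", "08:00:00", 5, "departure", "schedule"),
    ("R1", "T-new", "2020-06-01", "08:00:00", 5, "departure", "realtime"),
]

print(count_rows_for_departure(same_trip_id))     # 1
print(count_rows_for_departure(changed_trip_id))  # 2
```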

On the stop page, we use the local is_duplicate function inside generate_stop_page to filter out the old, schedule-based prediction.

On the trip page, we use get_prediction_for_first_line, which already filters by source, event_type, stop_sequence, trip_id, trip_start_date and trip_start_time. Because trip_id is part of that filter, the query only ever sees the row for the requested trip_id and cannot detect the row with the deviating trip_id.
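The difference between the two pages can be sketched as follows. This is not the project's actual is_duplicate or get_prediction_for_first_line code; `stop_page_filter`, `trip_page_lookup`, the `Prediction` class and the `origin` field are hypothetical stand-ins that only mirror the behaviour described above, under the assumption that duplicates share every key field except trip_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    route_id: str
    trip_id: str
    trip_start_date: str
    trip_start_time: str
    stop_sequence: int
    event_type: str
    origin: str  # hypothetical: "schedule" or "realtime"


def departure_key(p):
    """Identify a departure event WITHOUT looking at trip_id."""
    return (p.route_id, p.trip_start_date, p.trip_start_time,
            p.stop_sequence, p.event_type)


def stop_page_filter(predictions):
    """Keep one prediction per departure, preferring real-time over schedule."""
    best = {}
    for p in predictions:
        key = departure_key(p)
        current = best.get(key)
        if current is None or (current.origin == "schedule" and p.origin == "realtime"):
            best[key] = p
    return list(best.values())


def trip_page_lookup(predictions, trip_id, trip_start_date, trip_start_time,
                     stop_sequence, event_type):
    """Filter by the exact trip_id, so rows with another trip_id are invisible."""
    return [
        p for p in predictions
        if (p.trip_id, p.trip_start_date, p.trip_start_time,
            p.stop_sequence, p.event_type)
        == (trip_id, trip_start_date, trip_start_time, stop_sequence, event_type)
    ]


rows = [
    Prediction("R1", "T-old", "2020-06-01", "08:00:00", 5, "departure", "schedule"),
    Prediction("R1", "T-new", "2020-06-01", "08:00:00", 5, "departure", "realtime"),
]

stop_view = stop_page_filter(rows)
trip_view = trip_page_lookup(rows, "T-old", "2020-06-01", "08:00:00", 5, "departure")

print([p.origin for p in stop_view])  # ['realtime']
print([p.origin for p in trip_view])  # ['schedule']
```

In this sketch, a trip page opened with the old trip_id shows only the schedule-based row, while the stop page shows the real-time one, which matches the symptom reported in this issue.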

Long story short / TLDR

This could be solved in the monitor module, but it might be easier and cleaner to catch it in the predictor, or actually in the importer, since predictions are made during import.

This could be a separate fix, but it would also be covered if #10 were implemented.

lenaschimmel commented 4 years ago

Even though #10 is now done, this issue persists. We were able to verify that for practically all buses in Brunswick, we get predictions with matching scheduled times but different trip_ids. We still believe that this occurs when a new version of the schedule is downloaded.