HSLdevcom / transitlog

Explore observed public transport and compare with the intended traffic
https://reittiloki.hsl.fi/
Creative Commons Attribution 4.0 International

Service Alerts to TimescaleDB #24

Open paasovaara opened 5 years ago

paasovaara commented 5 years ago

Store only the service alerts that reference lines or stops. Parse them from GTFS-RT messages.
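A minimal sketch of that filter, assuming the GTFS-RT alerts have already been decoded into plain dicts. The field names `informed_entity`, `route_id` and `stop_id` come from the GTFS-RT schema; the helper name and the sample data are hypothetical.

```python
from typing import Any


def references_line_or_stop(alert: dict[str, Any]) -> bool:
    """Return True if any informed entity names a route (line) or a stop."""
    return any(
        entity.get("route_id") or entity.get("stop_id")
        for entity in alert.get("informed_entity", [])
    )


# Hypothetical sample alerts, already decoded into dicts.
alerts = [
    {"id": "a1", "informed_entity": [{"route_id": "1001"}]},
    {"id": "a2", "informed_entity": [{"agency_id": "HSL"}]},
    {"id": "a3", "informed_entity": [{"stop_id": "1020453"}]},
]

# Only a1 and a3 reference a line or a stop; a2 is agency-wide and is dropped.
kept = [a["id"] for a in alerts if references_line_or_stop(a)]
```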

Think about search filters.

paasovaara commented 5 years ago

Let's read the Pulsar topic for ServiceAlerts. We want to store every change to ServiceAlerts so that we can get the full change history and the exact service alert message at timestamp N.

Currently the Pulsar feed publishes the whole state (multiple ServiceAlerts) at once: if one alert changes, all of them are published again, so the feed contains duplicates. We need to filter out the duplicates. Let's have the database engine (PostgreSQL) do this. We can define a unique constraint on (BulletinId, lastModifiedTimestamp/md5) and write the insert so that a constraint violation is ignored; no need to assert or throw.
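A rough sketch of the idea, using Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server. The `ON CONFLICT (...) DO NOTHING` clause has the same shape in PostgreSQL and in SQLite 3.24+. The table name, column names and the choice of an md5 over the payload are assumptions for illustration, not the actual schema.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE service_alert (
        bulletin_id TEXT NOT NULL,
        content_md5 TEXT NOT NULL,
        received_at INTEGER NOT NULL,
        payload     TEXT NOT NULL,
        UNIQUE (bulletin_id, content_md5)
    )
    """
)


def store_alert(bulletin_id: str, payload: str, received_at: int) -> None:
    """Insert one alert version; an identical version is silently skipped."""
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    conn.execute(
        """
        INSERT INTO service_alert (bulletin_id, content_md5, received_at, payload)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (bulletin_id, content_md5) DO NOTHING
        """,
        (bulletin_id, digest, received_at, payload),
    )
    conn.commit()


# The feed republishes the unchanged alert, then a changed version arrives.
store_alert("b1", "Tram 4 diverted", 100)
store_alert("b1", "Tram 4 diverted", 200)  # duplicate, ignored by the constraint
store_alert("b1", "Tram 4 back on route", 300)

stored = conn.execute("SELECT COUNT(*) FROM service_alert").fetchone()[0]
```

Because the database rejects the duplicate, the consumer needs no lookup before each insert and keeps only one row per distinct version of a bulletin.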

TODO: add an integration test for the ServiceAlert publisher to verify that BulletinId is included in the message (as the feedentity-id or in some other field).

samijuhani commented 5 years ago

Please make sure this process works.

  1. An alert is created at 8:00 with validity 9:00-16:00.
  2. The disruption ends early at 15:00, and the alert is changed at 15:00.
  3. So at 14:59 the alert was valid 9:00-16:00, but from 15:00 the same alert is valid only 9:00-15:00.
  4. The API should not return this alert for 15:00 -> use the "last state".
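
The scenario above can be sketched as a "last state" lookup over the stored versions: for a query time, take the latest version of each alert modified at or before that time, then check its validity. This is only an illustration; the class and function names are hypothetical, times are minutes since midnight, and treating the validity period as half-open (end excluded) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertVersion:
    bulletin_id: str
    modified_at: int  # minutes since midnight
    valid_from: int
    valid_to: int     # exclusive end (assumption)


def hm(hours: int, minutes: int = 0) -> int:
    """Convert a clock time to minutes since midnight."""
    return hours * 60 + minutes


def alerts_active_at(versions: list[AlertVersion], t: int) -> list[AlertVersion]:
    """Return the last known state of each alert at time t, if valid at t."""
    latest: dict[str, AlertVersion] = {}
    for v in versions:
        if v.modified_at <= t:
            current = latest.get(v.bulletin_id)
            if current is None or v.modified_at > current.modified_at:
                latest[v.bulletin_id] = v
    return [v for v in latest.values() if v.valid_from <= t < v.valid_to]


history = [
    AlertVersion("b1", modified_at=hm(8), valid_from=hm(9), valid_to=hm(16)),
    AlertVersion("b1", modified_at=hm(15), valid_from=hm(9), valid_to=hm(15)),
]

at_1459 = alerts_active_at(history, hm(14, 59))  # sees the 9-16 version
at_1500 = alerts_active_at(history, hm(15))      # sees the 9-15 version, no longer valid
```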

paasovaara commented 5 years ago

The GTFS-RT alert schema allows multiple time periods and informed entities to be attached to a single service alert. Mapping this to SQL would require either multiple tables and joins (not very well supported by Hasura) or nested JSON objects or arrays. Hasura should support jsonb and even be quite performant with it, so we could try that first.

More info: https://blog.hasura.io/postgres-containment-operators-part-2-performance-comparison-with-mongodb-321324a476c2/
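
A small sketch of what the single-table, nested-JSON shape could look like. The column names and helper functions are hypothetical; `active_period`, `informed_entity` and `route_id` are GTFS-RT field names. The `affects_route` helper only approximates in Python what a jsonb containment filter such as `informed_entities @> '[{"route_id": "1001"}]'` would do in PostgreSQL.

```python
import json
from typing import Any


def alert_to_row(bulletin_id: str, alert: dict[str, Any]) -> dict[str, str]:
    """Flatten one alert into a single row; repeated fields become JSON text."""
    return {
        "bulletin_id": bulletin_id,
        "active_periods": json.dumps(alert.get("active_period", [])),
        "informed_entities": json.dumps(alert.get("informed_entity", [])),
    }


def affects_route(row: dict[str, str], route_id: str) -> bool:
    """Rough stand-in for a jsonb containment check on informed_entities."""
    entities = json.loads(row["informed_entities"])
    return any(e.get("route_id") == route_id for e in entities)


# Hypothetical alert with two time periods and two informed entities.
alert = {
    "active_period": [{"start": 100, "end": 200}, {"start": 300, "end": 400}],
    "informed_entity": [{"route_id": "1001"}, {"stop_id": "1020453"}],
}
row = alert_to_row("b1", alert)
hits_1001 = affects_route(row, "1001")
hits_2550 = affects_route(row, "2550")
```

One row per alert version keeps the table flat for Hasura, at the cost of pushing the line and stop filters into JSON operators instead of joins.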