CityofSantaMonica / mds-provider

Python tools for working with MDS Provider data
https://github.com/openmobilityfoundation/mobility-data-specification
MIT License

Drop duplicates before load #64

Closed thekaveman closed 5 years ago

thekaveman commented 5 years ago

Whether or not ON CONFLICT DO UPDATE is doing anything, duplicate records in the incoming data can cause problems.

This change adds a pre-processing step to both the load_status_changes and load_trips helpers in ProviderDataLoader.

Duplicate status_changes are identified using (provider_id, device_id, event_time).

Duplicate trips are identified using (provider_id, trip_id).
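
For illustration, the de-duplication rule described above can be sketched in plain Python. This is a minimal sketch, not the code in this PR: the `drop_duplicates` helper, the key constants, and the sample records are hypothetical, and it assumes the first occurrence of each key is the one to keep. The actual loader may operate on a different data structure.

```python
from typing import Any, Dict, List, Sequence

# Key fields taken from the description above.
STATUS_CHANGE_KEY = ("provider_id", "device_id", "event_time")
TRIP_KEY = ("provider_id", "trip_id")


def drop_duplicates(records: List[Dict[str, Any]],
                    key_fields: Sequence[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep the first record seen for each key."""
    seen = set()
    unique = []
    for record in records:
        # Build the identifying tuple from the chosen key fields.
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample data: the second status change repeats the first one's key.
status_changes = [
    {"provider_id": "p1", "device_id": "d1", "event_time": 1000, "event_type": "available"},
    {"provider_id": "p1", "device_id": "d1", "event_time": 1000, "event_type": "available"},
    {"provider_id": "p1", "device_id": "d2", "event_time": 1000, "event_type": "reserved"},
]

# Made-up sample data: the second trip repeats the first one's key.
trips = [
    {"provider_id": "p1", "trip_id": "t1", "trip_distance": 120},
    {"provider_id": "p1", "trip_id": "t1", "trip_distance": 120},
    {"provider_id": "p1", "trip_id": "t2", "trip_distance": 300},
]

unique_status_changes = drop_duplicates(status_changes, STATUS_CHANGE_KEY)
unique_trips = drop_duplicates(trips, TRIP_KEY)
```

Running this step before the load means each key reaches the database at most once per batch, regardless of how the conflict clause behaves.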