CityOfLosAngeles / aqueduct

A shared pipeline for building ETLs and batch jobs that we run at the City of LA for Data Science Projects. Built on Apache Airflow & Civis Platform
Apache License 2.0

It's possible that the dockless ETL task is missing trips #46

Closed ezheidtmann closed 5 years ago

ezheidtmann commented 5 years ago

Hey there,

I've been tracking down an issue with MDS 0.2.x and wanted to make sure you're aware that non-overlapping query windows on the MDS trips endpoint will result in missed trips. The dockless_elt task in this repo appears to poll in 12-hour windows; if it only runs every 12 hours, trips could be missed. If the task runs twice as often (every 6 hours), you're probably fine.

I shared an example of the problem in the mds-provider-services repo: https://github.com/CityofSantaMonica/mds-provider-services/issues/20#issuecomment-447015877

The underlying issue is already fixed in MDS 0.3.x; just wanted to make sure that other consumers are compensating appropriately until all providers update to 0.3.x or provide other workarounds.
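For consumers that need to compensate in the meantime, one option is to query overlapping windows and de-duplicate the results. The following is only a minimal sketch under stated assumptions, not code from this repo: `overlapping_windows` and `dedupe_trips` are hypothetical helper names, the 12-hour step and 6-hour overlap simply mirror the numbers above, and the only MDS detail relied on is that each trip record carries a `trip_id`.

```python
from datetime import datetime, timedelta


def overlapping_windows(start, end, step, overlap):
    """Yield (window_start, window_end) pairs covering [start, end).

    Each window reaches back `overlap` before its nominal start, so
    consecutive windows share a span of time instead of merely touching.
    """
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield (cursor - overlap, window_end)
        cursor = window_end


def dedupe_trips(trips):
    """Keep the first record seen for each trip_id, preserving order.

    Overlapping windows return some trips more than once, so results
    need to be de-duplicated before loading.
    """
    seen = {}
    for trip in trips:
        seen.setdefault(trip["trip_id"], trip)
    return list(seen.values())


# Illustrative usage: one day split into 12-hour steps with a 6-hour overlap.
windows = list(
    overlapping_windows(
        start=datetime(2018, 12, 1),
        end=datetime(2018, 12, 2),
        step=timedelta(hours=12),
        overlap=timedelta(hours=6),
    )
)

# Stand-in for records returned by two overlapping queries (hypothetical data).
fetched = [{"trip_id": "a"}, {"trip_id": "b"}, {"trip_id": "a"}]
unique_trips = dedupe_trips(fetched)
```

Running a task more often than its window length has the same effect as the explicit overlap here: each run re-queries time that an earlier run already covered.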

All the best from your neighbor to the north, Evan

hunterowens commented 5 years ago

@ezheidtmann

This task runs hourly, as you can see in the DAG. Closing this issue for now. Thanks for the heads-up.

ezheidtmann commented 5 years ago

👍 I couldn't find the schedule when I looked for it, but I see it now! Glad you're not affected.