cal-itp / data-analyses

Place for sharing quick reports, and works in progress
https://analysis.calitp.org

Feature Request: Published data products should patch in earlier data if it's missing #1225

Open tiffanychu90 opened 1 day ago

tiffanychu90 commented 1 day ago

Where does your feature apply? Select from the below, and be sure to affix the appropriate label to this issue (e.g. dataset, jupyterhub, metabase, analysis.calitp.org)

Is your feature request related to a problem? Please describe.

The single-day snapshots that support our analytics pipeline can be missing operators. This is expected: from day to day, a feed can drop out for a short period and come back soon after. For users, this is frustrating, because operators appear and disappear.

Describe the solution you'd like

We'll keep our analytics pipeline as is, pulling the single day and running it through, but let's add 2 things to help us fill in the blanks:

Describe alternatives you've considered

We want to consider the following points:

Additional context

edasmalchi commented 1 day ago

Thanks for the thorough writeup! My first impression is that going back to the last cached date would be preferable but happy to help brainstorm more.

Things like stops/routes are relatively static, and it seems better to have complete data for an operator (minus, perhaps, the most recent service change) than no/incomplete data... For RT, maybe it's better to let it go a few months stale than to run an off-cycle date?

Perhaps as part of this tooling we can add a separate alert/reporting mechanism for when we have nothing for an operator for, say, 6 months?
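
To make the "last cached date plus an alert" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `cache` mapping, the function names, the operator names and dates, and the 183-day cutoff standing in for "say, 6 months". It is not how the existing pipeline stores or looks up cached dates.

```python
from datetime import date, timedelta

# Hypothetical threshold: roughly 6 months, per the suggestion above.
STALE_AFTER = timedelta(days=183)


def pick_patch_date(cached_dates, target):
    """Return the most recent cached date on or before target, or None."""
    candidates = [d for d in cached_dates if d <= target]
    return max(candidates) if candidates else None


def patch_plan(cache, target):
    """Map each operator to (date to use, needs_alert)."""
    plan = {}
    for operator, cached_dates in cache.items():
        chosen = pick_patch_date(cached_dates, target)
        # Alert when nothing is cached at all, or the fallback is too old.
        needs_alert = chosen is None or (target - chosen) > STALE_AFTER
        plan[operator] = (chosen, needs_alert)
    return plan


# Illustrative data only; operator names and dates are made up.
cache = {
    "Operator A": [date(2024, 8, 14), date(2024, 9, 18)],
    "Operator B": [date(2024, 7, 17), date(2024, 8, 14)],
    "Operator C": [date(2023, 12, 13)],
    "Operator D": [],
}
plan = patch_plan(cache, date(2024, 9, 18))
```

In this made-up example, Operator A uses the target date itself, Operator B is patched from its most recent earlier date, and Operators C and D are flagged for an alert (C because its fallback is older than the cutoff, D because nothing is cached).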

tiffanychu90 commented 1 day ago

@edasmalchi: OK! Let me try to get this working for the Sep open data, with a yaml produced to track what's there, and we can iterate from there? I'm curious about how many operators we'll be patching and how far back we'll have to go, but hopefully this means Sep's hqta data will definitely have Long Beach.
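
As a rough idea of what the tracking yaml could contain, here is a sketch that renders one with the standard library only. The field names (`analysis_date`, `patched_operators`, `source_date`, `days_back`), the function name, and the example operator and dates are all hypothetical; the real file may end up looking quite different.

```python
import json
from datetime import date


def render_patch_manifest(analysis_date, patched):
    """Render a small YAML document listing which operators were patched in.

    `patched` maps operator name -> the earlier date its data came from.
    """
    lines = [f"analysis_date: {analysis_date.isoformat()}"]
    if not patched:
        lines.append("patched_operators: []")
        return "\n".join(lines) + "\n"

    lines.append("patched_operators:")
    for operator in sorted(patched):
        source_date = patched[operator]
        # json.dumps yields a double-quoted string, which YAML also accepts.
        lines.append(f"  - name: {json.dumps(operator)}")
        lines.append(f"    source_date: {source_date.isoformat()}")
        lines.append(f"    days_back: {(analysis_date - source_date).days}")
    return "\n".join(lines) + "\n"


# Illustrative data only; the operator name and dates are made up.
manifest = render_patch_manifest(
    date(2024, 9, 18),
    {"Operator B": date(2024, 8, 14)},
)
print(manifest)
```

Recording `days_back` per operator would answer the "how many operators / how far back" question directly from the file.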