act-now-coalition / can-scrapers

MIT License

Create dedicated flow for updating NYT data and triggering parquet update #410

Closed smcclure17 closed 2 years ago

smcclure17 commented 2 years ago

Creates a flow that checks the NYT data source every 30 minutes and, when new data is found, triggers the flow that updates the parquet file. This follows the flow-to-flow workflow outlined in the Prefect docs: https://docs.prefect.io/core/idioms/flow-to-flow.html
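For readers skimming the thread, here is a minimal sketch of the "check, then trigger only on new data" logic described above. It is not the code in this PR: the names `fingerprint`, `check_and_trigger`, `fetch`, and `trigger` are hypothetical, change detection by content hash is an assumption, and the 30-minute scheduling and the actual flow-to-flow call are left to Prefect.

```python
import hashlib
from typing import Callable, Optional


def fingerprint(payload: bytes) -> str:
    """Return a stable digest of the downloaded data."""
    return hashlib.sha256(payload).hexdigest()


def check_and_trigger(
    fetch: Callable[[], bytes],
    trigger: Callable[[], None],
    last_fingerprint: Optional[str],
) -> str:
    """Run one scheduled check and call `trigger` only if the data changed.

    `fetch` stands in for downloading the NYT data and `trigger` for
    starting the parquet update flow (both hypothetical placeholders).
    Returns the fingerprint to store for the next check.
    """
    current = fingerprint(fetch())
    if current != last_fingerprint:
        trigger()
    return current


if __name__ == "__main__":
    # Simulate three checks: first run, unchanged data, then new data.
    runs = []
    start_update = lambda: runs.append("update-parquet")

    seen = check_and_trigger(lambda: b"2021-01-01,10\n", start_update, None)
    seen = check_and_trigger(lambda: b"2021-01-01,10\n", start_update, seen)
    seen = check_and_trigger(lambda: b"2021-01-02,12\n", start_update, seen)

    print(len(runs))  # number of times the update flow would have started
```

Note that with no stored fingerprint (`None`) the first check always triggers an update. Whether that is desirable depends on how the parquet flow is otherwise run.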

smcclure17 commented 2 years ago

Thanks for this feedback 🙏, I'll work to revise this into something a bit cleaner and clearer.

Regarding "Don't we already have a NYT scraper run scheduled for this time?":

Good catch. You can add schedules to existing flows in the Prefect dashboard/UI without making any code changes (go to a specific flow > settings > new schedule), which is how I added that schedule for the NYT scraper.

We'll want to remove that schedule whenever we merge this PR.

mikelehen commented 2 years ago

Ah! That makes sense. Out of curiosity, do you know if there's any way to see what flows are set up to run on a schedule? Like assuming I came to our dashboard not knowing anything, is there any way I could have found out that the NYT scraper runs on a schedule other than by clicking every single flow and going to its settings? Guessing not (couldn't immediately find something in Google), but thought I'd ask.

smcclure17 commented 2 years ago

I don't think there's anywhere that explicitly lists out the flow schedules, no :(

I think the best way to check is to look in the "Upcoming Runs" section, which lists the runs that are currently scheduled. Scrolling through it, you can see that the NYT scraper is scheduled every day at 1:30 ET, the MainFlows run every 4 hours, etc., but it's not perfect.
