Shopify / pitchfork


Explore: Task scheduling API #110

Closed casperisfine closed 5 months ago

casperisfine commented 6 months ago

A fairly common optimization technique in apps is to use a background thread to regularly poll some data and keep an always-accessible copy in memory.

Similarly, it's not uncommon to have a background thread collect and emit metrics at a regular interval.

With Pitchfork this isn't ideal, because you have to be very careful that all background threads are either shut down or at a safe point when a worker is promoted or when the mold forks a new child.
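For illustration, the background-polling pattern described above might look roughly like the sketch below. `BackgroundPoller` is a hypothetical helper, not part of Pitchfork. It shows why forking is delicate: the thread has to be stopped explicitly at a known point, since only the thread that calls `fork` exists in the child.

```ruby
# Hypothetical helper illustrating the pattern; not a Pitchfork API.
class BackgroundPoller
  def initialize(interval:, &fetch)
    @interval = interval
    @fetch = fetch
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @stopped = false
    @value = fetch.call # populate once synchronously so readers never see nil
    @thread = Thread.new { run }
  end

  # Returns the most recently fetched copy without waiting on the fetch itself.
  def value
    @mutex.synchronize { @value }
  end

  # Stops the thread at a safe point. This is the step that must not be
  # forgotten before a fork: background threads are not carried over into
  # the child process.
  def stop
    @mutex.synchronize do
      @stopped = true
      @cond.signal
    end
    @thread.join
  end

  private

  def run
    loop do
      @mutex.synchronize do
        # Sleep for one interval, or wake up early if stop was requested.
        @cond.wait(@mutex, @interval) unless @stopped
        return if @stopped
      end
      fresh = @fetch.call # fetch outside the lock so readers are not blocked
      @mutex.synchronize { @value = fresh }
    end
  end
end
```

An app would typically build one of these at boot and read `poller.value` on the request path. The difficulty raised in this issue is that something has to call `stop` (and later restart the poller) around every promotion or fork.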

Idea

Maybe not for 100% of use cases, but I believe that in many cases such threads don't actually need to be in the mold or the workers, and could instead live in a separate process.

They also very often don't need an actual dedicated thread: they simply sleep in a loop to perform some action at a regular interval.

As such, if we provided a task scheduling API similar to rufus-scheduler (e.g. every 10s), we could run these tasks on a single thread (or a small number of threads) in a dedicated worker that's never used for promotion, and for which forking at random points therefore isn't a concern.
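To make the idea concrete, here is a minimal sketch of what an "every N seconds" scheduler running all tasks on one thread could look like. `TaskScheduler` and its `every`, `start` and `stop` methods are hypothetical names chosen for this example. This is neither rufus-scheduler's implementation nor an existing Pitchfork API, and it assumes all tasks are registered before `start` is called.

```ruby
# Hypothetical single-threaded scheduler; names are illustrative only.
class TaskScheduler
  Task = Struct.new(:interval, :block, :next_run)

  def initialize
    @tasks = []
    @running = false
  end

  # Registers a block to run every `interval` seconds. Call before #start.
  def every(interval, &block)
    @tasks << Task.new(interval, block, now + interval)
    self
  end

  # Runs all registered tasks on one thread, always picking the task whose
  # deadline comes first.
  def start
    @running = true
    @thread = Thread.new do
      while @running && (task = @tasks.min_by(&:next_run))
        delay = task.next_run - now
        sleep(delay) if delay > 0
        break unless @running
        run_task(task)
        task.next_run = now + task.interval
      end
    end
    self
  end

  # Waits for the current sleep or task to finish, so it can take up to
  # one interval to return.
  def stop
    @running = false
    @thread&.join
  end

  private

  def run_task(task)
    task.block.call
  rescue StandardError => e
    # One failing task should not take down the others.
    warn("scheduled task failed: #{e.class}: #{e.message}")
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

Usage would be along the lines of `TaskScheduler.new.every(10) { refresh_flags }.start`, where `refresh_flags` stands in for whatever the app needs to do. Because everything shares one thread, a slow task delays the others, which is the trade-off for not needing a thread per task.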

Tasks that need to expose data to workers can use a variety of IPC solutions, such as a local key-value store, shared memory (with Raindrops), writing to a file, a sqlite3 database, etc.
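As one example of the file-based option, a task could publish a snapshot that workers read on demand. The sketch below is an assumption about how that might be done, not something Pitchfork provides: `SharedSnapshot` is a hypothetical module, and JSON is an arbitrary choice of format. It writes to a temporary file and renames it into place so readers never observe a half-written file.

```ruby
require "json"

# Hypothetical helper for sharing data between processes through a file.
module SharedSnapshot
  module_function

  # Called from the process that runs the background task.
  def write(path, data)
    tmp = "#{path}.#{Process.pid}.tmp"
    File.write(tmp, JSON.generate(data))
    # rename is atomic on POSIX systems when both paths are on the same
    # filesystem, so readers see either the old or the new snapshot.
    File.rename(tmp, path)
  end

  # Called from workers. Returns nil if nothing has been published yet.
  def read(path)
    JSON.parse(File.read(path))
  rescue Errno::ENOENT
    nil
  end
end
```

Workers pay for a file read and a JSON parse on each call, so in practice they would likely cache the result briefly or check the file's mtime before re-reading.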

nlicalzi commented 6 months ago

Upvoting that this is important, and happy to help where needed/possible

danmayer commented 6 months ago

Yes, this could be very useful for a number of things the caching folks are interested in.

elaineylchan commented 5 months ago

Hi @casperisfine, hope you are well. I just want to make sure that this ticket is still on your radar, aiming for the end of the month? :)

casperisfine commented 5 months ago

> that this ticket is still on your radar

Yeah, I just got back last night and started drafting it in https://github.com/Shopify/pitchfork/pull/121. It needs a bit more testing etc., but the bulk of the feature is there.

casperisfine commented 5 months ago

So I decided not to go all the way to a scheduling API, at least for now, but to implement a simpler "service worker": basically an inactive process in which you can spawn threads to do whatever you need.