datasette / datasette-enrichments

Tools for running enrichments against data stored in Datasette
https://enrichments.datasette.io
Apache License 2.0

Support external enrichments via a JSON API #4

Closed simonw closed 7 months ago

simonw commented 2 years ago

Originally posted by @simonw in https://github.com/simonw/datasette-enrichments/issues/1#issuecomment-1034384356

simonw commented 2 years ago

This relates strongly to the DB schema in #2.

The API exists so that enrichments running in a separate process - or even on something like a GitHub Actions scheduled cron - can fetch new items to work on, mark them as "in process" so no other process gets them, perform the work and then submit the result back to the server.

This can work for human-required enrichments too, which is why the locking mechanism is particularly important.
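A rough sketch of that fetch, mark "in process", perform and submit cycle. This is not the project's API: `TaskStore`, `Task` and every method name here are hypothetical, and an in-memory dictionary stands in for the server.

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Task:
    id: int
    payload: Any
    status: str = "pending"  # pending -> in_process -> done
    reserved_by: Optional[str] = None
    result: Any = None


class TaskStore:
    """Hypothetical in-memory stand-in for the server side of the API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[int, Task] = {}

    def enqueue(self, payload: Any) -> int:
        with self._lock:
            task = Task(id=len(self._tasks) + 1, payload=payload)
            self._tasks[task.id] = task
            return task.id

    def reserve(self, worker_id: str) -> Optional[Task]:
        # Find and mark inside one critical section, so that two
        # workers can never be handed the same task.
        with self._lock:
            for task in self._tasks.values():
                if task.status == "pending":
                    task.status = "in_process"
                    task.reserved_by = worker_id
                    return task
            return None

    def submit(self, task_id: int, worker_id: str, result: Any) -> None:
        with self._lock:
            task = self._tasks[task_id]
            if task.status != "in_process" or task.reserved_by != worker_id:
                raise ValueError("task is not reserved by this worker")
            task.status = "done"
            task.result = result


store = TaskStore()
store.enqueue("hello")
task = store.reserve("worker-a")  # fetch and lock in one step
store.submit(task.id, "worker-a", task.payload.upper())  # perform, then report back
```

The same shape would apply to a human-driven enrichment: the reservation just stays open for longer before `submit` is called.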

simonw commented 2 years ago

I think the initial "get me a task" request should support more than one enrichment type - so you can, as a client, say "I'm capable of performing these enrichments, what have you got for me?"
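As an illustration of that request, here is a minimal sketch in which the client passes a list of enrichment types it can handle. The function name, the task dictionaries and the type names (`geocode`, `ocr`, `translate`) are all made up for the example.

```python
from typing import Dict, Iterable, List, Optional


def reserve_for(
    tasks: List[Dict], capabilities: Iterable[str], worker_id: str
) -> Optional[Dict]:
    """Reserve the first pending task whose type the client can perform."""
    wanted = set(capabilities)
    for task in tasks:
        if task["status"] == "pending" and task["type"] in wanted:
            task["status"] = "reserved"
            task["worker_id"] = worker_id
            return task
    return None  # nothing available for this client


# Hypothetical queue contents
tasks = [
    {"id": 1, "type": "geocode", "status": "pending"},
    {"id": 2, "type": "ocr", "status": "pending"},
    {"id": 3, "type": "translate", "status": "pending"},
]

# "I'm capable of performing these enrichments, what have you got for me?"
first = reserve_for(tasks, ["ocr", "translate"], "worker-a")
```

Here `first` is the `ocr` task, because the `geocode` task ahead of it is not in the client's list.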

I might try to borrow terminology here from a queue system.

simonw commented 2 years ago

Resque uses the terms enqueue, reserve and process/perform.

simonw commented 8 months ago

For the first version I don't think I need a mechanism for claiming tasks - I can assume people will be running just one worker for each task type.

Actually it would still be good to reserve a specific job even if I'm not reserving individual rows within that job.

simonw commented 8 months ago

To implement that I could add a worker_id column to the table which indicates if a worker has reserved the job.

Plus maybe a last_update_at column to help spot workers that may have crashed.
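A minimal sketch of how those two columns could be used, with `sqlite3` from the standard library. Only `worker_id` and `last_update_at` come from the comments above. The `jobs` table, its other columns and both helper functions are assumptions for illustration, not the schema from #2.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table jobs (
        id integer primary key,
        enrichment text,
        worker_id text,       -- null means nobody has reserved the job
        last_update_at real   -- unix timestamp of the worker's last check-in
    )
    """
)
conn.execute("insert into jobs (enrichment) values ('example')")
conn.commit()


def reserve_job(conn, job_id, worker_id, now=None):
    """Claim a job only if no worker holds it. Returns True on success."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "update jobs set worker_id = ?, last_update_at = ? "
            "where id = ? and worker_id is null",
            (worker_id, now, job_id),
        )
    return cursor.rowcount == 1


def stale_jobs(conn, max_age_seconds, now=None):
    """Return ids of reserved jobs whose worker has not checked in recently."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "select id from jobs "
        "where worker_id is not null and last_update_at < ?",
        (now - max_age_seconds,),
    ).fetchall()
    return [row[0] for row in rows]


got_a = reserve_job(conn, 1, "worker-a", now=1000.0)
got_b = reserve_job(conn, 1, "worker-b", now=1001.0)
```

The `where ... worker_id is null` clause makes the claim a single conditional update, so the second worker's attempt changes no rows. A job returned by `stale_jobs` is a candidate for a crashed worker, which something else would then need to release.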

simonw commented 7 months ago

I can ship a first alpha with just in-process enrichments. This can come later.

simonw commented 7 months ago

I've started to have second thoughts about this entirely. In-process enrichments are working great so far, and there's nothing to stop one of those from using some other custom mechanism to queue up enrichment work for an out-of-process task - writing to an external message queue for example.

So for the moment I'm going to leave this feature out. I may add it in the future if there is definite need for it, but I'd like to build a few plugins that do this their own way first to identify the right patterns for it.
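For reference, the kind of hand-off described above (an in-process step that queues work for an out-of-process task) might look roughly like this. It does not use the plugin's enrichment API: the function name is hypothetical, and `queue.Queue` stands in for a real external message queue.

```python
import json
import queue

# Stand-in for an external message broker (assumption for the example)
external_queue = queue.Queue()


def queue_rows_for_external_worker(rows, pks):
    """Publish one message per row instead of doing the work in-process."""
    for row in rows:
        message = {"pks": [row[pk] for pk in pks], "row": row}
        external_queue.put(json.dumps(message))
    return len(rows)


sent = queue_rows_for_external_worker(
    [
        {"id": 1, "address": "1 Main St"},
        {"id": 2, "address": "2 High St"},
    ],
    ["id"],
)
```

An out-of-process worker would then consume those messages and write its results back by whatever route suits it.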