cedadev / search-futures

Future Search Architecture
BSD 2-Clause "Simplified" License

Consider using a database or Elasticsearch indices as prompts for scanning (rather than queues) #166

Open: agstephens opened this issue 2 years ago

agstephens commented 2 years ago

Thoughts about how to manage multi-level scanning...

Do we need a database?

Reasons to have a database:

Could Elasticsearch (ES) be the database?

SP suggests that we could use ES queries to tell the item-generator and collection-generator what to scan next.

For example: get the latest 1,000 assets that still need an item, then work through generating those items.
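A minimal sketch of what such a "what to scan next" query might look like, written as an ES query DSL body built in plain Python. The schema is an assumption, not the project's: it supposes each asset document gains a hypothetical `item_id` field once an item has been generated for it, and carries a hypothetical `last_modified` date field.

```python
from typing import Any


def pending_assets_query(batch_size: int = 1000) -> dict[str, Any]:
    """Build an ES query body for the newest assets that have no item yet.

    Hypothetical schema: `item_id` is set on an asset once its item exists,
    and `last_modified` is a date field on every asset.
    """
    return {
        "size": batch_size,
        "query": {
            "bool": {
                # Assets without an item_id have not been turned into items yet.
                "must_not": [{"exists": {"field": "item_id"}}]
            }
        },
        # Newest first, so the item-generator works on the latest assets.
        "sort": [{"last_modified": {"order": "desc"}}],
    }
```

With the elasticsearch-py 8.x client, the keys map onto `search()` keyword arguments, e.g. `es.search(index="assets", **pending_assets_query())`, where the index name `assets` is also hypothetical. The same pattern would apply one level up, with the collection-generator querying items that lack a collection.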

How to do claims?

AS thought that we might update a claim field on a record in ES (to stop another process from claiming it).
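One way to make such a claim safe against two processes racing is a compare-and-set write. ES supports this through optimistic concurrency control: a write sent with `if_seq_no` and `if_primary_term` is rejected with HTTP 409 if the document changed since it was read. The sketch below only simulates that behaviour with an in-memory stand-in (it tracks a single sequence number and ignores primary terms), and the `claimed_by` field is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


class VersionConflict(Exception):
    """Raised when a conditional write is stale (ES would return HTTP 409)."""


@dataclass
class InMemoryIndex:
    """Stand-in for an ES index: every write bumps a sequence number,
    loosely mimicking ES's _seq_no."""
    docs: dict[str, dict[str, Any]] = field(default_factory=dict)
    seq_nos: dict[str, int] = field(default_factory=dict)
    next_seq: int = 0

    def index(self, doc_id: str, doc: dict[str, Any],
              if_seq_no: Optional[int] = None) -> int:
        current = self.seq_nos.get(doc_id)
        # Conditional write: reject it if the document changed since it was read.
        if if_seq_no is not None and current != if_seq_no:
            raise VersionConflict(f"{doc_id}: expected {if_seq_no}, found {current}")
        self.docs[doc_id] = dict(doc)
        self.seq_nos[doc_id] = self.next_seq
        self.next_seq += 1
        return self.seq_nos[doc_id]

    def get(self, doc_id: str) -> tuple[dict[str, Any], int]:
        return dict(self.docs[doc_id]), self.seq_nos[doc_id]


def try_claim(idx: InMemoryIndex, doc_id: str, worker: str) -> bool:
    """Claim a record for `worker`; return False if someone else got it first."""
    doc, seq_no = idx.get(doc_id)
    if doc.get("claimed_by"):  # hypothetical claim field
        return False
    doc["claimed_by"] = worker
    try:
        # Only succeeds if nobody wrote to the record since our read.
        idx.index(doc_id, doc, if_seq_no=seq_no)
        return True
    except VersionConflict:
        return False
```

If two workers read the same sequence number, only the first conditional write lands; the second gets a conflict and moves on, so no separate locking service is needed.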

But do we need to do claims?

Maybe not. An alternative would be:

  1. Have one controller at each level (asset, item, collection) that:
    • gets batches of work to be done (based on queries)
    • sends jobs to a queue
  2. Have multiple workers at each level, each of which:
    • gets the next job from the queue and processes it

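A rough sketch of the single-controller, many-workers pattern for one level, using Python's standard `queue` and `threading` modules in place of a real message queue. `fetch_pending` and `process` are hypothetical callables standing in for the ES query and the item/collection generation step. Because only the controller enqueues work, workers never compete for the same record, so no claims are needed. The `enqueued` set is an added assumption: it guards against re-enqueueing records that still look pending because the index has not refreshed yet.

```python
import queue
import threading
from typing import Callable, Optional


def run_level(fetch_pending: Callable[[], list[str]],
              process: Callable[[str], None],
              n_workers: int = 4) -> None:
    """One controller (this function) feeding `n_workers` worker threads."""
    jobs: "queue.Queue[Optional[str]]" = queue.Queue()
    enqueued: set[str] = set()  # everything this controller has ever sent

    def worker() -> None:
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: no more work
                    return
                process(job)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    while True:
        # Controller: query for a batch, skipping anything already sent.
        batch = [j for j in fetch_pending() if j not in enqueued]
        if not batch:
            break  # a long-running controller would sleep and poll instead
        for job_id in batch:
            enqueued.add(job_id)
            jobs.put(job_id)
        jobs.join()  # let workers drain this batch before querying again

    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
```

Usage: `run_level(fetch_pending=lambda: ["asset-1", "asset-2"], process=print)` prints each asset ID once, even though the fake query keeps returning both.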
agstephens commented 2 years ago

Previous concerns/questions about pods duplicating work were: