dcdourado / ecto_backfiller

A back-pressured backfill executor for Ecto
Apache License 2.0

Persist backfill offset #6

Open dcdourado opened 2 years ago

dcdourado commented 2 years ago

If the backfill GenServer is killed for any reason, the query offset is lost.

dcdourado commented 2 years ago

I'm considering adding a table to keep track of the backfills that have run:

| Field | Type |
| --- | --- |
| module | String.t() |
| offset | non_neg_integer() |
| finished | boolean() |
| updated_at | NaiveDateTime.t() |
| inserted_at | NaiveDateTime.t() |
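As a rough sketch only, the table above could be created with a migration like the one below. The module name `MyApp.Repo.Migrations.CreateBackfills`, the table name `backfills`, the column defaults, and the unique index on `module` are all assumptions for illustration; none of them come from ecto_backfiller itself.

```elixir
defmodule MyApp.Repo.Migrations.CreateBackfills do
  use Ecto.Migration

  def change do
    # Table name and defaults are assumptions, not part of ecto_backfiller.
    create table(:backfills) do
      add :module, :string, null: false
      add :offset, :bigint, null: false, default: 0
      add :finished, :boolean, null: false, default: false

      # Adds inserted_at and updated_at (naive datetimes by default).
      timestamps()
    end

    # One row per backfill module, so progress can be upserted by module name.
    create unique_index(:backfills, [:module])
  end
end
```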

dcdourado commented 1 year ago

The offset would be updated after every query performed by the producer.
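A minimal sketch of what that update could look like, assuming PostgreSQL, a `backfills` table with the fields listed earlier, and a unique index on `module`. `BackfillProgress.save_offset/3` is a hypothetical helper, not an existing ecto_backfiller function; the producer would call it after each query.

```elixir
defmodule BackfillProgress do
  # Hypothetical helper, not part of ecto_backfiller.
  # Assumes a "backfills" table with a unique index on :module.
  def save_offset(repo, module, offset) when is_integer(offset) and offset >= 0 do
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

    row = %{
      module: inspect(module),
      offset: offset,
      finished: false,
      inserted_at: now,
      updated_at: now
    }

    # Upsert: insert the row on the first call, afterwards only
    # overwrite the offset and updated_at of the existing row.
    repo.insert_all("backfills", [row],
      on_conflict: {:replace, [:offset, :updated_at]},
      conflict_target: [:module]
    )
  end
end
```

Usage would be something like `BackfillProgress.save_offset(MyApp.Repo, MyApp.SomeBackfill, 5_000)`, where both module names are placeholders.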

dcdourado commented 1 year ago

The backfill strategy has been changed to the seek method, but we still want to keep track of what has been done (for cases where the backfill is interrupted, for example by a credential that has just expired). I'm thinking of publishing some metrics on batch execution status (for each handle_batch: length, start, and result).
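One way such metrics could be published is through `:telemetry`. The sketch below is only an illustration: the `BackfillMetrics` module, the event name `[:ecto_backfiller, :batch, :stop]`, and the measurement and metadata keys are assumptions, and it requires the `:telemetry` package as a dependency.

```elixir
defmodule BackfillMetrics do
  # Hypothetical wrapper; the event name and map keys are assumptions.
  def measure_batch(module, batch, fun) when is_list(batch) and is_function(fun, 1) do
    started_at = System.monotonic_time()
    result = fun.(batch)
    duration = System.monotonic_time() - started_at

    # Emits one event per batch with its length, duration (native time units),
    # and result. No event is emitted if `fun` raises.
    :telemetry.execute(
      [:ecto_backfiller, :batch, :stop],
      %{length: length(batch), duration: duration},
      %{module: module, result: result}
    )

    result
  end
end
```

A handler attached with `:telemetry.attach/4` could then log these events or forward them to a metrics backend.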

The upside is that the producer query is now really fast. The offset strategy was causing too much delay: queries took from seconds to minutes at higher offsets (1_000_000, ..., 6_000_000).
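To illustrate the difference, here is a sketch of the two query shapes in Ecto. It assumes the rows are ordered by an increasing `id` column; the column the library actually seeks on is not stated here, and `SeekExample` is a hypothetical module.

```elixir
defmodule SeekExample do
  import Ecto.Query

  # Offset pagination: the database typically has to walk past
  # `offset` rows before returning the page, so deep pages get slow.
  def offset_page(queryable, offset, limit) do
    from r in queryable,
      order_by: [asc: r.id],
      offset: ^offset,
      limit: ^limit
  end

  # Seek (keyset) pagination: continue from the last id already seen,
  # which an index on id can locate directly.
  def seek_page(queryable, last_id, limit) do
    from r in queryable,
      where: r.id > ^last_id,
      order_by: [asc: r.id],
      limit: ^limit
  end
end
```

With the seek form, the producer only needs to remember the last `id` of the previous page, so that value is what would have to be persisted to resume an interrupted backfill.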

This lets us add more consumers. The maximum speed is reached when the query time becomes the bottleneck for how fast the backfill can go.