dcdourado / ecto_backfiller

A back-pressured backfill executor for Ecto
Apache License 2.0

Offset usage #13

Closed by dcdourado 1 year ago

dcdourado commented 2 years ago

If the backfill module's handle_batch/1 callback changes which rows the query matches, advancing the offset causes some rows to be skipped.

Example:

Query -> users where email_verified = true
Backfill execution -> updates email_verified = false

As currently implemented, the incremental offset strategy does not work for this use case.
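
The skipping described above can be reproduced outside the library. The following is a minimal sketch in Python with an in-memory SQLite database, not ecto_backfiller's actual code: the table layout, the function name backfill_with_offset, and the UPDATE that stands in for handle_batch/1 are all assumptions made for illustration.

```python
import sqlite3


def backfill_with_offset(batch_size=2, total=6):
    """Simulate an offset-paginated backfill whose handler changes the filter column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email_verified INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO users (id, email_verified) VALUES (?, 1)",
        [(i,) for i in range(1, total + 1)],
    )

    processed = []
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM users WHERE email_verified = 1 "
            "ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        ids = [row[0] for row in rows]
        # Hypothetical stand-in for handle_batch/1: the update removes
        # these rows from the result set of the query above.
        conn.executemany(
            "UPDATE users SET email_verified = 0 WHERE id = ?",
            [(i,) for i in ids],
        )
        processed.extend(ids)
        # Advancing the offset over a shrinking result set skips rows.
        offset += batch_size

    skipped = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM users WHERE email_verified = 1 ORDER BY id"
        )
    ]
    conn.close()
    return processed, skipped
```

With six matching rows and a batch size of two, the second query starts at offset 2 of a result set that has already lost its first two rows, so the rows that moved into positions 0 and 1 are never visited.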

We could reset the offset to 0 after handle_batch executes, but with more than one consumer the same rows could be processed twice. That would force every implementation to be idempotent and make the whole operation more expensive.
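
To make the duplicate-processing concern concrete, here is a small pure-Python sketch. It assumes a worst-case interleaving in which a second consumer fetches its batch before the first consumer's update is visible; the names run_two_consumers, fetch, and handle_batch are hypothetical and do not come from the library.

```python
from collections import Counter


def run_two_consumers(total=4, batch_size=2):
    """Count how often each row is handled when the offset is always 0."""
    verified = {i: True for i in range(1, total + 1)}
    handled = Counter()

    def fetch():
        # Offset is always 0: take the first rows that still match the filter.
        return [i for i in sorted(verified) if verified[i]][:batch_size]

    def handle_batch(ids):
        # Hypothetical handler: flips the filter column for each row.
        for i in ids:
            handled[i] += 1
            verified[i] = False

    while True:
        batch_a = fetch()
        # Assumed interleaving: consumer B fetches before A's update lands,
        # so it receives exactly the same rows.
        batch_b = fetch()
        if not batch_a:
            break
        handle_batch(batch_a)
        handle_batch(batch_b)

    return handled
```

Under this interleaving every row is handled twice, which is only safe if the handler is idempotent.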

Any suggestions?

dcdourado commented 2 years ago

Maybe the backfiller should receive only the schema struct and then query the whole table sorted by inserted_at ASC, with no custom filters.

dcdourado commented 1 year ago

We can do this with the seek method (keyset pagination): https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/
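
A minimal sketch of the seek method applied to the email_verified example, again in Python with in-memory SQLite rather than Ecto. It assumes the table has a unique, ordered key (id here) to seek on; a column such as inserted_at would need a tie-breaker if its values can repeat. The function name backfill_with_seek and the UPDATE standing in for handle_batch/1 are illustrative.

```python
import sqlite3


def backfill_with_seek(batch_size=2, total=6):
    """Paginate by remembering the last seen id instead of using OFFSET."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email_verified INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO users (id, email_verified) VALUES (?, 1)",
        [(i,) for i in range(1, total + 1)],
    )

    processed = []
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id FROM users WHERE email_verified = 1 AND id > ? "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        ids = [row[0] for row in rows]
        # Hypothetical stand-in for handle_batch/1, same as in the issue's example.
        conn.executemany(
            "UPDATE users SET email_verified = 0 WHERE id = ?",
            [(i,) for i in ids],
        )
        processed.extend(ids)
        # The cursor depends on key values, not on row positions, so rows
        # leaving the result set do not shift the next page.
        last_id = ids[-1]

    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE email_verified = 1"
    ).fetchone()[0]
    conn.close()
    return processed, remaining
```

Because each page starts strictly after the last processed key, every row is visited once even though the handler removes rows from the filtered set.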