apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0
6.28k stars 1.03k forks

couch changes deadline #5224

Open ronag opened 2 months ago

ronag commented 2 months ago

Currently, when using the changes feed with a filter, it can take a very long time before a change is received, which makes it difficult to create checkpoints to resume from if the change feed is restarted prematurely.

E.g. if you have a huge database and a very specific change feed selector, it can take 8 hours before the first change arrives. If the service has a tendency (for whatever reason) to restart once an hour, it will never be able to make progress, since no checkpoint ever occurs.

It would be nice if "normal" change feeds had an option that says "if x duration has passed, end the change feed and return the last sequence number reached".
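
To make the request concrete, here is a minimal sketch of the desired behavior, simulated in memory rather than against CouchDB. Everything in it is an assumption for illustration: `scan_changes`, `deadline_s`, and the injectable `clock` are hypothetical names, and integer sequence numbers stand in for CouchDB's opaque sequence strings.

```python
import time
from typing import Callable, Iterable, List, Tuple

# (sequence number, document). Integer seqs are a simplification for this sketch.
Change = Tuple[int, dict]


def scan_changes(
    changes: Iterable[Change],
    since: int,
    matches: Callable[[dict], bool],
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Change], int]:
    """Scan changes after `since`, stopping once `deadline_s` seconds have elapsed.

    Returns the matching changes plus the last sequence examined, which is a
    usable checkpoint even when nothing matched the filter.
    """
    started = clock()
    last_seq = since
    results: List[Change] = []
    for seq, doc in changes:
        if seq <= since:
            continue  # already covered by an earlier checkpoint
        if clock() - started >= deadline_s:
            break  # deadline hit: stop and report how far we got
        if matches(doc):
            results.append((seq, doc))
        last_seq = seq
    return results, last_seq
```

A caller would store the returned sequence and pass it back as `since` on the next call, so a restart costs at most one deadline's worth of scanning instead of the whole feed.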

nickva commented 2 months ago

Yeah, agreed. This is a long-standing issue. Another idea might be to emit periodic markers (a row without a doc_id, or with an empty doc_id, or something similar) carrying the current update sequence, so a checkpointer can checkpoint intermediate results. We already emit a no_pass ping from the node workers to the coordinator to keep the stream alive; we just don't emit it to the users.
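
On the client side, consuming such markers could look roughly like the sketch below. The row shape is an assumption, since no marker format exists yet: a marker is taken to be a row with a `seq` but a missing or empty `id`. `consume`, `handle`, and `save_checkpoint` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def consume(
    rows: Iterable[dict],
    handle: Callable[[dict], None],
    save_checkpoint: Callable[[str], None],
) -> Optional[str]:
    """Process change rows and checkpoint on every row, markers included.

    Assumed (hypothetical) marker shape: a row with "seq" but no "id",
    or with an empty "id".
    """
    last_seq: Optional[str] = None
    for row in rows:
        if row.get("id"):
            handle(row)  # a real change that passed the filter
        # Marker or not, the row says how far the feed has progressed.
        last_seq = row["seq"]
        save_checkpoint(last_seq)
    return last_seq
```

Checkpointing on every row keeps the sketch short; a real checkpointer would likely throttle writes.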

A timeout might work as well, but there is a danger there if we don't differentiate between a timeout expiring and the stream finishing. It could lead to a sort of data loss, as users would consider all the data streamed when in fact we just timed out early.
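
One way to picture that distinction is a resume loop that only stops when the response says the feed really finished. This is a sketch under stated assumptions: the `timed_out` field is hypothetical and not part of any CouchDB response today, and `fetch` stands in for whatever call returns a dict with `results` and `last_seq`.

```python
from typing import Callable


def drain(
    fetch: Callable[[str], dict],
    since: str,
    handle: Callable[[dict], None],
    save_checkpoint: Callable[[str], None],
) -> str:
    """Keep fetching from the last checkpoint until the feed is truly complete.

    `fetch(since)` is assumed to return {"results": [...], "last_seq": ...}
    plus a hypothetical "timed_out" flag when it stopped early.
    """
    while True:
        resp = fetch(since)
        for row in resp["results"]:
            handle(row)
        since = resp["last_seq"]
        save_checkpoint(since)
        # Without an explicit flag, an early timeout would be
        # indistinguishable from "no more changes".
        if not resp.get("timed_out", False):
            return since
```

If the flag were missing, the loop would return after the first response and silently skip everything past the timeout, which is the data-loss case described above.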