75839 introduced an optimization to reduce the index calls to only necessary indexes. However the approach has a flaw with respect to the delay configuration. See
The extra optimization is latest got reverted, too. See LatestChangeCollector.
To re-enable the optimization TransformCheckpoint.getChangedIndices must take the timeUpperBoundMillis and delay into account. We probably need not only the last checkpoint, but data from previous checkpoints.
However the underlying root cause is the disconnection between sequence id's and the timestamp field for synchronizing source and dest. If we can base change detection fully on sequence id's, we can re-enable the optimization and do not require extra code. Therefore it might be better to prioritize the switch to sequence id's instead of trying to make this optimization work with the current hybrid.
75839 introduced an optimization to reduce the index calls to only necessary indexes. However the approach has a flaw with respect to the
delay
configuration. Seehttps://github.com/elastic/elasticsearch/pull/77204#issuecomment-914066744
The change has been "soft"-reverted, meaning the optimization is turned off in
TransformIndexer.buildQueryToFindChanges
:The extra optimization is latest got reverted, too. See
LatestChangeCollector
.To re-enable the optimization
TransformCheckpoint.getChangedIndices
must take thetimeUpperBoundMillis
anddelay
into account. We probably need not only the last checkpoint, but data from previous checkpoints.It might make sense to this together with https://github.com/elastic/elasticsearch/pull/75839#issuecomment-905542882.
However the underlying root cause is the disconnection between sequence id's and the timestamp field for synchronizing source and dest. If we can base change detection fully on sequence id's, we can re-enable the optimization and do not require extra code. Therefore it might be better to prioritize the switch to sequence id's instead of trying to make this optimization work with the current hybrid.