tmitanitky opened 2 years ago
Pinging @elastic/ml-core (Team:ML)
I think what makes this tricky is the deletion of existing documents; that's why filtering won't work. Filtering would leave the old document in the destination index.
That's why I think it requires a Delete By Query, which is what retention_policy uses internally. However, retention_policy is hardcoded to the time use case (that's why it is retention_policy.time). If we expand retention_policy to take a term query (retention_policy.term), this use case can be implemented.
Description
The latest transform is a great way to extract the latest documents from a source "history" index. Occasionally, we may want to use a delete_flag field on the history index to perform a logical delete and keep the deletion history for a while.
Since the current transform has no way to handle logical deletion (in fact, it is feasible even now with a retention_policy trick), logically deleted documents are also indexed into the destination index, and every query has to filter on delete_flag: false.
When using the source.query parameter, the query is applied BEFORE the transform runs, so even if the latest document has delete_flag: true, the older document with delete_flag: false will be indexed into the destination index.
If delete_flag handling is implemented, it will be possible to handle a history index with logical deletes cleanly. As for the API, there might be some options:
Here is an example of the trick.
If we could use a {"delete_flag": true/false} field, it would be more intuitive and less prone to bugs, and it could also be combined with the original retention_policy use.
Thanks.
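The source.query ordering problem described in the issue can be reproduced in memory. This is a simplified, single-batch model of a latest transform, not Elasticsearch code: the field names `id` and `ts` and the helper `latest_per_key` are hypothetical, and checkpoints and the destination index are ignored, so it only shows which version of each document gets selected.

```python
from typing import Any, Callable, Optional

Doc = dict[str, Any]  # hypothetical shape: {"id": str, "ts": int, "delete_flag": bool}


def latest_per_key(
    docs: list[Doc], source_query: Optional[Callable[[Doc], bool]] = None
) -> dict[str, Doc]:
    """Keep the newest document per unique key, like a latest transform.

    source_query runs BEFORE the newest document is chosen, as source.query does.
    """
    latest: dict[str, Doc] = {}
    for doc in docs:
        if source_query is not None and not source_query(doc):
            continue  # filtered out before "latest" is computed
        key = doc["id"]
        if key not in latest or doc["ts"] > latest[key]["ts"]:
            latest[key] = doc
    return latest


history: list[Doc] = [
    {"id": "a", "ts": 1, "delete_flag": False},
    {"id": "a", "ts": 2, "delete_flag": True},  # "a" was logically deleted
    {"id": "b", "ts": 1, "delete_flag": False},
]

# source.query-style filtering: the older, non-deleted version of "a" survives.
with_source_query = latest_per_key(history, lambda d: not d["delete_flag"])

# Desired behaviour: choose the latest first, then drop flagged keys
# (roughly what a term-based delete by query on the destination would achieve).
desired = {k: d for k, d in latest_per_key(history).items() if not d["delete_flag"]}
```

With source filtering, "a" is still present (its ts=1 version), while the desired result contains only "b". In a continuous transform the stale "a" may already sit in the destination index, so removing it needs a delete rather than a filter.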