Open gmarouli opened 6 months ago
Pinging @elastic/es-data-management (Team:Data Management)
This is an interesting situation. For example, a TSDS doesn't really have a "write index", because documents are routed to the backing index to which their @timestamp corresponds. So even deleting an index that is not the latest-generation index (which would normally be the write index for a non-TSDS data stream) means that some subset of documents can't be indexed (if their @timestamp were to route them to that index).
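To illustrate the routing just described, here is a minimal Python sketch (the index names, time ranges, and helper are hypothetical illustrations, not Elasticsearch's actual implementation): a TSDS picks the backing index whose [start_time, end_time) range contains the document's @timestamp, rather than always writing to the latest generation.

```python
from datetime import datetime, timezone

# Hypothetical backing indices with their time-series ranges
# (mirroring index.time_series.start_time / end_time).
backing = [
    (".ds-tsds-000001", datetime(2024, 1, 1, tzinfo=timezone.utc),
                        datetime(2024, 1, 2, tzinfo=timezone.utc)),
    (".ds-tsds-000002", datetime(2024, 1, 2, tzinfo=timezone.utc),
                        datetime(2024, 1, 3, tzinfo=timezone.utc)),
]

def route(ts):
    """Route a document to the backing index covering its @timestamp."""
    for name, start, end in backing:
        if start <= ts < end:
            return name
    return None  # no covering index: the document cannot be indexed

# A document with an older @timestamp goes to the older index, not the
# latest generation -- there is no single "write index" in a TSDS.
print(route(datetime(2024, 1, 1, 12, tzinfo=timezone.utc)))
print(route(datetime(2024, 1, 2, 12, tzinfo=timezone.utc)))
```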
We probably still want to fix this behavior and disallow deleting the latest-generation index, but we should be clear in our docs that deleting any index can still generate the "the document timestamp […] is outside of ranges of currently writable indices [[…,…]]" exception for a TSDS.
...means that some subset of documents can't be indexed (if their @timestamp were to route them to that index).
That's a good enough reason to prevent it IMHO... unless there's an easy way for the user to recreate it
...disallow deleting the latest generation index...
I think that for both classic and time-series data streams, it is already not possible to delete the latest generation index. However, for TSDS, (correct me if I'm wrong) it is possible to delete the previous backing indexes that are still writable (i.e. up to the configured index.look_back_time). This should not be allowed for the same reason it is not allowed to delete the latest generation index.
@dakrone you are right. I will rephrase the ticket to not use the write index terminology.
What about:
The user cannot delete the backing indices whose timeframe includes "now", where this "now" aligns with the "now" used when determining the timeframes of the indices.
The idea behind this is that since we expect a TSDS to accept current data, we should protect the user from accidentally deleting what we expect to be the most-written index.
Thoughts?
I think that for both classic and time-series data streams, it is already not possible to delete the latest generation index. However, for TSDS, (correct me if I'm wrong) it is possible to delete the previous backing indexes that are still writable (i.e. up to the configured index.look_back_time). This should not be allowed for the same reason it is not allowed to delete the latest generation index.
Why not? If a user were to configure a 7d look back time, with a 3-day retention, would we want to prevent them from doing that?
The idea behind this is that since we expect TSDS to accept current data, we should protect the user from accidentally deleting what we expect to be the most written index.
I agree about preventing deletion of the most-written index, as I think it would lead to a poor user experience. I don't know yet whether we should protect all writeable indices from deletion, given that with a maximum 7d lookback that could be a very large number of indices (for high-volume indices rolling over every hour, for instance).
The thinking with DS (whether classic or TS) is that deletions (should) occur by means of retention settings (either via ILM or DS lifecycle). Even though it "could" make sense for the user to manually delete the N oldest indexes (whether writable or not) in order to free storage (or for whatever other reason), they could achieve the same result by adjusting their retention settings.
However, deleting an index that sits in the middle of the list of backing indices makes much less sense, and even more so if it is writable. Say you have Index1 (oldest, not writable), Index2 (second oldest, still writable), and Index3 (newest index, writable): it would not be natural for a user to manually delete Index2. That's what I think we should prevent.
Say you have Index1 (oldest, not writable), Index2 (second oldest, still writable), Index3 (newest index, writable), it would not be natural for a user to manually delete Index2. That's what I think we should prevent.
Interestingly though, all backing indices in a data stream are writeable (TSDS or not), and a user may or may not be performing writes/updates/deletes to these backing indices (we tell users who need to do updates to do this). I think it's worth us (the team, I mean) discussing whether we want to allow "donut hole" indices, as you mentioned, and how we would do this technically. For instance, it's still perfectly valid in your scenario for a user to delete Index2 because they intend to restore it from a snapshot.
I like your "donut hole" metaphor :-) Your explanation makes sense, indeed (re snapshot restore). I still think this kind of action should be gated in a way that requires the user to "confirm" that they really intended to delete that backing index (maybe via a query string parameter), much like other "costly" APIs where we ask the user to add a specific parameter to confirm their intent.
Interestingly though, all backing indices in a data stream are writeable (TSDS or not),
Circling back to this and looking at the documentation for Data stream lifecycle, in step 3 I can read that the write index that's been rolled over is automatically tail merged.
I'm curious to know whether this tail merging process happens only once after rollover or whether there is some kind of write detection mechanism that will rerun the tail merge again after "some" write operations?
The thinking being that since those indexes are not supposed to be written to anymore, they are tail-merged for optimization's sake, but if they are being written to again after that merge process, they are potentially back into a sub-optimal state.
I'm curious to know whether this tail merging process happens only once after rollover or whether there is some kind of write detection mechanism that will rerun the tail merge again after "some" write operations?
It happens only once, then the index has an internal flag set to avoid it being rerun in the future.
The thinking being that since those indexes are not supposed to be written to anymore, they are tail-merged for optimization's sake, but if they are being written to again after that merge process, they are potentially back into a sub-optimal state.
Looking at the code, it appears that we do not wait for it to exit the write "window" before force merging it, so older documents could be written during this time. I'll open an issue for us to change this.
Looking at the code, it appears that we do not wait for it to exit the write "window" before force merging it, so older documents could be written during this time. I'll open an issue for us to change this.
I believe we are waiting for the backing indices to be outside the write bounds here https://github.com/elastic/elasticsearch/blob/main/modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java#L358 by excluding these indices from the run.
A backing index is considered still within its time bounds (so excluded from merging, downsampling, or deleting) if now <= index.time_series.end_time. Every non-write backing index will have index.time_series.end_time configured.
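The exclusion rule above can be sketched in a few lines of Python. This is a hypothetical simplification of the check (not the actual Java code in DataStreamLifecycleService): an index stays excluded from lifecycle actions while now <= end_time, and the current write index, which has no finalized end_time yet, is always excluded.

```python
from datetime import datetime, timedelta, timezone

def eligible_for_lifecycle_action(end_time, now):
    """An index is still within its time bounds (so excluded from force
    merge, downsampling, or deletion) while now <= end_time. The current
    write index has no finalized end_time and is always excluded."""
    if end_time is None:  # write index: time bounds not finalized yet
        return False
    return now > end_time

now = datetime(2024, 1, 3, tzinfo=timezone.utc)
old = now - timedelta(days=2)      # end_time already in the past
recent = now + timedelta(hours=2)  # still inside its write window

print(eligible_for_lifecycle_action(old, now))     # True: bounds passed
print(eligible_for_lifecycle_action(recent, now))  # False: still writable
print(eligible_for_lifecycle_action(None, now))    # False: write index
```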
Just a clarification, since @andreidan and I were talking offline about this. His comment means that DSL won't delete indexes per the retention policy until their end date is past. But the issue @gmarouli raised about being able to manually delete those indexes is still valid. That's done by MetadataDeleteIndexService, which won't delete the one identified as the write index (see here) but doesn't check the end date.
(I'm too new to all this to have a view on the correct behaviour, just noting the current behaviour as I understand it.)
Elasticsearch Version
8.13
Installed Plugins
No response
Java Version
bundled
OS Version
not relevant
Problem Description
Expected behaviour
As a user, I should not be able to delete the write index of a data stream, so that I can always write to it.
Current behaviour
In the case of a TSDS, there is a period right after a rollover during which the user can still write to the just-rolled-over index. However, it is currently possible for a user to delete this write index, because it is not the last one, and then all writes will fail until the newer index becomes the write index.
Steps to Reproduce
1. Create a TSDS data stream.
2. Execute a rollover.
3. Try to index again. Now we have two indices [xxx-000001, xxx-000002]. The following document will end up in the first index, xxx-000001.
4. Try to delete xxx-000001.
5. Try to index again. This time indexing the document fails because the correct write index has been deleted.
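The failure mode in these steps can be simulated without a cluster. This is a hypothetical in-memory sketch (the index names follow the steps above; the error class and time ranges are invented for illustration, not Elasticsearch's actual code): right after rollover, "now" still falls inside xxx-000001's time range, so deleting it makes indexing fail.

```python
from datetime import datetime, timezone

class WritableIndicesError(Exception):
    """Stand-in for ES's 'timestamp outside of ranges of currently
    writable indices' error (hypothetical simplification)."""

# After rollover there are two backing indices; right after rollover,
# current documents still route to xxx-000001's time range.
indices = {
    "xxx-000001": (datetime(2024, 1, 1, tzinfo=timezone.utc),
                   datetime(2024, 1, 2, tzinfo=timezone.utc)),
    "xxx-000002": (datetime(2024, 1, 2, tzinfo=timezone.utc),
                   datetime(2024, 1, 3, tzinfo=timezone.utc)),
}

def index_doc(timestamp):
    """Return the backing index the document lands in, or fail."""
    for name, (start, end) in indices.items():
        if start <= timestamp < end:
            return name
    raise WritableIndicesError(f"{timestamp} is outside writable ranges")

ts = datetime(2024, 1, 1, 18, tzinfo=timezone.utc)
print(index_doc(ts))       # lands in xxx-000001

del indices["xxx-000001"]  # step 4: delete the effective write index

try:
    index_doc(ts)          # step 5: indexing now fails
except WritableIndicesError as e:
    print("indexing failed:", e)
```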
Logs (if relevant)
No response