elastic / curator

Curator: Tending your Elasticsearch indices

Snapshot --delete-older-than uses get_index_time #129

Closed: parkerd closed this issue 10 years ago

parkerd commented 10 years ago

--delete-older-than DELETE_OLDER_THAN Delete snapshots older than n TIME_UNITs.

$ curator snapshot --repository example --prefix test --delete-older-than 30
...
2014-07-29 23:25:35,894 ERROR     Could not find a valid timestamp for test_snapshot with timestring %Y.%m.%d
...

I would expect this to delete snapshots older than 30 days that match the given prefix. Instead, find_expired_data always uses find_index_time to look for the timestring in the index name, but snapshot names do not necessarily correspond to index names. It would make more sense to use the end_time of the snapshots themselves.
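A minimal sketch of the end_time-based selection proposed above, not Curator's code. It assumes each snapshot is a dict shaped like the entries Elasticsearch returns from a snapshot listing, with "snapshot" and "end_time_in_millis" keys; the function name expired_snapshots is hypothetical, and no cluster call is made.

```python
from datetime import datetime, timedelta, timezone

def expired_snapshots(snapshots, prefix, days, now=None):
    """Hypothetical helper: return names of snapshots that start with `prefix`
    and finished more than `days` days before `now` (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    expired = []
    for snap in snapshots:
        name = snap["snapshot"]
        if not name.startswith(prefix):
            continue  # prefix filter, as with --prefix
        millis = snap.get("end_time_in_millis")
        if not millis:
            continue  # assumption: skip snapshots with no recorded end time
        ended = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        if ended < cutoff:
            expired.append(name)
    return expired
```

With this approach the snapshot name no longer has to encode a date, so a name like test_snapshot would be judged purely by when the snapshot finished.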

I can submit a patch if you would like, although it's not clear to me whether this is broken or just misleading.

untergeek commented 10 years ago

It's a misunderstanding, to be sure. Did you use curator to create the snapshot you're seeking to delete? Presently, the snapshot functionality in curator is rather static in its approach: One snapshot per day, one index per snapshot, and the snapshot name is the index name.

Effectively, curator's snapshot delete expects the snapshots to be named the way curator names them, and will not delete snapshots made in any other way. That's why it can't delete a snapshot named test_snapshot: with the prefix you provided, it expected a snapshot named testYYYY.MM.dd.
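A rough illustration of why the error appears, assuming the name is treated as prefix + date in the %Y.%m.%d timestring from the log. The helper snapshot_date is hypothetical and only mimics the naming rule described here; it is not Curator's find_index_time.

```python
from datetime import datetime

def snapshot_date(name, prefix, timestring="%Y.%m.%d"):
    """Hypothetical helper: parse the date that follows `prefix` in a
    curator-style snapshot name; return None if it does not match."""
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], timestring)
    except ValueError:
        return None  # e.g. "_snapshot" is not a %Y.%m.%d date
```

Under this rule "test2014.07.01" yields a date, while "test_snapshot" yields nothing, which corresponds to the "Could not find a valid timestamp" error above.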

The reasoning here is that this is time-series data. Once an index is "cold" and no longer actively being written to, it is unlikely ever to change again. So snapshot it and save it in that state, in a snapshot of the same name containing only that index. My future plans for snapshotting are to allow two snapshot streams: one just like the current one (full, cold indices), and another that can snapshot the current live index or two until they reach that "cold" state. The delete function for the second option could support your present use case, and with it a broader ability to delete snapshots by snapshot creation date, though that may be of mixed utility because:

When a snapshot is deleted from a repository, Elasticsearch deletes all files that are associated with the deleted snapshot and not used by any other snapshots.
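A toy model of the quoted rule, not Elasticsearch's implementation: each snapshot is represented as a hypothetical set of file names, and deleting one frees only the files no other snapshot references. It suggests why deleting older snapshots by date may reclaim less space than expected when later snapshots share the same segment files.

```python
def files_freed_by_delete(snapshots, victim):
    """Toy model: `snapshots` maps snapshot name -> set of file names it uses.
    Return the files that deleting `victim` would actually remove."""
    still_used = set()
    for name, files in snapshots.items():
        if name != victim:
            still_used |= files  # files kept alive by other snapshots
    return snapshots[victim] - still_used
```

For example, if snap1 uses {a, b} and a later snap2 uses {a, b, c}, deleting snap1 frees nothing, while deleting snap2 frees only c.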

The wiki shows all flags used with the snapshot command, and the --timestring flag is clearly listed there. If you can suggest a clearer way to explain this, that would indeed be helpful.

parkerd commented 10 years ago

Alright, I suspected the current target use case might be narrower than what I need. I'm interested in a more general approach: multiple snapshots of a single index that is not date-based, cleaned up based on time.

Thanks for the detail.