cityindex-attic / logsearch

[unmaintained] A development environment for ELK

Contemplate automated index purging options #328

Closed sopel closed 10 years ago

sopel commented 10 years ago

We used to rely on the logsearch-purge-bot to prune indices on a regular schedule. That project is going to be deprecated though, so we need to find other options down the road (right now manual purging via ElasticHQ is trivial, but this obviously neither scales nor can be monitored properly).

As mentioned in https://github.com/cityindex/logsearch-purge-bot/issues/8 already, the Elasticsearch curator might offer everything we need and more eventually:

The Python based Elasticsearch cluster curator has seen some love and exposure recently, see Curator: Tending your time-series indices for an introduction.

It might be useful to scout the overlapping and additional functionality for ways to improve the use case at hand, or even replacing the C# version with a Python bot down the road.

  • :exclamation: the curator code isn't exactly in a shape appropriate for a reusable Python component just yet (i.e. importing the business logic into a Python bot for example), but I'd expect it will get there eventually, the business logic itself is remarkably short/simple.

sopel commented 10 years ago

We briefly discussed this in yesterday's hangout already, and @mrdavidlaing brought up the built-in _ttl as an alternative, which he still favours:

Architecturally I’m still in favour of storing a _ttl at the document level; and then letting elasticsearch clean itself up.

Specifying how long you want the document to exist at the point of shipping it in makes a lot of sense to me

With a default expiry applied to those documents which haven’t chosen an explicit ttl

We should revisit this accordingly.

mrdavidlaing commented 10 years ago

It feels like a very similar decision to including the timezone with a log when shipping it, rather than having something external add the timezone after the log has been shipped.

Anyway, the elasticsearch _ttl functionality is described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
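
For illustration, here is a minimal Python sketch of what the _ttl approach could look like, assuming the _ttl mapping and the ttl index parameter as documented for the Elasticsearch releases current at the time (later Elasticsearch versions removed this field). The type name "logs" and the values "30d" and "7d" are hypothetical, and the helpers only build request data rather than talking to a cluster.

```python
import json


def ttl_mapping(doc_type, default_ttl):
    """Build a type mapping that enables _ttl with a default expiry.

    Documents indexed without an explicit ttl would expire after default_ttl.
    """
    return {doc_type: {"_ttl": {"enabled": True, "default": default_ttl}}}


def index_request(doc, ttl=None):
    """Return (query_params, body) for indexing a single document.

    A shipper that wants a non-default lifetime passes ttl (e.g. "7d");
    otherwise no ttl parameter is sent and the mapping default applies.
    """
    params = {"ttl": ttl} if ttl is not None else {}
    return params, doc


if __name__ == "__main__":
    # Hypothetical type name and lifetimes, for illustration only.
    print(json.dumps(ttl_mapping("logs", "30d"), indent=2))
    print(index_request({"message": "disk full"}, ttl="7d"))
```

The mapping covers the "default expiry" case, while the per-request ttl covers the case where the shipper chooses an explicit lifetime.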

mrdavidlaing commented 10 years ago

@dpb587 has convinced me that:

  1. Shippers should only concern themselves with shipping logs; and specifying "immutable facts" about said logs (eg, their format and timezone)
  2. Purging logs is one of the techniques for maintaining cluster performance. Keeping purge logic in a curator component (deployed alongside the es nodes) enables the cluster admins to drive purging when tuning cluster performance.

Thus, my opinion has been changed to favour the curator approach over the document _ttl approach.

sopel commented 10 years ago

:information_source: Curator seems to be maturing as expected and is now available as a PyPI package, which eases consistent deployment.

dpb587 commented 10 years ago

I've used curator -d 29 successfully on the main cluster the past two Tuesdays now. We could make it a Jenkins job, but like our other tasks it requires an SSH tunnel, which adds a lot of extra bootstrapping to what should be a very simple one-liner. I wish there was another way...
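
For readers unfamiliar with it, the selection behind a run like this can be sketched roughly as follows. This is not curator's actual code, just a minimal Python illustration that assumes -d 29 means "delete indices older than 29 days", that indices are named logstash-YYYY.MM.DD (the prefix is an assumption), and that an index dated exactly on the cutoff day is kept.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, days, today, prefix="logstash-"):
    """Return the daily indices whose date is more than `days` days before `today`.

    Names that do not match prefix + YYYY.MM.DD are ignored, so unrelated
    indices are never selected for deletion.
    """
    cutoff = today - timedelta(days=days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # Prefix matched, but the suffix is not a date.
        if day < cutoff:  # Assumption: strictly older than the cutoff.
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    names = ["logstash-2014.03.01", "logstash-2014.03.30", "kibana-int"]
    print(indices_older_than(names, 29, today=date(2014, 3, 31)))
```

Passing today explicitly keeps the selection deterministic, which makes it easy to check before anything is actually deleted.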

dpb587 commented 10 years ago

Add it as a cron job on the elasticsearch master node...

dpb587 commented 10 years ago

This issue is assigned to me and, after reviewing it, I'm closing it because this problem is solved by curator and is something we won't be pursuing within this repository since migrating to logsearch/logsearch-boshrelease.