As far as I can tell from reading the source, the requirement for multiple ES servers in distinct clusters stems from the way the importer works. It deletes the old index, creates a new one, populates it, and finally adjusts the setting in redis pointing to which server is active. I'm guessing this is done so that searches can still be performed while an import is happening. Perhaps this is also an attempt to segregate resource usage, so that indexing load does not impact searching and vice versa.
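Roughly, I read the flow as something like the sketch below. This is my own paraphrase, not Oculus's actual code; the cluster URLs, index name, and Redis key are placeholders I made up.

```python
import redis
import requests

ACTIVE = "http://es-a:9200"    # hypothetical "currently searched" cluster
STANDBY = "http://es-b:9200"   # hypothetical cluster the importer rebuilds
r = redis.StrictRedis()

def run_import(documents):
    # Rebuild the index from scratch on the standby cluster...
    requests.delete(STANDBY + "/metrics")
    requests.put(STANDBY + "/metrics")
    for doc in documents:
        requests.post(STANDBY + "/metrics/metric", json=doc)
    # ...then flip the Redis setting so searches start hitting the fresh cluster.
    r.set("active_es_server", STANDBY)
```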
None of these requirements necessitates a separate ES cluster. Oculus only uses a single index, so it's no big deal to rotate between index names. Better still, you can use ES's index alias feature to point at the current index. For example, the importer could name the index it's creating metrics.YYYY.MM.DD.HH.MM.SS. When it finishes, it can atomically switch the "metrics" alias to point to the new index. The searcher then just refers to "metrics" as if it were the name of an actual index. This scheme also lets you keep multiple historical indices around, so you can search against historical fluctuations.
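The alias swap is a single call to the _aliases endpoint, and both actions are applied atomically. A minimal sketch (the endpoint URL and timestamped index names are just for illustration):

```python
import requests

ES = "http://localhost:9200"                  # assumed ES endpoint
old_index = "metrics.2013.06.01.00.00.00"     # previously live index
new_index = "metrics.2013.06.02.00.00.00"     # the one the importer just built

# Remove and add are applied as one atomic operation, so searches against
# "metrics" never see a moment where the alias points nowhere.
requests.post(ES + "/_aliases", json={
    "actions": [
        {"remove": {"index": old_index, "alias": "metrics"}},
        {"add":    {"index": new_index, "alias": "metrics"}},
    ],
})
```

On the very first import there is nothing to remove, so the remove action can simply be skipped that time around.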
If isolation of indexing and searching is required, just use Elasticsearch's shard allocation (routing) settings: you can ensure that searching and indexing always happen on different nodes. If complete isolation is required, then scrap my naming scheme above, alternate between metrics.0 and metrics.1, and add allocation rules to ensure that the shards for those indices are stored on separate nodes.
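Concretely, I believe this is just shard allocation filtering: tag the nodes in elasticsearch.yml and pin each index to a tag. The attribute name and group values below are arbitrary ones I picked:

```python
import requests

ES = "http://localhost:9200"   # assumed ES endpoint

# Assumes the nodes were started with e.g. "node.tag: group-a" or
# "node.tag: group-b" in elasticsearch.yml (newer releases spell it
# "node.attr.tag"). Each index's shards then live only on its own group.
for index, tag in [("metrics.0", "group-a"), ("metrics.1", "group-b")]:
    requests.put(ES + "/" + index + "/_settings", json={
        "index.routing.allocation.include.tag": tag,
    })
```

These allocation settings are dynamic, so they can be applied to existing indices without downtime.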
Requiring multiple ES servers adds a significant barrier to entry. I need to either run multiple ES instances on one host (probably requiring me to retool my Puppet manifest) or spin up multiple hosts. This wastes resources, since searches are probably going to be fairly rare, so at least one ES cluster will be mostly idle.
Ultimately, given ES's flexibility in sharding and routing, I can't think of any case when an application would truly need to use multiple clusters.