elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[s3 snapshots] Refuse to do a snapshot if s3 bucket is detected with object versioning enabled, but provide override parameter to allow it #33948

Closed by geekpete 4 years ago

geekpete commented 6 years ago

Describe the feature:

Sometimes users will inadvertently have S3 object versioning enabled on a bucket they're snapshotting into, which can incur additional storage cost for little (no?) benefit in the context of Elasticsearch snapshots. It might be good to prevent snapshots to versioned buckets by default but allow an override if a user really wants to do so. That would catch the scenario early and fail fast, before it turns into a larger problem to solve later on.

At the very least a warning should be emitted, but failing the snapshot with a refusal reason would be a better user experience: it stops users from doing something they probably don't want, and it alerts them in a really obvious way, helping them identify versioned buckets before any snapshots get stored in such a bucket.

One case I can think of where the override would be needed: for whatever reason you're sharing a bucket that needs versioning on, and you also want to snapshot into a different directory of that bucket. You'd then use the override to allow snapshots to a versioned bucket, but hopefully that won't be a common configuration.
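A minimal sketch of the proposed behaviour, to make the request concrete. It is not Elasticsearch code: the function name, the exception, and the `allow_versioned_bucket` override are all hypothetical. The only real detail is that S3's GetBucketVersioning reports a status of `Enabled` or `Suspended`, or no status at all if versioning was never configured.

```python
class VersionedBucketError(Exception):
    """Raised when a snapshot targets a versioned bucket without the override."""


def check_snapshot_target(versioning_status, allow_versioned_bucket=False):
    """Return a warning string or None; raise if the snapshot should be refused.

    versioning_status mirrors the S3 GetBucketVersioning "Status" value:
    "Enabled", "Suspended", or None when versioning was never configured.
    allow_versioned_bucket is a hypothetical override setting.
    """
    # Only an actively versioned bucket is treated as a problem in this sketch.
    if versioning_status != "Enabled":
        return None
    if allow_versioned_bucket:
        # Override set: proceed, but still tell the user about it.
        return "bucket has object versioning enabled; snapshotting anyway (override set)"
    # Default: fail fast with an obvious reason.
    raise VersionedBucketError(
        "refusing to snapshot: bucket has object versioning enabled; "
        "set allow_versioned_bucket to override"
    )
```

With the default, a versioned bucket is refused up front; with the override, the snapshot proceeds and only a warning is returned.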

elasticmachine commented 6 years ago

Pinging @elastic/es-distributed

geekpete commented 4 years ago

Is this feature still relevant with the recent code changes to how snapshots work? Is there any risk to large storage capacity wastage or inadvertently creating a vast amount of unnecessary objects if a bucket is used for snapshots without realising versioning is enabled for it? Or are the objects that will see frequent versions created mostly smaller metadata files?

original-brownbear commented 4 years ago

Is there any risk to large storage capacity wastage or inadvertently creating a vast amount of unnecessary objects if a bucket is used for snapshots without realising versioning is enabled for it?

We don't really do any overwrites any more. The only thing that is still annoying about versioning is that deletes don't actually free the space, since S3 keeps the deleted blobs around as noncurrent versions. So you may need to configure a reasonable retention policy for those so that your repo does not grow forever.
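As a rough illustration of such a retention policy, the sketch below builds an S3 lifecycle configuration that expires noncurrent versions and cleans up leftover delete markers. The dict follows the shape of the S3 PutBucketLifecycleConfiguration request; the rule ID, the empty prefix, and the 7-day window are assumptions picked for the example, not recommendations from this thread.

```python
def noncurrent_cleanup_rule(prefix="", noncurrent_days=7):
    """Build an S3 lifecycle configuration that expires old object versions.

    prefix and noncurrent_days are illustrative values only.
    """
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-snapshot-blobs",  # hypothetical rule name
                "Status": "Enabled",
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": prefix},
                # Permanently remove versions N days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Remove delete markers that no longer hide any versions.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }
```

Assuming boto3 is available, a dict like this could be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`; that call is not executed here.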

Or are the objects that will see frequent versions created mostly smaller metadata files?

We only ever overwrite one blob, the /index.latest blob at the repository root, and it's only 8 bytes in size.
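To put the 8 bytes in perspective, here is a small sketch of how little an overwritten blob of that size adds up to on a versioned bucket. It assumes the blob holds a single 64-bit integer (a generation number, encoded big-endian); that encoding is an assumption for the example, and both function names are hypothetical.

```python
import struct


def encode_generation(generation):
    # Assumption: the blob stores one big-endian signed 64-bit integer.
    return struct.pack(">q", generation)


def retained_noncurrent_bytes(overwrites, blob_size=8):
    # On a versioned bucket, each overwrite leaves the previous version behind.
    return overwrites * blob_size


blob = encode_generation(42)
print(len(blob))                          # size of the blob in bytes
print(retained_noncurrent_bytes(10_000))  # bytes kept after 10,000 overwrites
```

Under these assumptions, even ten thousand overwrites retain only about 80 kB of old versions, so the overwritten blob itself is not where the storage cost would come from.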

-> I would close this issue for at least two reasons:

  1. Enabling versioning does not break the repository functionality. If a user enables it, they must have a reason for doing so (effectively always some weird compliance thing) -> why make their life harder by forcing them to set another setting :) (also if we did this it would be annoying for BwC)
  2. We can't reliably detect whether versioning is enabled or disabled anyway. We could detect it when creating the repository but then the user could still enable versioning on the bucket later and we probably don't want to do the check for whether or not versioning is activated on every snapshot (we could but then if someone activates versioning for whatever reason something like SLM will just quietly break in the background which might not be great).
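The second point can be sketched with a toy model. Nothing here talks to S3 or Elasticsearch: `FakeBucket`, `register_repository`, and `take_snapshot` are hypothetical stand-ins that only show why a check at repository creation misses versioning enabled later, and why a per-snapshot check turns that later change into failing snapshots.

```python
class FakeBucket:
    """Stand-in for an S3 bucket; it only tracks its versioning status."""

    def __init__(self):
        self.versioning_status = None  # versioning never configured


def register_repository(bucket):
    # Check once, when the repository is created.
    if bucket.versioning_status == "Enabled":
        raise ValueError("bucket is versioned")
    return {"bucket": bucket}


def take_snapshot(repo, check_every_time=False):
    # Optional re-check on every snapshot.
    if check_every_time and repo["bucket"].versioning_status == "Enabled":
        raise ValueError("bucket is versioned")
    return "SUCCESS"


bucket = FakeBucket()
repo = register_repository(bucket)   # passes: not versioned yet
bucket.versioning_status = "Enabled" # someone enables versioning later

print(take_snapshot(repo))           # creation-time check never notices
try:
    take_snapshot(repo, check_every_time=True)
except ValueError as err:
    print("snapshot failed:", err)   # a scheduled job would now start failing
```

Either way the result is unsatisfying: the one-time check gives a false sense of safety, and the repeated check makes scheduled snapshots break as soon as someone flips the bucket setting.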

Seems this isn't really worth it?