elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

GCS repository access fails with StorageException[401 Unauthorized #75368

Open bnprss opened 3 years ago

bnprss commented 3 years ago

Elasticsearch version (bin/elasticsearch --version):
    "version" : {
      "number" : "7.13.1",
      "build_flavor" : "default",
      "build_type" : "deb",
      "build_hash" : "9a7758028e4ea59bcab41c12004603c5a7dd84a9",
      "build_date" : "2021-05-28T17:40:59.346932922Z",
      "build_snapshot" : false,
      "lucene_version" : "8.8.2",
      "minimum_wire_compatibility_version" : "6.8.0",
      "minimum_index_compatibility_version" : "6.0.0-beta1"
    }

Plugins installed: [] discovery-gce 7.13.1 repository-gcs 7.13.1

JVM version (java -version): vendor version : openjdk 16 2021-03-16 OpenJDK Runtime Environment AdoptOpenJDK (build 16+36) OpenJDK 64-Bit Server VM AdoptOpenJDK (build 16+36, mixed mode, sharing)

OS version (uname -a if on a Unix-like system): Linux xxx-master-f566 5.8.0-1035-gcp #37~20.04.1-Ubuntu SMP Thu Jun 17 16:04:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: On GCE, calls to GCS become unauthenticated after a short period of time (less than 7 days) when using the service account that is attached to the resource. Expected behavior is that the repository stays accessible with those credentials. The error is:

"reason": "StorageException[401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/<indice path/name>?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.]; nested: GoogleJsonResponseException[401 Unauthorized\nGET https://storage.googleapis.com/download/storage/v1/b/<indice path/name>?alt=media\nAnonymous caller does not have storage.objects.get access to the Google Cloud Storage object.]"

Steps to reproduce:

  1. Set up an instance with credential discovery.
  2. Add the GCS repository plugin.
  3. Add a GCS repository:
    {
      "gcs": {
        "type": "gcs",
        "uuid": "xksHbk9BSOOscCvkwAa2vg",
        "settings": {
          "bucket": "<storage name>"
        }
      }
    }
  4. Create an SLM rule.
  5. Wait until it breaks; this takes a matter of days.
  6. Accessing the repository now fails with the "Anonymous caller does not have storage.objects.get access" error.
  7. DELETE and then PUT the gcs repository object again to clear the error.
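
The DELETE/PUT workaround from step 7 could be scripted roughly as follows. This is a minimal sketch, not something taken from the report: it assumes an unsecured node reachable at http://localhost:9200, the repository name `gcs` from step 3, and hypothetical helper names. It relies only on the snapshot repository endpoints (DELETE and PUT on /_snapshot/<name>).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without auth or TLS
REPO_NAME = "gcs"                 # repository name shown in step 3
BUCKET = "<storage name>"         # placeholder kept as-is from the report


def delete_repo_request(base_url, name):
    """Build the request that unregisters the repository.

    Unregistering only removes the reference; snapshot data in the bucket stays.
    """
    return urllib.request.Request(f"{base_url}/_snapshot/{name}", method="DELETE")


def put_repo_request(base_url, name, bucket):
    """Build the request that registers the GCS repository again."""
    body = json.dumps({"type": "gcs", "settings": {"bucket": bucket}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_snapshot/{name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def reregister(base_url=ES_URL, name=REPO_NAME, bucket=BUCKET):
    """Send DELETE, then PUT; urllib raises HTTPError on an error response."""
    requests = (
        delete_repo_request(base_url, name),
        put_repo_request(base_url, name, bucket),
    )
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)
```

Calling `reregister()` against a live node performs the same two calls as step 7. It only clears the symptom; it does not address why the credentials stop being applied.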

Provide logs (if relevant):

elasticmachine commented 3 years ago

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine commented 3 years ago

Pinging @elastic/es-core-features (Team:Core/Features)

gpapakyriakopoulos commented 3 years ago

+1 on this one, happens to us as well on version 7.14

aaroalan commented 2 years ago

I had this issue on 7.15.1. In my case only some of the indices failed. As you mentioned, everything worked on a fresh deploy, but after some days all snapshots had failed shards.

I "fixed" it by setting the env var GOOGLE_APPLICATION_CREDENTIALS to the path of the Google service account file. This is the same file that is set in the keystore (required for the plugin). Most (maybe all) of the Google libraries default to that env var when credentials are not provided, so maybe at some point the keystore value is not passed or is lost.

It has been a week since the env var was added, with no failed shards so far (daily snapshots). I will update if it happens again.
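
For anyone trying the same env var workaround, a quick preflight check along these lines may help confirm that GOOGLE_APPLICATION_CREDENTIALS points at a readable service account key. It is a sketch with a hypothetical function name; it has to run with the same environment as the Elasticsearch process to be meaningful, and it only inspects the file, it does not verify that Google accepts the key.

```python
import json
import os


def check_gcp_credentials(env=os.environ):
    """Return (ok, message) for the GOOGLE_APPLICATION_CREDENTIALS entry in env."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    try:
        with open(path, encoding="utf-8") as fh:
            key = json.load(fh)
    except (OSError, ValueError) as exc:
        # OSError: missing or unreadable file; ValueError: not valid JSON text
        return False, f"cannot read key file {path}: {exc}"
    # Service account key files carry "type": "service_account"
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, f"{path} is not a service account key file"
    return True, f"service account key for {key.get('client_email', '<unknown>')}"
```

Printing the result of `check_gcp_credentials()` from the service's environment shows whether the variable is visible there at all, which is the first thing to rule out if the workaround appears not to take effect.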