alexandrevilain / temporal-operator

Temporal Kubernetes Operator
https://temporal-operator.pages.dev/
Apache License 2.0

Linkerd RST_STREAM protocol issues #532

Open MichaelCombs28 opened 11 months ago

MichaelCombs28 commented 11 months ago

I've launched a Temporal cluster with the Linkerd mTLS option and have been running into gRPC errors. I don't see these errors on deployments without Linkerd.

TEMPORAL_ADDRESS is not set, setting it to 10.8.21.62:7233
2023/10/17 22:29:19 Loading config; env=docker,zone=,configDir=config
2023/10/17 22:29:19 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2023-10-17T22:29:19.304Z","msg":"Build info.","git-time":"2023-08-14T17:17:11.000Z","git-revision":"c79c00ac8f96c94dffae6e59eba9a279f7ebc656","git-modified":true,"go-arch":"amd64","go-os":"linux","go-version":"go1.20.6","cgo-enabled":false,"server-version":"1.21.5","debug-mode":false,"logging-call-at":"main.go:148"}
{"level":"info","ts":"2023-10-17T22:29:19.304Z","msg":"Dynamic config client is not configured. Using noop client.","logging-call-at":"main.go:168"}
{"level":"warn","ts":"2023-10-17T22:29:19.304Z","msg":"Not using any authorizer and flag `--allow-no-auth` not detected. Future versions will require using the flag `--allow-no-auth` if you do not want to set an authorizer.","logging-call-at":"main.go:178"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Use rpc address 127.0.0.1:7233 for cluster prod.","component":"metadata-initializer","logging-call-at":"fx.go:840"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"history","logging-call-at":"fx.go:371"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"matching","logging-call-at":"fx.go:421"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"frontend","logging-call-at":"fx.go:479"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"internal-frontend","logging-call-at":"fx.go:479"}
{"level":"info","ts":"2023-10-17T22:29:19.347Z","msg":"historyClient: ownership caching disabled","service":"worker","logging-call-at":"client.go:82"}
{"level":"info","ts":"2023-10-17T22:29:19.348Z","msg":"PProf not started due to port not set","logging-call-at":"pprof.go:67"}
{"level":"info","ts":"2023-10-17T22:29:19.348Z","msg":"Starting server for services","value":{"worker":{}},"logging-call-at":"server_impl.go:88"}
{"level":"info","ts":"2023-10-17T22:29:19.363Z","msg":"RuntimeMetricsReporter started","service":"worker","logging-call-at":"runtime.go:138"}
{"level":"info","ts":"2023-10-17T22:29:19.363Z","msg":"worker starting","service":"worker","component":"worker","logging-call-at":"service.go:391"}
{"level":"info","ts":"2023-10-17T22:29:19.369Z","msg":"Membership heartbeat upserted successfully","address":"10.8.21.62","port":6939,"hostId":"9cef8afb-6d3c-11ee-93a5-560035998413","logging-call-at":"monitor.go:256"}
{"level":"info","ts":"2023-10-17T22:29:19.371Z","msg":"bootstrap hosts fetched","bootstrap-hostports":"10.8.11.88:6934,10.8.6.148:6933,10.8.21.62:6939,10.8.20.37:6935","logging-call-at":"monitor.go:298"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"matching","addresses":["10.8.20.37:7235"],"logging-call-at":"service_resolver.go:279"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"worker","addresses":["10.8.21.62:7239"],"logging-call-at":"service_resolver.go:279"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"frontend","addresses":["10.8.6.148:7233"],"logging-call-at":"service_resolver.go:279"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"history","addresses":["10.8.11.88:7234"],"logging-call-at":"service_resolver.go:279"}
{"level":"warn","ts":"2023-10-17T22:29:19.380Z","msg":"error creating sdk client","service":"worker","error":"failed reaching server: stream terminated by RST_STREAM with error code: PROTOCOL_ERROR","logging-call-at":"factory.go:114"}
{"level":"fatal","ts":"2023-10-17T22:29:19.380Z","msg":"error creating sdk client","service":"worker","error":"failed reaching server: stream terminated by RST_STREAM with error code: PROTOCOL_ERROR","logging-call-at":"factory.go:121","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/home/builder/temporal/common/log/zap_logger.go:180\ngo.temporal.io/server/common/sdk.(*clientFactory).GetSystemClient.func1\n\t/home/builder/temporal/common/sdk/factory.go:121\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:74\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:65\ngo.temporal.io/server/common/sdk.(*clientFactory).GetSystemClient\n\t/home/builder/temporal/common/sdk/factory.go:108\ngo.temporal.io/server/service/worker/scanner.(*Scanner).Start\n\t/home/builder/temporal/service/worker/scanner/scanner.go:229\ngo.temporal.io/server/service/worker.(*Service).startScanner\n\t/home/builder/temporal/service/worker/service.go:523\ngo.temporal.io/server/service/worker.(*Service).Start\n\t/home/builder/temporal/service/worker/service.go:408\ngo.temporal.io/server/service/worker.ServiceLifetimeHooks.func1.1\n\t/home/builder/temporal/service/worker/fx.go:136"}

This seems like a gRPC issue.

My Temporal cluster:

apiVersion: v1
items:
  - apiVersion: temporal.io/v1beta1
    kind: TemporalCluster
    metadata:
      name: prod
      namespace: temporal-system
    spec:
      version: 1.21.5
      admintools:
        enabled: true
      jobTtlSecondsAfterFinished: 300
      log:
        development: false
        format: json
        level: info
        outputFile: ""
        stdout: true
      mTLS:
        provider: linkerd
        refreshInterval: 5m0s
      numHistoryShards: 1
      persistence:
        defaultStore:
          name: default
          passwordSecretRef:
            key: PASSWORD
            name: postgres-password
          skipCreate: false
          sql:
            connectAddr: <rds_instance>.<rds_region>.rds.amazonaws.com:5432
            connectProtocol: tcp
            databaseName: temporal
            maxConnLifetime: 0s
            maxConns: 0
            maxIdleConns: 0
            pluginName: postgres
            taskScanPartitions: 0
            user: <rds_user>
        visibilityStore:
          elasticsearch:
            closeIdleConnectionsInterval: 0s
            enableHealthcheck: false
            enableSniff: false
            indices:
              secondaryVisibility: ""
              visibility: temporal_visibility
            logLevel: ""
            url: https://<opensearch url>
            username: admin
            version: v7
          name: visibility
          passwordSecretRef:
            key: PASSWORD
            name: opensearch-password
          skipCreate: false
      ui:
        enabled: true
    status:
      conditions:
        - lastTransitionTime: "2023-10-17T22:18:24Z"
          message: ""
          observedGeneration: 1
          reason: ServicesNotReady
          status: "False"
          type: Ready
        - lastTransitionTime: "2023-10-17T22:18:24Z"
          message: ""
          observedGeneration: 1
          reason: LastReconcileCycleSucceded
          status: "True"
          type: ReconcileSuccess
      persistence:
        defaultStore:
          created: true
          schemaVersion: 1.21.5
          setup: true
          type: postgres
        visibilityStore:
          created: true
          schemaVersion: 1.21.5
          setup: true
          type: elasticsearch
      services:
        - name: frontend
          ready: true
          version: 1.21.5
        - name: history
          ready: true
          version: 1.21.5
        - name: matching
          ready: true
          version: 1.21.5
        - name: worker
          ready: false
          version: 1.21.5
      version: 1.21.5
kind: List
metadata:
  resourceVersion: ""

alexandrevilain commented 10 months ago

Hi!

I don't think this is related to the operator. The operator does nothing other than ask Linkerd to inject its sidecar.

Which Linkerd version are you using?
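
For readers unfamiliar with what "asking Linkerd to inject its sidecar" means in practice: Linkerd's proxy injector adds its proxy container to any pod whose template (or namespace) carries the `linkerd.io/inject: enabled` annotation. The fragment below is only a sketch of what such a pod template might look like. The Deployment name is hypothetical, and the assumption that the operator sets this annotation on the pod template has not been checked against the operator's source.

```yaml
# Illustrative fragment only, not copied from the operator's output.
# Assumption: the operator requests injection through this annotation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prod-worker            # hypothetical name
  namespace: temporal-system
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled   # Linkerd adds its proxy sidecar to these pods
```

If the annotation is present and the proxy container appears in the pod, the injection step itself has worked, and the RST_STREAM error would have to come from how traffic passes through the proxy rather than from the injection request.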