Closed Bah27 closed 1 week ago
Hi @Bah27
We updated the Thanos version to 0.36.0 in that release, see https://github.com/bitnami/charts/pull/28607
It seems that version 0.36.1 included a fix for a regression in the TLS config of Query.
Could you try with that version? It's already available in the latest version of the Bitnami chart.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Hello @juan131
I apologize for the lack of follow-up; I was on vacation. I will test version 0.36.1, as recommended, to check if the issue related to the TLS configuration is resolved with the fix mentioned in this pull request.
In the meantime, I have observed several errors in the logs, including:
ts=2024-09-30T08:30:06.159166083Z caller=endpointset.go:471 level=warn component=endpointset msg="update of endpoint failed" err="getting metadata: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=@domain1:443
ts=2024-09-30T08:30:06.159645638Z caller=endpointset.go:471 level=warn component=endpointset msg="update of endpoint failed" err="getting metadata: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=@domain2:443
ts=2024-09-30T08:30:11.162302928Z caller=endpointset.go:471 level=warn component=endpointset msg="update of endpoint failed" err="getting metadata: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=@domain3:443
ts=2024-09-30T08:30:11.162285816Z caller=endpointset.go:471 level=warn component=endpointset msg="update of endpoint failed" err="getting metadata: rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: authentication handshake failed: tls: first record does not look like a TLS handshake\"" address=100.64.41.170:10901
These logs show errors related to timeouts and TLS authentication issues on the following endpoints:
Thank you for your patience!
Thanks @Bah27 ! Please let us know about your insights once you try it with the latest chart version.
Thank you @juan131! I proceeded with the tests using the latest chart version, but unfortunately, I am still encountering the same errors.
Hi @Bah27
Sorry for the delay in my response. I've been reviewing the values you shared, paying special attention to the block below:
query:
  (...)
  grpc:
    client:
      tls:
        enabled: true
        existingSecret:
          name: thanos-cert
          keyMapping:
            ca-cert: ca.crt
            tls-cert: tls.crt
            tls-key: tls.key
        clientAuthEnabled: true
  stores:
    - "@domain1:443"
    - "@domain2:443"
    - "@domain2:443"
  extraFlags:
    - --grpc-client-tls-skip-verify
    - --store.response-timeout=0
It seems you enabled TLS for gRPC on the client side but didn't do the same on the server side (query.grpc.server.tls.enabled is false by default and you didn't modify it). Also, you're setting the property query.grpc.client.tls.clientAuthEnabled, which doesn't exist. I guess you meant query.grpc.server.tls.clientAuthEnabled, right? See:
Also, regarding this block:
query:
  dnsDiscovery:
    enable: false
    sidecarsService: prometheus-operated
    sidecarsNamespace: monitoring
Please note query.dnsDiscovery.sidecarsService and query.dnsDiscovery.sidecarsNamespace will be ignored if query.dnsDiscovery.enabled is false, see:
Hi @juan131,
Thanks for your reply and for taking the time to carefully review the configuration details.
TLS for gRPC server
You're absolutely right. I had enabled TLS on the client side but missed doing so on the server side. I'll correct this by adding query.grpc.server.tls.enabled: true. And yes, I mistakenly used clientAuthEnabled in the wrong place. What I meant to use was query.grpc.server.tls.clientAuthEnabled.
Thanks for pointing that out—it really helped me understand the mistake. I’ll adjust the configuration as you suggested.
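For reference, the adjustment described above could look roughly like the values sketch below. This is a sketch under assumptions, not a verified configuration: it reuses the thanos-cert secret and key mapping from the client block for the server side, which assumes the server TLS block accepts the same existingSecret structure as the client one. Check the values.yaml of your chart version before applying it.

```yaml
query:
  grpc:
    server:
      tls:
        # Off by default in the chart; enable it explicitly.
        enabled: true
        # Moved here from query.grpc.client.tls, where the property does not exist.
        clientAuthEnabled: true
        # Assumption: the server block takes the same existingSecret layout as the client block.
        existingSecret:
          name: thanos-cert
          keyMapping:
            ca-cert: ca.crt
            tls-cert: tls.crt
            tls-key: tls.key
    client:
      tls:
        enabled: true
        existingSecret:
          name: thanos-cert
          keyMapping:
            ca-cert: ca.crt
            tls-cert: tls.crt
            tls-key: tls.key
```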
dnsDiscovery
Regarding DNS discovery, good catch! I didn't realize that query.dnsDiscovery.sidecarsService and query.dnsDiscovery.sidecarsNamespace would be ignored when query.dnsDiscovery.enabled is false. I'll either enable DNS discovery or remove those parameters if they're not needed.
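As a minimal sketch of the first option, assuming the sidecar service and namespace from the values shared earlier are the intended ones, the DNS discovery block could be written as follows. Note that the property referenced in the review is query.dnsDiscovery.enabled, whereas the values shared earlier use the key enable.

```yaml
query:
  dnsDiscovery:
    # The review refers to `enabled`; the shared values used `enable`.
    enabled: true
    # Only taken into account when dnsDiscovery.enabled is true.
    sidecarsService: prometheus-operated
    sidecarsNamespace: monitoring
```

If the sidecars are only reached through the static entries under query.stores, the alternative is to set enabled to false and drop the two sidecars* parameters.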
Thanks again for the clarifications and for linking the documentation—this was super helpful!
I’ll update everything and run some tests.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Name and Version
thanos/15.7.16
What architecture are you using?
None
What steps will reproduce the bug?
Update the thanos chart from 15.7.15 to 15.7.16
Are you using any custom parameters or values?
NB: @domain1 is the domain name of each sidecar.
What do you see instead?