Open DelaunayAntoine opened 4 months ago
I'm seeing this as well. Did you figure it out?
I see this in the log; it looks like it's trying the wrong URL for the pods:
W240826 16:03:30.251454 142 server/init.go:407 ⋮ [T1,Vsystem,n?] 37 outgoing join rpc to ‹keycloak-cockroachdb-1.keycloak-cockroachdb.keycloak.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup keycloak-cockroachdb-1.keycloak-cockroachdb.keycloak.svc.cluster.local: no such host"›
I240826 16:03:30.258373 142 server/init.go:405 ⋮ [T1,Vsystem,n?] 38 ‹keycloak-cockroachdb-2.keycloak-cockroachdb.keycloak.svc.cluster.local:26257› is itself waiting for init, will retr
In my case it's adding 'keycloak-cockroachdb.' and in your case it's adding 'cockroachdb.', which looks like it shouldn't be there.
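For comparison, Kubernetes gives StatefulSet pods DNS names of the form pod.headless-service.namespace.svc.cluster-domain, so the extra label may simply be the headless service name rather than a bug. A minimal sketch that rebuilds the names from the two logs in this thread; the function name `pod_fqdn` is made up for illustration, and `cluster.local` as the cluster domain is an assumption (it is the usual default):

```shell
#!/bin/sh
# Compose the DNS name Kubernetes assigns to a StatefulSet pod:
#   <statefulset>-<ordinal>.<headless-service>.<namespace>.svc.<cluster-domain>
# pod_fqdn is a hypothetical helper, not part of kubectl or the chart.
pod_fqdn() {
  statefulset="$1"; ordinal="$2"; service="$3"; namespace="$4"
  domain="${5:-cluster.local}"   # assumption: default cluster domain
  printf '%s-%s.%s.%s.svc.%s\n' "$statefulset" "$ordinal" "$service" "$namespace" "$domain"
}

# StatefulSet, service and namespace names as they appear in the two logs.
pod_fqdn keycloak-cockroachdb 1 keycloak-cockroachdb keycloak
pod_fqdn cockroachdb 1 cockroachdb cockroachdb
```

Both calls print exactly the hostnames shown in the logs, which suggests the join addresses are formed as expected and the failure lies elsewhere.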
I've got the same problem with the latest 14.0.3 chart. I suspect the cause is the Helm hook annotations on the init job template: https://github.com/cockroachdb/helm-charts/blob/master/cockroachdb/templates/job.init.yaml#L22 The post-install hook can't be triggered because the StatefulSet is not ready, and the StatefulSet presumably can't become ready until the init job has run. As a workaround, you can deploy the init job manifest from the template manually.
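One way the manual workaround could look, as a sketch only: render just the init job template with `helm template --show-only` and pipe it to `kubectl apply`. The release name, namespace, and the `cockroachdb` repo alias below are placeholders I made up; the script prints the pipeline for review instead of running it. Pass the same `-f values.yaml` you used for the install, and note that when the chart is pulled in as a dependency the template path would likely be `charts/cockroachdb/templates/job.init.yaml` instead.

```shell
#!/bin/sh
# All values below are placeholders (assumptions), not from a verified setup.
RELEASE="my-release"
NAMESPACE="default"
CHART="cockroachdb/cockroachdb"   # assumes: helm repo add cockroachdb https://charts.cockroachdb.com
CHART_VERSION="14.0.3"

# Print the command that renders only the init job template.
render_init_job_cmd() {
  printf 'helm template %s %s --version %s --namespace %s --show-only templates/job.init.yaml\n' \
    "$RELEASE" "$CHART" "$CHART_VERSION" "$NAMESPACE"
}

# Print the command that applies a manifest read from stdin.
apply_cmd() {
  printf 'kubectl apply --namespace %s -f -\n' "$NAMESPACE"
}

# Show the full pipeline so it can be checked before being run by hand.
echo "$(render_init_job_cmd) | $(apply_cmd)"
```

Applied this way, the job is created as a plain resource, so Helm's hook ordering no longer gates it on the StatefulSet being ready.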
Are either of you able to share your values file? A redacted version is likely fine, just to see what overrides you have set. I have not been able to reproduce this with the default values.
@udnay, here ya go:
$ cat Chart.yaml
apiVersion: v2
name: jellyfin
description: A Helm chart for Kubernetes
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: "1.0"
dependencies:
  - name: jellyfin
    version: 2.1.0
    repository: https://jellyfin.github.io/jellyfin-helm
  - name: nats
    version: 1.1.10
    repository: https://nats-io.github.io/k8s/helm/charts/
  - name: cockroachdb
    version: 14.0.5
    repository: https://charts.cockroachdb.com
$ cat values.yaml
nats:
  natsBox:
    enabled: false
Result:
$ k --context prod-admin@prod -n jellyfin get ing,pvc,all
NAME                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/datadir-jellyfin-cockroachdb-0   Bound    pvc-1098f4ee-2a61-4dc0-944e-d4af39b1e95a   100Gi      RWO            cephfs         <unset>                 11m
persistentvolumeclaim/datadir-jellyfin-cockroachdb-1   Bound    pvc-6daf0dfa-1f12-432d-9c22-8636433d1c82   100Gi      RWO            cephfs         <unset>                 11m
persistentvolumeclaim/datadir-jellyfin-cockroachdb-2   Bound    pvc-ff28ab8d-ea1a-46cd-9d07-ef8d6367899f   100Gi      RWO            cephfs         <unset>                 11m
persistentvolumeclaim/jellyfin-config                  Bound    pvc-de299b6e-a5b4-4926-a8da-e70f93c9fcfa   5Gi        RWO            cephfs         <unset>                 11m
persistentvolumeclaim/jellyfin-media                   Bound    pvc-f921cdea-c8a9-4c0c-a98b-fb46368fa90b   25Gi       RWO            cephfs         <unset>                 11m

NAME                            READY   STATUS    RESTARTS        AGE
pod/jellyfin-6898c4c4bf-m2jl6   1/1     Running   0               11m
pod/jellyfin-cockroachdb-0      0/1     Running   1 (5m3s ago)    11m
pod/jellyfin-cockroachdb-1      0/1     Running   1 (4m27s ago)   11m
pod/jellyfin-cockroachdb-2      0/1     Running   1 (4m25s ago)   11m
pod/jellyfin-nats-0             2/2     Running   0               11m

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)              AGE
service/jellyfin                      ClusterIP   10.105.39.215   <none>        8096/TCP             11m
service/jellyfin-cockroachdb          ClusterIP   None            <none>        26257/TCP,8080/TCP   11m
service/jellyfin-cockroachdb-public   ClusterIP   10.110.42.45    <none>        26257/TCP,8080/TCP   11m
service/jellyfin-nats                 ClusterIP   10.97.28.92     <none>        4222/TCP             11m
service/jellyfin-nats-headless        ClusterIP   None            <none>        4222/TCP,8222/TCP    11m
Logs of each cockroachdb pod show:
$ k --context prod-admin@prod -n jellyfin logs -f jellyfin-cockroachdb-0
Defaulted container "db" out of: db, copy-certs (init)
++ hostname
+ exec /cockroach/cockroach start --join=jellyfin-cockroachdb-0.jellyfin-cockroachdb.jellyfin.svc.cluster.local:26257,jellyfin-cockroachdb-1.jellyfin-cockroachdb.jellyfin.svc.cluster.local:26257,jellyfin-cockroachdb-2.jellyfin-cockroachdb.jellyfin.svc.cluster.local:26257 --advertise-host=jellyfin-cockroachdb-0.jellyfin-cockroachdb.jellyfin.svc.cluster.local --certs-dir=/cockroach/cockroach-certs/ --http-port=8080
*
* WARNING: Running a server without --sql-addr, with a combined RPC/SQL listener, is deprecated.
* This feature will be removed in a later version of CockroachDB.
*
*
* INFO: initial startup completed.
* Node will now attempt to join a running cluster, or wait for `cockroach init`.
* Client connections will be accepted after this completes successfully.
* Check the log file(s) for progress.
*
*
* WARNING: The server appears to be unable to contact the other nodes in the cluster. Please try:
*
* - starting the other nodes, if you haven't already;
* - double-checking that the '--join' and '--listen'/'--advertise' flags are set up correctly;
* - running the 'cockroach init' command if you are trying to initialize a new cluster.
*
* If problems persist, please see https://www.cockroachlabs.com/docs/v24.2/cluster-setup-troubleshooting.html.
*
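The startup warning above suggests running `cockroach init` when a new cluster has never been initialized. A sketch of what that could look like for this release, with names taken from the jellyfin output above and the certs path from the `cockroach start` line. Whether the certificates in that directory are accepted by `init` is an assumption I haven't verified. The script only prints the command so it can be reviewed first:

```shell
#!/bin/sh
# Pod, container and namespace names come from the jellyfin output above;
# the certs path is the one passed to `cockroach start` in the log.
# Adjust all of them for your own release.
NAMESPACE="jellyfin"
POD="jellyfin-cockroachdb-0"
CONTAINER="db"
CERTS_DIR="/cockroach/cockroach-certs/"

# Print the one-time cluster initialization command for review.
init_cmd() {
  printf 'kubectl --namespace %s exec %s -c %s -- /cockroach/cockroach init --certs-dir=%s\n' \
    "$NAMESPACE" "$POD" "$CONTAINER" "$CERTS_DIR"
}

init_cmd
```

Initialization only needs to happen once per cluster, against any one node; the other nodes should then stop logging "waiting for init" and join.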
Hello everyone,
I would like to deploy CockroachDB using Helm, but the cluster can't start and I keep getting this error:
I240718 13:09:06.023104 191 server/init.go:405 ⋮ [T1,Vsystem,n?] 37 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
Can you help me by giving me some hints on how to fix the problem?
Here are the full log file and the values.yaml file: db.txt, values-cockroach.txt
I'm using CockroachDB version 24.1.1 with chart version 13.0.1.
What do you expect to see?
The cockroach cluster launching just fine
What happened
Error
I240718 13:09:06.023104 191 server/init.go:405 ⋮ [T1,Vsystem,n?] 37 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry