Open lin-crl opened 3 years ago
@jhatcher9999 , about a month ago, I recall that you mentioned having success using the Kubernetes Statefulset Yaml with multi-region deployments. Have you encountered this issue of node localities not setting?
I haven't had this issue. I have had an issue with the EKS-specific sts files, which include the DNS name as the last part of the locality string. That screws up the way things display in the DB Console (i.e., all the nodes show up in their own group in the node list).
Jessie, which sts yamls were you using when you had this issue? Can you include the link to the github file?
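To illustrate the grouping problem described above, here is a minimal sketch. It assumes the node list groups nodes by their full locality string, which the comment suggests but this sketch does not verify. The node names, the region/az values, and the `dns` key name are all hypothetical.

```python
from collections import defaultdict


def drop_tier(locality, key):
    """Remove one key=value tier from a comma-separated locality string."""
    return ",".join(
        tier for tier in locality.split(",") if tier.split("=", 1)[0] != key
    )


def group_by_locality(nodes):
    """Group node names by their exact locality string."""
    groups = defaultdict(list)
    for name, locality in nodes.items():
        groups[locality].append(name)
    return dict(groups)


# Hypothetical nodes in one availability zone, each with a per-node dns tier.
nodes = {
    "cockroachdb-0": "region=us-east-1,az=us-east-1a,dns=cockroachdb-0.cockroachdb",
    "cockroachdb-1": "region=us-east-1,az=us-east-1a,dns=cockroachdb-1.cockroachdb",
    "cockroachdb-2": "region=us-east-1,az=us-east-1a,dns=cockroachdb-2.cockroachdb",
}

with_dns = group_by_locality(nodes)
without_dns = group_by_locality(
    {name: drop_tier(loc, "dns") for name, loc in nodes.items()}
)
print(len(with_dns), len(without_dns))  # 3 groups with the dns tier, 1 without
```

Because the last tier is unique per node, no two nodes ever share a locality string, so every node ends up in a group of its own.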
Jim, here are the links to the statefulsets: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/bring-your-own-certs/cockroachdb-statefulset.yaml
You can see that the cockroach start command does not set locality.
- exec
/cockroach/cockroach
start
--logtostderr
--certs-dir /cockroach/cockroach-certs
--advertise-host $(hostname -f)
--http-addr 0.0.0.0
--join
cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb
--cache $(expr $MEMORY_LIMIT_MIB / 4)MiB
--max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB
I think the challenge might be to provide a solution that works across clouds/DCs, not to mention Kubernetes versions.
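For reference, the start command above would need a `--locality` argument made of comma-separated key=value tiers, ordered from broadest to most specific. The following is only a sketch of assembling that argument; the helper name and the region/zone values are hypothetical, and how each cloud or data center would supply those values is left open.

```python
def build_locality_flag(tiers):
    """Build a --locality flag from ordered (key, value) pairs, broadest first."""
    if not tiers:
        raise ValueError("at least one locality tier is required")
    parts = []
    for key, value in tiers:
        # Reject empty or ambiguous tokens so the flag stays parseable.
        if not key or not value or any(c in key + value for c in ",= "):
            raise ValueError(f"invalid locality tier: {key!r}={value!r}")
        parts.append(f"{key}={value}")
    return "--locality=" + ",".join(parts)


# Hypothetical tiers; real values would come from the cloud provider or DC.
flag = build_locality_flag([("region", "us-east1"), ("zone", "us-east1-b")])
print(flag)  # --locality=region=us-east1,zone=us-east1-b
```

Keeping the tiers as an ordered list rather than a fixed region/zone pair is one way to let the same helper cover different clouds and on-prem data centers.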
Jessie
I think there is one sts for single region (the one you referenced) and a different one for multi-region: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/cockroachdb-statefulset-secure.yaml
From: lin-crl @.> Sent: Tuesday, April 13, 2021 4:36:42 PM To: cockroachdb/cockroach @.> Cc: Jim Hatcher @.>; Mention @.> Subject: Re: [cockroachdb/cockroach] Kubernetes statefulset yaml doesn't set locality on node start (#63509)
The multiregion statefulset doesn't have any actual values in it either, only placeholders:
- exec
/cockroach/cockroach
start
--logtostderr
--certs-dir /cockroach/cockroach-certs
--advertise-host $(hostname -f)
--http-addr 0.0.0.0
--join JOINLIST
--locality LOCALITYLIST
--cache $(expr $MEMORY_LIMIT_MIB / 4)MiB
--max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB
I guess that version (the GKE version) is meant to be used with this Python script, which fills in that placeholder variable: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/setup.py#L168
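As a rough illustration of what filling in those placeholders involves, here is a minimal sketch. It is not the code from setup.py; the function name, the template excerpt, and the join and locality values are hypothetical, and it assumes both JOINLIST and LOCALITYLIST are plain text tokens to be substituted.

```python
# Hypothetical excerpt of the templated start command.
TEMPLATE = """\
--join JOINLIST
--locality LOCALITYLIST
"""


def fill_placeholders(template, join_addrs, locality):
    """Substitute the JOINLIST and LOCALITYLIST tokens in a template string."""
    for token in ("JOINLIST", "LOCALITYLIST"):
        if token not in template:
            raise ValueError(f"template has no {token} placeholder")
    filled = template.replace("JOINLIST", ",".join(join_addrs))
    return filled.replace("LOCALITYLIST", locality)


# Hypothetical per-zone values.
filled = fill_placeholders(
    TEMPLATE,
    ["cockroachdb-0.cockroachdb", "cockroachdb-1.cockroachdb"],
    "region=us-east1,zone=us-east1-b",
)
print(filled)
```

If the yaml is applied without this step, the node is started with the literal placeholder text instead of a real join list and locality.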
There is also this version which gets referenced in the AWS version of the docs: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml#L254
I think Ryan Kuo is the person who maintains these. He can probably clear up any questions better than me.
From: lin-crl @.> Sent: Wednesday, April 14, 2021 6:23:58 PM To: cockroachdb/cockroach @.> Cc: Jim Hatcher @.>; Mention @.> Subject: Re: [cockroachdb/cockroach] Kubernetes statefulset yaml doesn't set locality on node start (#63509)
Thanks for the update, @jhatcher9999. @johnrk, I hope this discussion gives a better description of the issue. Could you please follow up with the Eng/Doc team to address it? Thank you!
cc @mwang1026 for product triage.
cc @towfiqa
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
Describe the problem
Please describe the issue you observed, and any steps we can take to reproduce it:
To Reproduce
What did you do? Describe in your own words.
If possible, provide steps to reproduce the behavior:
Expected behavior
The statefulset should set localities correctly, since it is the supported method for deploying production clusters.
Environment:
Additional context
Customers can see reduced resiliency when locality is not properly set, and can experience data loss in production when multiple nodes are lost at the same time.
Jira issue: CRDB-6613