cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Kubernetes statefulset yaml doesn't set locality on node start #63509

Open lin-crl opened 3 years ago

lin-crl commented 3 years ago

Describe the problem

To Reproduce

  1. Create a kubernetes cluster on EKS
  2. Apply statefulset yaml
  3. Look at DBConsole. The node localities are not set

Expected behavior: The statefulset should set node localities correctly, since it is the supported method for deploying production clusters.

Environment:

Additional context: Customers can see reduced resiliency when locality is not set properly, and can experience data loss in production if multiple nodes are lost at the same time.

Jira issue: CRDB-6613

johnrk-zz commented 3 years ago

@jhatcher9999 , about a month ago, I recall that you mentioned having success using the Kubernetes Statefulset Yaml with multi-region deployments. Have you encountered this issue of node localities not setting?

jhatcher9999 commented 3 years ago

I haven't had this issue. I have had an issue with the EKS-specific sts files, where the DNS name is included as the last part of the locality string, which screws up the way things display in the DB Console (i.e., each node shows up in its own group in the node list).
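
The display problem described above can be illustrated with a minimal sketch. It assumes a locality string of the form `region=...,az=...,dns=...`; the tier keys, the sample hostname, and the helper name `strip_dns_tier` are assumptions for illustration, not taken from the manifests. Because a `dns` tier is unique per node, dropping it leaves only the tiers that nodes actually share:

```shell
#!/bin/sh
# Hypothetical helper: remove any tier whose key is "dns" from a
# comma-separated locality string, keeping the remaining tiers in order.
strip_dns_tier() {
  printf '%s\n' "$1" |
    tr ',' '\n' |          # one tier per line
    grep -v '^dns=' |      # drop the per-node tier
    paste -s -d ',' -      # join the remaining tiers back together
}

# Sample value (assumed, not copied from the EKS manifest).
strip_dns_tier 'region=us-east-1,az=us-east-1a,dns=cockroachdb-0.cockroachdb'
```

With the per-node tier removed, nodes in the same region and availability zone would carry identical locality strings and could be grouped together.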

Jessie, which sts yamls were you using when you had this issue? Can you include the link to the github file?

lin-crl commented 3 years ago

Jim, here are the links to the statefulsets:
https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml
https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/bring-your-own-certs/cockroachdb-statefulset.yaml

You can see that locality is not set in the cockroach start command:

- exec
/cockroach/cockroach
start
--logtostderr
--certs-dir /cockroach/cockroach-certs
--advertise-host $(hostname -f)
--http-addr 0.0.0.0
--join
cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb
--cache $(expr $MEMORY_LIMIT_MIB / 4)MiB
--max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB

I think the challenge might be to provide a solution that works across clouds/DC, not to mention the kubernetes version.
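
One cloud-agnostic direction, sketched here under stated assumptions, is to build the `--locality` value from environment variables that each platform supplies by its own mechanism. The variable names `CRDB_REGION` and `CRDB_ZONE` and the helper `build_locality` are hypothetical and do not appear in the published manifests; only the `--locality` flag and its comma-separated `key=value` tiers come from `cockroach start`.

```shell
#!/bin/sh
# Hypothetical helper: build a --locality value from a region and an
# optional zone. Prints nothing and returns non-zero if the region is empty.
build_locality() {
  region="$1"
  zone="$2"
  if [ -z "$region" ]; then
    return 1
  fi
  if [ -n "$zone" ]; then
    printf 'region=%s,zone=%s\n' "$region" "$zone"
  else
    printf 'region=%s\n' "$region"
  fi
}

# CRDB_REGION and CRDB_ZONE are assumed names; how they get populated
# (cloud metadata, node labels, an init container) is left open here.
locality="$(build_locality "${CRDB_REGION:-}" "${CRDB_ZONE:-}")" || locality=""
echo "locality: ${locality:-<unset>}"
```

The resulting string could then be passed as `--locality "$locality"` alongside the existing flags in the start command.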

Jessie

jhatcher9999 commented 3 years ago

I think there is a sts for single region (the one you referenced) and a different one for multi-region: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/cockroachdb-statefulset-secure.yaml

lin-crl commented 3 years ago

The multiregion statefulset doesn't have any values in it either:

         - exec
            /cockroach/cockroach
            start
            --logtostderr
            --certs-dir /cockroach/cockroach-certs
            --advertise-host $(hostname -f)
            --http-addr 0.0.0.0
            --join JOINLIST
            --locality LOCALITYLIST
            --cache $(expr $MEMORY_LIMIT_MIB / 4)MiB
            --max-sql-memory $(expr $MEMORY_LIMIT_MIB / 4)MiB

jhatcher9999 commented 3 years ago

I guess that version (the GKE version) is meant to be used with this python script which fills in that placeholder variable: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/setup.py#L168
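
As a rough illustration of what filling in those placeholders involves, here is a minimal shell sketch. It is not the code from `setup.py`; the helper name `render_manifest` and the sample join list and locality values are assumptions, and only the `JOINLIST` and `LOCALITYLIST` placeholder names come from the manifest quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read a manifest on stdin and replace the JOINLIST
# and LOCALITYLIST placeholders. $1 = join list, $2 = locality string.
# Assumes neither value contains "|", "&", or a backslash, since those
# characters are special in the sed replacement used here.
render_manifest() {
  join_list="$1"
  locality="$2"
  sed -e "s|JOINLIST|${join_list}|g" \
      -e "s|LOCALITYLIST|${locality}|g"
}

# Sample values (assumed for illustration only).
printf '%s\n' '--join JOINLIST' '--locality LOCALITYLIST' |
  render_manifest 'cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb' \
                  'region=us-east1,zone=us-east1-b'
```

Applying the manifest without a substitution step like this would pass the literal strings `JOINLIST` and `LOCALITYLIST` to `cockroach start`.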

There is also this version which gets referenced in the AWS version of the docs: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/eks/cockroachdb-statefulset-secure-eks.yaml#L254

I think Ryan Kuo is the person who maintains these. He can probably clear up any questions better than me.

lin-crl commented 3 years ago

Thanks for the update, @jhatcher9999. @johnrk, I hope this discussion gives a better description of the issue. Could you please follow up with the Eng/Docs team to address it? Thank you!

knz commented 1 year ago

cc @mwang1026 for product triage.

mwang1026 commented 1 year ago

cc @towfiqa

github-actions[bot] commented 6 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!