cockroachdb / cockroach-operator

k8s operator for CRDB
Apache License 2.0

Attempted upgrade from v19.2.12 to v20.2.3 had no visible effect - unclear how to upgrade CRDB version #507

Open theodore-hyman opened 3 years ago

theodore-hyman commented 3 years ago

EXPECTED RESULT: When a newer CockroachDB version is specified in the OpenShift operator, the cluster should either upgrade or fail visibly, with some notification or other indication that something is happening.

ACTUAL RESULT: Nothing happened, and I don't know why. I changed the operator config and reloaded my containers, and it seems to have had no impact.

My write-up implies that something is wrong with CRDB or with OpenShift. However, it is entirely possible that I did not know what I was doing, because I found no documentation on how to update the CRDB version in this environment. Updating the operator config and reloading may have been the wrong step. Perhaps someone with the right expertise should document the specific steps for updating the version.

detailed replication steps: [referring to this documentation https://www.cockroachlabs.com/docs/v21.1/deploy-cockroachdb-with-kubernetes-openshift.html]

➜ ~ oc get pods
NAME                                     READY   STATUS      RESTARTS   AGE
cockroach-operator-68977698f7-fcfxh      1/1     Running     0          119s
crdb-client-secure                       1/1     Running     0          26m
crdb-tls-example-0                       1/1     Running     0          30m
crdb-tls-example-1                       1/1     Running     0          29m
crdb-tls-example-2                       1/1     Running     0          57s
crdb-tls-example-vcheck-27021277-v78jq   0/1     Completed   0          30m

crdb-tls-example-2 is the pod that should be running v20.2.3, but I don't think it is.
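One way to confirm which version a pod is actually running is to read the container image from its spec. The sketch below is illustrative only: it builds an `oc get pod ... -o jsonpath=...` command and extracts the tag from an image reference. The helper names and the sample image string `cockroachdb/cockroach:v20.2.3` are assumptions for the example, not values taken from this cluster, and digest-pinned images are not handled.

```python
import shlex


def image_query_command(pod):
    """Build an argv list that asks the API server for a pod's container images."""
    return ["oc", "get", "pod", pod, "-o", "jsonpath={.spec.containers[*].image}"]


def image_tag(image):
    """Return the tag of a 'repo/name:tag' reference, or '' if it has no tag."""
    _, sep, tag = image.rpartition(":")
    # A ':' followed by a '/' is a registry port, not a tag separator.
    return tag if sep and "/" not in tag else ""


# Print the command to run by hand against the cluster.
print(shlex.join(image_query_command("crdb-tls-example-2")))

# Hypothetical image string, as it might be returned by the command above.
print(image_tag("cockroachdb/cockroach:v20.2.3"))
```

If the tag printed for crdb-tls-example-2 still matches the old version, the pod was restarted without the image being changed.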

Hope this is helpful

keith-mcclellan commented 3 years ago

I agree that the docs could be better here, but "nothing happened" is what should have happened: you cannot upgrade directly from 19.2 to 20.2 without first upgrading through 20.1. This is a database limitation, and the operator simply enforces it.
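As a rough illustration of the rule described above (not the operator's actual implementation), the following sketch checks whether a requested upgrade skips a major release. The `MAJOR_RELEASES` list is hard-coded for the example and the function names are hypothetical.

```python
# Assumption: an ordered, hard-coded list of major release series.
# A real check would need an authoritative source for this list.
MAJOR_RELEASES = ["19.2", "20.1", "20.2", "21.1"]


def major_series(version):
    """Return the major series of a tag, e.g. 'v19.2.12' -> '19.2'."""
    parts = version.lstrip("v").split(".")
    return f"{parts[0]}.{parts[1]}"


def upgrade_path(current, target):
    """List the major series to move through, in order, to reach target."""
    start = MAJOR_RELEASES.index(major_series(current))
    end = MAJOR_RELEASES.index(major_series(target))
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    return MAJOR_RELEASES[start + 1 : end + 1]


def is_direct_upgrade_allowed(current, target):
    """A direct upgrade may cross at most one major series boundary."""
    return len(upgrade_path(current, target)) <= 1


print(upgrade_path("v19.2.12", "v20.2.3"))
print(is_direct_upgrade_allowed("v19.2.12", "v20.2.3"))
```

For the versions in this issue, the path contains two series (20.1, then 20.2), so the direct jump is rejected and the upgrade has to be done in two steps.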

@taroface are we explicitly covering upgrades in your docs enhancements for the kubernetes operator? If not can we add a section on this topic?

keith-mcclellan commented 3 years ago

@theodore-hyman did you see the pod with vcheck in its name? That's where you'd have found your error.
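For anyone looking for that pod later, here is a small sketch that picks the vcheck pod out of an `oc get pods` listing and prints the `oc logs` command to run for it. The listing is copied from the report earlier in this thread; the function name is hypothetical, and matching on the substring "vcheck" is an assumption based on the pod name shown there.

```python
POD_LISTING = """\
NAME                                     READY   STATUS      RESTARTS   AGE
cockroach-operator-68977698f7-fcfxh      1/1     Running     0          119s
crdb-client-secure                       1/1     Running     0          26m
crdb-tls-example-0                       1/1     Running     0          30m
crdb-tls-example-1                       1/1     Running     0          29m
crdb-tls-example-2                       1/1     Running     0          57s
crdb-tls-example-vcheck-27021277-v78jq   0/1     Completed   0          30m
"""


def find_vcheck_pods(listing):
    """Return the names of pods whose name contains 'vcheck'."""
    pods = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and "vcheck" in fields[0]:
            pods.append(fields[0])
    return pods


for pod in find_vcheck_pods(POD_LISTING):
    # Print the command rather than running it, so the sketch works offline.
    print(f"oc logs {pod}")
```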

taroface commented 3 years ago

@taroface are we explicitly covering upgrades in your docs enhancements for the kubernetes operator? If not can we add a section on this topic?

Yes, upgrades are part of these docs, though the instructions are basically intact from the existing upgrade steps here: https://www.cockroachlabs.com/docs/v21.1/orchestrate-cockroachdb-with-kubernetes.html#upgrade-the-cluster

These steps call out that

To upgrade to a new version, you must first be on a production release of the previous version. The release does not need to be the latest production release of the previous version, but it must be a production release rather than a testing release (alpha/beta).

I can try to add emphasis here.

EDIT: I just saw that these are steps @jseldess added as part of the v21.1 docs update! They weren't there previously, my apologies. (They were in the regular CRDB upgrade docs and not the K8s version.) These are very clarifying and I'll add them to the WIP K8s docs update.
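To make the quoted requirement concrete, here is a minimal sketch of telling a production release from a testing release by its version tag. It assumes that testing releases carry a suffix such as `-alpha.1` or `-beta.2` after the `vMAJOR.MINOR.PATCH` part; that tag format and the function name are assumptions for illustration.

```python
import re

# Assumption: a production tag is exactly 'v<number>.<number>.<number>',
# and anything with a trailing suffix (alpha, beta, ...) is a testing release.
PRODUCTION_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def is_production_release(version):
    """Return True if the tag has no pre-release suffix."""
    return PRODUCTION_TAG.match(version) is not None


for tag in ["v20.2.3", "v21.1.0-beta.1", "v21.1.0-alpha.2"]:
    print(tag, is_production_release(tag))
```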

theodore-hyman commented 3 years ago

@keith-mcclellan Yes, I agree this was expected behavior overall. I submitted this issue as part of the CRL OpenShift bug bash, where I was running a negative test scenario that was expected not to work. I did not check the vcheck logs. If that's the expected place to see this type of error, good to know. Maybe it is something to document? Not sure. The cluster has since been decommissioned because clusters are expensive, so I have no way to test this further on OpenShift.

@taroface The steps I was following are these:

https://www.cockroachlabs.com/docs/v21.1/deploy-cockroachdb-with-kubernetes-openshift.html

I understand that the upgrade steps are in the link below [which is new content!], but your comment implies that a customer has to read not only the OpenShift docs but also the Kubernetes docs in order to properly manage their cluster. That is not obvious to me. If I were a customer, I might wonder whether the Kubernetes docs even apply to my OpenShift deployment. This might be worth making clearer in the docs.

For example, at the bottom of the OpenShift doc linked above, it says "Note: For more information on managing secrets, see the Kubernetes documentation." Maybe this note should cover more than just secrets, such as upgrades and other topics. Or maybe it should just say "for more information on managing your cluster, see the Kubernetes docs."

https://www.cockroachlabs.com/docs/v21.1/orchestrate-cockroachdb-with-kubernetes.html#upgrade-the-cluster

Basically, what I'm trying to say is that I opened this issue to pass along feedback from the bug bash I participated in. I am not an actual customer, so my comments may not be representative of a real customer using this documentation; feel free to take them with a grain of salt.

taroface commented 3 years ago

@theodore-hyman This totally makes sense, thank you for calling it out. I forgot that you were following the OpenShift docs in this case. Will be making a few updates to (hopefully) clarify :)