Orange-OpenSource / casskop

This Kubernetes operator automates the Cassandra operations such as deploying a new rack aware cluster, adding/removing nodes, configuring the C* and JVM parameters, upgrading JVM and C* versions, and many more...
https://orange-opensource.github.io/casskop/
Apache License 2.0
183 stars 54 forks source link

Bump operator sdk v1.13.0 #376

Closed AKamyshnikova closed 2 years ago

AKamyshnikova commented 2 years ago
Q A
Bug fix? []
New feature? [x]
API breaks? []
Deprecations? []
Related tickets fixes #X, partially #Y, mentioned in #Z
License Apache 2.0

What's in this PR?

Use latest operator-sdk version.

Fixes #351

Additional context

v1.x.x of operator-sdk contains several breaking changes and require massive refactor of operator structure. https://sdk.operatorframework.io/docs/building-operators/golang/migration/

Checklist

To Do

Work is still in progress. Multi-casskop is not finally switched yet.

AKamyshnikova commented 2 years ago

@cscetbon Hi! Could you upload newer version of casskop-build with v1.13.0 tag to orangeopensource? So I can update circleci settings as well.

cscetbon commented 2 years ago

@cscetbon Hi! Could you upload newer version of casskop-build with v1.13.0 tag to orangeopensource? So I can update circleci settings as well.

@AKamyshnikova Done ✅

AKamyshnikova commented 2 years ago

@cscetbon Thanks! I faced issues with update-crds - it passes locally, but fails in CI... Why we need such workaround for updating CRDs? Seems that v1alpha1 folders has been renamed to v2 and it broke compatibility, so we need to leave v1alpha1 folders for backward compatibility. Was this considered?

cscetbon commented 2 years ago

That was for existing clusters. If you have to get rid of v1 do it

AKamyshnikova commented 2 years ago

@cscetbon I just a little bit confused, I don't get any diff after running "make generate" locally. circleci/Dockerfile uses older controller-gen version, may be this is the problem. Sorry to bother, could you rebuild and upload casskop-build with v1.13.0 tag one more time? I've updated controller-gen version in latest commit.

cscetbon commented 2 years ago

@AKamyshnikova I just did it again 😉

AKamyshnikova commented 2 years ago

@cscetbon This pull request is ready for review.

cscetbon commented 2 years ago

@AKamyshnikova it seems to be a big work. Not sure why kustomize is used but I started to review your PR, not done yet. But e2e tests are failing and you can fix them in the meantime ?

AKamyshnikova commented 2 years ago

@cscetbon Yeah, will fix this. Kustomize is part of new things with v1.x https://sdk.operatorframework.io/docs/building-operators/golang/migration/#what-is-new

cscetbon commented 2 years ago

@AKamyshnikova lmk when it's ready and I'll rerun the tests. You can try them locally using k3d like circle does

cscetbon commented 2 years ago

@AKamyshnikova trying to rerun tests on your branch I see that TestCassandraBackupAlreadyExists is failing. let me know when you think it's ready

AKamyshnikova commented 2 years ago

@cscetbon I've updated client-go to v.22.0 and controller-runtime to v0.9.0 - it seems to break unittests. I reverted to use v.0.19.3 and v0.6.5, I think update of these dependencies can be done in separate pull request as requires some additional work. Now tests are passing.

cscetbon commented 2 years ago

👏 @AKamyshnikova unit tests are fixed now. e2e tests are failing though because of an error with helm

go: unknown environment setting GO111MODULE=true
helm install ******* helm/cassandra-operator --set image.tag=trigger-integration
Error: INSTALLATION FAILED: failed to install CRD crds/*.yaml: no objects visited
make: *** [Makefile:279: kuttl-test-fix-arg] Error 1
AKamyshnikova commented 2 years ago

@cscetbon Seems I accidentally created file helm/cassandra-operator/crds/*.yaml :( I dropped it now.

cscetbon commented 2 years ago

Helm install doesn't fail anymore but e2e are failing. It seems it can't find the status key in the object for some reason

AKamyshnikova commented 2 years ago

@cscetbon One of changes is that ready is dropped https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.0.0/#removed-package-pkgready I missed that helm templates require update for this, fixed now.

cscetbon commented 2 years ago

e2e Tests ran again. Still missing status key in k8s object

AKamyshnikova commented 2 years ago

e2e Tests ran again. Still missing status key in k8s object

I run tests locally and don't get missing .status I'm not sure how to debug this for now.

AKamyshnikova commented 2 years ago

@cscetbon Hi! Could you allow running rest of the checks? I dropped +kubebuilder:subresource:status in latest patch set, may be this was a problem.

cscetbon commented 2 years ago

@cscetbon Hi! Could you allow running rest of the checks? I dropped +kubebuilder:subresource:status in latest patch set, may be this was a problem.

Better now. There is one remaining task failing but I'm not sure it's on you this time. I'm gonna review it. If in the meantime you think the failing test is on you don't hesitate to fix it 😉

AKamyshnikova commented 2 years ago

I checked and was some strange diff in two website/static/slides*.js files, hopefully fixed now.

AKamyshnikova commented 2 years ago

Thanks for the hard work. A few comments and changes asked. Also do you know what would be the impact for a person who's already running the current version to move to this one ?

While I was testing this pull request I've updated my test cluster (which use casskop v2.0.2 and has cassandra clusters) with this version and things work in the same way. So, from user perspective it would be just update to latest version, no additional steps will be needed. Thanks for review! I will try to resolve comments soon.

cscetbon commented 2 years ago

@AKamyshnikova please try to not force push anymore, it's hard for me to review your last changes. As soon as a review has started you shouldn't force push anymore, that way only the changes can be reviewed and not the whole thing every time.

cscetbon commented 2 years ago

Thanks for review! I will try to resolve comments soon.

Let me know when it's ready for review. Also I tried to rerun an old PR tests and the deploy works there. So there must be something wrong on your branch. Take a look at https://app.circleci.com/pipelines/github/Orange-OpenSource/casskop/1840/workflows/ef53e126-ec4b-453c-959d-2d3c20a7b52c/jobs/18369

AKamyshnikova commented 2 years ago

@cscetbon Sorry, I'm not used to github review process. Always has gerrit review for this :) Will keep that in mind.

AKamyshnikova commented 2 years ago

Yes, it is ready for review. Can you suggest how I can fix this branch?

fdehay commented 2 years ago

hello @AKamyshnikova , I fixed the circle-ci to github connection issue. It should be ok to run the tests now I think Thanks for your hard work!

AKamyshnikova commented 2 years ago

@fdehay Thank you!

cscetbon commented 2 years ago

Thanks @AKamyshnikova I'll take care of the few minor things I've seen. We'll try to test it more before making any new release and open issues if we find some

cscetbon commented 2 years ago

@AKamyshnikova any idea why in that PR you've renamed the CRD file ?

diff --git a/multi-casskop/deploy/crds/db.orange.com_multicasskops.yaml b/multi-casskop/config/crd/bases/multicluster_v1alpha1_cassandramulticluster_crd.yaml
similarity index 100%
rename from multi-casskop/deploy/crds/db.orange.com_multicasskops.yaml
rename to multi-casskop/config/crd/bases/multicluster_v1alpha1_cassandramulticluster_crd.yaml

You might want to create a PR to fix that PR. If yes, I moved the repo to https://github.com/cscetbon/casskop. You can be the 1st to create a PR there. I still need to create a PR here to redirect to my repo as I'm taking over from there.

AKamyshnikova commented 2 years ago

@cscetbon It is the name that is generated by controller-gen. File with such name will always be created, to save old name we can only drop the one that are generated. Something like this I did in PR with dump to 0.19 https://github.com/Orange-OpenSource/casskop/commit/9c605183474830a2d921fde0923080fe4ba4038b#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R185

cscetbon commented 2 years ago

But why does it have v1alpha1 in its name ? I'm also fixing an issue in the backup/restore that was introduced by the upgrade of the sdk operator.

AKamyshnikova commented 2 years ago

I guess because when I start working on PR it was v1alpha1, I send PR with fixes https://github.com/cscetbon/casskop/pull/29 Please point to issues you found and I will try to help.