OT-CONTAINER-KIT / helm-charts

A repository that contains Helm charts built with best practices and security in mind.
https://ot-container-kit.github.io/helm-charts

There is a new volume mount node-conf-redis-cluster defined in version 0.15.2 and it is not defined in values.yaml #114

Open csuryac opened 1 year ago

csuryac commented 1 year ago

There is a new volume mount, node-conf-redis-cluster, defined in version 0.15.2 with a default size of 1Mi, but it is not declared in values.yaml. Is it possible to declare it in values.yaml so that we can increase the size?

shubham-cmyk commented 1 year ago

We can do that, but the volume is only meant to store node.conf so the cluster state can keep running even if the main volume is not attached.

I think if you want to increase the size, you should increase the other volume, which stores the data; changing this one might not be a good idea.

What do you think about that?

One more thing: there must be a default storage class in your Kubernetes cluster to provision the 1Mi volume. You might feel this is a bit hardcoded, but we are ready to accept a change if I get a good recommendation.
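For context, the claim sits at the CRD level, roughly like this (a sketch only: the nodeConfVolumeClaimTemplate field name is real, but the surrounding PersistentVolumeClaim fields are my assumption of the schema, not the published one):

apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: redis-cluster
spec:
  clusterSize: 3
  storage:
    nodeConfVolumeClaimTemplate:   # holds only nodes.conf
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Mi           # needs a default StorageClass to bind
    volumeClaimTemplate:           # the data volume; resize this one instead
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi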

csuryac commented 1 year ago

I agree, but with some of the private cloud providers the default minimum is 5Gi, so provisioning as little as 1Mi might not be possible. Any alternative approaches for this?

shubham-cmyk commented 1 year ago

This is actually a real problem I have seen. I might try a node-local volume bind, but I don't want to pin the pod to that node; this will probably be addressed in the next release.

ZleFox commented 1 year ago

I think it is good practice, and expected when you use persistent storage, to be able to define the storage class.
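As a quick check you can at least confirm which class the 1Mi claim will bind to, and set a default if none exists (stock kubectl; <name> is a placeholder):

kubectl get storageclass   # the class marked (default) serves the 1Mi claim
kubectl patch storageclass <name> -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'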

csuryac commented 1 year ago

@shubham-cmyk the latest version 0.15.3 still has issues

helm install redis-cluster ot-helm/redis-cluster \
  --set redisCluster.clusterSize=3 --namespace ot-operator

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(RedisCluster.spec.storage): unknown field "nodeConfVolumeClaimTemplate" in in.opstreelabs.redis.redis.v1beta1.RedisCluster.spec.storage

shubham-cmyk commented 1 year ago

This is because you have not updated the CRD. If the CRD from the previous version is still present, helm install won't upgrade it. You have to delete the CRD manually; only then will the install succeed.
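Roughly like this (a sketch; the CRD name is inferred from the API group in the validation error above, so verify it with the first command):

kubectl get crd | grep opstreelabs
# Deleting the CRD also deletes every RedisCluster object it defines; back up first!
kubectl delete crd redisclusters.redis.redis.opstreelabs.in
# Reinstall so the chart ships the new CRD:
helm install redis-cluster ot-helm/redis-cluster \
  --set redisCluster.clusterSize=3 --namespace ot-operator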

revathyr13 commented 1 year ago

@shubham-cmyk or @iamabhishek-dubey

I am also facing the same issue, so may I know the exact steps for upgrading the CRD?

If we already have a CRD in our cluster, won't kubectl apply -f newcrdfile.yaml override the existing (old) CRD?

Regarding "You have to delete the CRD manually; only then will it install": before deleting the CRD we have to delete the RedisClusters installed in the cluster, right? In that case we may lose the data. I couldn't find any document that explains upgrading the operator or the CRDs while keeping the existing clusters. Can someone please share one?

shubham-cmyk commented 1 year ago

Yes, you have to uninstall and reinstall it. To prevent data loss, you have to make a backup first and restore it afterwards.
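The order matters; as an outline (the concrete backup and restore commands come from the scripts linked later in this thread, so treat these lines as placeholders):

# 1. Back up the data first (operator backup scripts or Velero).
# 2. Remove the clusters, then the stale CRD:
kubectl delete rediscluster --all -n ot-operator
kubectl delete crd redisclusters.redis.redis.opstreelabs.in
# 3. Install the new chart version, which ships the new CRD:
helm install redis-cluster ot-helm/redis-cluster --namespace ot-operator
# 4. Restore the backup into the fresh cluster.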

shubham-cmyk commented 1 year ago

@revathyr13 I will write a migration doc; I think most users are facing this.

revathyr13 commented 1 year ago

@shubham-cmyk
Thanks a lot for the update. Any ETA for the migration doc? I hope the doc will cover Redis standalone as well as cluster migration steps.

shubham-cmyk commented 1 year ago

We do have some scripts that can back up to S3 and restore from it. You could check those out; I will write a basic doc today or tomorrow that shows how to use the scripts efficiently.

Check the scripts: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts

There are other options available for migration, like Velero; you could check those out as well.
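If you go the Velero route, the basic flow is just this (assuming Velero is already installed with a backup storage location configured; the namespace is the one used elsewhere in this thread):

velero backup create redis-backup --include-namespaces ot-operator
velero restore create --from-backup redis-backup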

revathyr13 commented 1 year ago

@shubham-cmyk

Thank you

revathyr13 commented 1 year ago

Hello @shubham-cmyk

I tried the backup scripts from my end.

As per my understanding, the backup script creates RDB snapshots of each master node and uploads them to AWS or GCP buckets. In our case it was AWS. This part works fine for me.

However, the restore part didn't work.

As per the script https://github.com/OT-CONTAINER-KIT/redis-operator/blob/master/scripts/restore/restore.bash, it restores the latest RDB snapshot of each master pod to the respective master pod, right? I tried it that way, but the destination Redis cluster did not contain the data/keys from the source cluster.

So please briefly explain the backup/restore process. Also, do we need to take RDB snapshots of all the pods in the source cluster [master and slave] and restore them? I migrated the data from a Redis 6 cluster on operator version 0.10 to Redis 7 on operator version 0.15. I am not sure whether the backup/restore steps change depending on the operator version. Awaiting your reply.

shubham-cmyk commented 1 year ago

Yes, you are right. Just to confirm: did you restore them in the init containers? If you restore after the server has started, the data can't be recovered. I have written a basic doc; I will upload it now.
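The shape of it is just this (a fragment of the pod spec for illustration; the image name and paths are hypothetical, the real manifest ships alongside the restore scripts):

initContainers:
- name: restore-dump
  image: redis-restore:latest      # hypothetical image that fetches the snapshot from S3
  command: ["sh", "-c", "cp /restore/dump.rdb /data/dump.rdb"]
  volumeMounts:
  - name: redis-data               # the same PVC the redis container mounts at /data
    mountPath: /data

The point is that the copy finishes before redis-server starts, so the server loads the restored dump on boot.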

revathyr13 commented 1 year ago

I created a new cluster and restored the snapshots from AWS directly to the Redis master pods. I think the Redis cluster was running at that time. In some docs I noticed that we have to stop the Redis service before restoring the dump file, but as I couldn't find any method to do so, I just restored without stopping the service.

Please share the backup doc so that I can retry with its help. Thank you

shubham-cmyk commented 1 year ago

@revathyr13 Yes, we have to use the init container for that. You may find the docs here: https://github.com/OT-CONTAINER-KIT/redis-operator/pull/588

revathyr13 commented 1 year ago

Can you please share the documentation so that we get a better understanding?

Awaiting your response.


shubham-cmyk commented 1 year ago

It is not published on the website yet, but you can review these links:

backup: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts/backup
restore: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts/restore

There are backup.md and restore.md files there, plus a manifest, a Docker image, and env_vars.env, which are used in this process.

revathyr13 commented 1 year ago

Hello @shubham-cmyk

I tried passing the restore Docker image as init containers. The dump files were restored properly as dump.rdb, but the restoration was still not successful. Let me explain the restoration steps I tried.

1) Initially I tried to restore after disabling append-only AOF [adding appendonly no in the external config]. The dump.rdb files were successfully restored to the data directory of each pod. However, the cluster join failed with the following error:

10.236.70.209:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

I tried FLUSHDB as well, but that didn't help (see the check commands after this list).

2) The second time I tried enabling append-only AOF, which is the default setting of the operator. This time too, the dump.rdb files were successfully restored via init containers and the appendonly directories were created. The cluster join also worked fine. However, no data/keys could be fetched from Redis: GET returns nil for all keys that have values in the backed-up Redis cluster.
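For anyone hitting the same join error, the two conditions in it can be told apart with stock redis-cli (the IP is the failing node from the error above):

redis-cli -h 10.236.70.209 -p 6379 CLUSTER NODES   # more than one line => the node already knows a cluster
redis-cli -h 10.236.70.209 -p 6379 DBSIZE          # > 0 => the restored keys are what block the join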

I am not sure if I am missing anything in the restore process. I am attaching the manifest I used.

cluster.txt

Please have a look and let me know your thoughts.

shubham-cmyk commented 1 year ago

@revathyr13 What Redis image and operator image are you using? Also, please join Slack (https://github.com/OT-CONTAINER-KIT/helm-charts#contact-information); I am more available there.

revathyr13 commented 1 year ago

Hello @shubham-cmyk,

Thanks for the update

Version details:

Source cluster
Operator version: 0.10.0
Redis image: opstree-redis:v6.2.5

Destination cluster
Operator version: 0.15.0
Redis image: tried both opstree-redis:v6.2.5 and opstree-redis:v7.0.5

shubham-cmyk commented 1 year ago

You should use Redis v7.0.11 with operator v0.15.0, @revathyr13.

revathyr13 commented 1 year ago

@shubham-cmyk

Thanks for the update. I tried with version v7.0.11 as well; no luck. The restoration of the dump.rdb files worked fine and the RedisCluster was built up by the operator, but I couldn't see any keys in the cluster.

bash-5.1$ ls -la
total 2664212
drwxrwxrwx 4 root  root        4096 Sep  6 05:08 .
drwxr-xr-x 1 root  root          57 Sep  6 05:08 ..
drwxr-xr-x 2 redis redis       4096 Sep  6 05:08 appendonlydir
-rw-r--r-- 1 root  root  2728122568 Sep  6 05:07 dump.rdb
drwx------ 2 root  root       16384 Sep  6 05:06 lost+found
bash-5.1$

10.233.68.50:6379> get devportal:re:XX:XXXX
(nil)

The above key has a value in the source cluster.

shubham-cmyk commented 1 year ago

Let me inspect this issue and see what the problem might be, @revathyr13.

shubham-cmyk commented 1 year ago

If the dump.rdb files are getting placed properly, it means the scripts are working fine. Since we move the dump.rdb in the initContainer, we can be sure the redis-server has not started yet.

This might be an issue on the Redis side; I have to revisit the restore docs for restoring via dump.rdb.

I am replaying the scenario right now and will update.

shubham-cmyk commented 1 year ago

@revathyr13

I just replayed the scenario: the keys were loaded, but the cluster was not properly served, so not all keys were loaded. I am working on this.


Check there: I have added a few manifests that I used, and I fixed a bug so that there is no restore to the follower pods: https://github.com/OT-CONTAINER-KIT/redis-operator/pull/609
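If you want to replay it on your side, these are the kinds of checks involved (stock redis-cli; <pod-ip> and <key> are placeholders):

redis-cli -h <pod-ip> CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned'
redis-cli -h <pod-ip> DBSIZE         # keys loaded on this particular shard
redis-cli -c -h <pod-ip> GET <key>   # -c follows MOVED redirects across shards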

revathyr13 commented 1 year ago

@shubham-cmyk Thanks for checking. Waiting for further updates.

revathyr13 commented 1 year ago

@shubham-cmyk

Any new updates?

shubham-cmyk commented 1 year ago

@revathyr13

I have updated the scripts for backup and restore. You may find the example here: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/example/v1beta2/backup_restore

The restore on operator v0.15.1 is failing for now; I am fixing that.

I have opened an issue: https://github.com/OT-CONTAINER-KIT/redis-operator/issues/625 Let's move the conversation there.