csuryac opened this issue 1 year ago
We can do that, but I thought the volume should only store node.conf, so that the cluster state survives even if the main volume is not attached.
If you want to increase the size, you should increase the other volume, which stores the data; changing this one might not be a good idea.
What do you think about that?
One more thing: there must be a default storage class in your k8s cluster to provision the 1Mi volume. This might feel a bit hardcoded, but we are open to changing it if we get a good recommendation.
I agree, but with some of the private cloud providers the default minimum is 5Gi, so getting as little as 1Mi might not be possible. Any alternative approaches for this?
This is actually a real problem I have seen. I might try a node-local volume bind, but I don't want to pin the pod to that node; this will probably be addressed in the next release.
I think it is good practice, and it is expected when you use persistent storage, to be able to define the storage class.
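For illustration, a hedged sketch of what an explicit storage-class override could look like in the cluster spec. The key names below follow the generic PVC-template pattern and are assumptions, not the chart's confirmed schema; check the chart's values and CRD for the exact paths:

```yaml
# Sketch only: field paths assumed, verify against the chart/CRD
storage:
  volumeClaimTemplate:
    spec:
      storageClassName: standard      # explicit class, so no default StorageClass is required
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```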
@shubham-cmyk the latest version 0.15.3 still has issues
```
$ helm install redis-cluster ot-helm/redis-cluster \
    --set redisCluster.clusterSize=3 --namespace ot-operator
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(RedisCluster.spec.storage): unknown field "nodeConfVolumeClaimTemplate" in in.opstreelabs.redis.redis.v1beta1.RedisCluster.spec.storage
```
This is because you have not updated the CRD. If the CRD from the previous version is present, helm install won't upgrade it. You have to delete the CRD manually; only then will the install succeed.
@shubham-cmyk or @iamabhishek-dubey
I am also facing the same issue. May I know what the exact steps are for upgrading the CRD?
If we already have a CRD in our cluster, won't `kubectl apply -f newcrdfile.yaml` override the existing (old) CRD?
> You have to delete the CRD manually then only it would install

Before deleting the CRD we have to delete the RedisClusters installed in the cluster, right? In that case we may lose the data. I couldn't find any document which explains upgrading the operator or CRD while keeping the existing clusters. Can someone please share one?
Yes, you have to uninstall and reinstall it. To prevent data loss, you have to make a backup first and restore it afterwards.
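To make that sequence concrete, here is a hedged outline of the upgrade path (the release and CRD names are taken from earlier in this thread; verify them in your own cluster first):

```shell
# 0. Back up the data first (e.g. with the operator's backup scripts, or velero)
# 1. Remove the old release
helm uninstall redis-cluster --namespace ot-operator
# 2. Delete the stale CRD -- note this also deletes any remaining RedisCluster objects
kubectl get crd | grep redis          # confirm the exact CRD names first
kubectl delete crd redisclusters.redis.redis.opstreelabs.in
# 3. Install the new chart version, which ships the updated CRD
helm install redis-cluster ot-helm/redis-cluster --namespace ot-operator
# 4. Restore the backup into the fresh cluster
```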
@revathyr13 I will write a migration doc; I think most of the users are facing this.
@shubham-cmyk
Thanks a lot for the update. Any ETA for the migration doc? Hope the doc will consider Redis standalone as well as cluster migration steps.
We do have some scripts that can back up to S3 and restore from it. You can check them out; I will write a basic doc today or tomorrow that shows how to use those scripts efficiently.
Check the scripts : https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts
There are some other options available for migration, like velero; you could check that out as well.
@shubham-cmyk
Thank you
Hello @shubham-cmyk
I tried the backup scripts from my end.
As per my understanding, the backup script creates an RDB snapshot of each master node and uploads it to an AWS or GCP bucket; in our case it was AWS. This part works fine for me.
However, the restore part didn't work.
As per the script https://github.com/OT-CONTAINER-KIT/redis-operator/blob/master/scripts/restore/restore.bash, it restores the latest RDB snapshot of each master pod to the respective master pod, right? I tried it that way, but the source cluster's data/keys did not appear in the destination Redis cluster.
So please briefly explain the backup/restore process. Also, do we need to take RDB snapshots of all the pods in the source cluster (master and slave) and restore them? I migrated the data from a Redis 6 cluster running on operator version 0.10 to Redis 7 running on operator version 0.15. I am not sure whether the backup/restore steps change depending on the operator version. Awaiting your reply.
Yes you are right.
Just to confirm: did you restore them in the initContainers? If you restore after the server has started, the data can't be recovered.
I have written a basic doc; I will upload it now.
I created a new cluster and restored the snapshots from AWS directly to the Redis master pods. I think the Redis cluster was running at that time. In some docs I noticed that we have to stop the Redis service before restoring the dump file; as I couldn't find a method to do that, I just restored without stopping the service.
Please share the backup doc so that I can retry it with the help of it. Thank you
@revathyr13 Yes, we have to use the initContainer for that.
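As a sketch of what "restore in the initContainer" can look like in the RedisCluster manifest. The image name, script path, and env vars below are illustrative placeholders, not the operator's published values; the real ones come from the restore scripts and env_vars.env linked earlier:

```yaml
# Sketch only: illustrative values, not the official manifest
initContainer:
  enabled: true
  image: example-registry/redis-restore:latest   # image that bundles restore.bash
  command: ["bash", "/scripts/restore.bash"]     # copies dump.rdb into the data dir
  env:
    - name: BUCKET_NAME
      value: my-redis-backups                    # bucket holding the snapshots
```

The point is that the copy happens before redis-server starts, so the dump is already in place when the server loads its dataset.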
You may find the docs here https://github.com/OT-CONTAINER-KIT/redis-operator/pull/588
Can you please share the documentation so that we will get a better understanding.
Awaiting your response.
It is not published on the website yet, but you can review these links:
backup: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts/backup
restore: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts/restore
There are backup.md and restore.md files there, plus a manifest, a Docker image, and env_vars.env that are used in this process.
Hello @shubham-cmyk
I tried passing the restore Docker image as an initContainer. The dump files were restored properly as dump.rdb, but restoration was still not successful. Let me explain the restoration steps I tried:
1) Initially I tried to restore after disabling appendonly AOF (adding `appendonly no` in the external config). The dump.rdb files were successfully restored to the data directory of each pod. However, the cluster join failed with the following error:

```
10.236.70.209:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
```
I tried flushdb as well, but that didn't help.
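For context on that error: `redis-cli --cluster create` refuses any node that already holds keys or already knows other nodes. The state can be inspected with standard commands (the IP is the one from the error above); note that `CLUSTER RESET` is itself refused on a master holding keys, which is why pre-restored data and cluster creation conflict:

```shell
redis-cli -h 10.236.70.209 -p 6379 DBSIZE          # how many keys are in database 0
redis-cli -h 10.236.70.209 -p 6379 CLUSTER NODES   # which peers the node already knows
# CLUSTER RESET HARD forgets peers, but is refused while the dataset is non-empty
redis-cli -h 10.236.70.209 -p 6379 CLUSTER RESET HARD
```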
2) The second time, I tried enabling appendonly AOF, which is the default setting of the operator. This time as well, the dump.rdb files were successfully restored via initContainers and the appendonlydir directories were created. The cluster join also worked fine. However, no data/keys could be fetched from Redis: GET returns nil for all keys that have values in the backed-up Redis cluster.
I am not sure if I am missing anything in the restore process. I am attaching the manifest I used.
Please have a look and let me know your thoughts.
@revathyr13 What Redis image and operator image are you using? Also, please join Slack (https://github.com/OT-CONTAINER-KIT/helm-charts#contact-information); I am more available there.
Hello @shubham-cmyk ,
Thanks for the update
Version details:
Source cluster: operator version 0.10.0, Redis image opstree-redis:v6.2.5
Destination cluster: operator version 0.15.0, Redis image: tried both opstree-redis:v6.2.5 and opstree-redis:v7.0.5
You should use v7.0.11 for v0.15.0 @revathyr13
@shubham-cmyk
Thanks for the update. Tried with version v7.0.11 as well; no luck. The restoration of the dump.rdb files worked fine and the RedisCluster was built up by the operator, but I couldn't see any keys in the cluster.
```
bash-5.1$ ls -la
total 2664212
drwxrwxrwx 4 root  root        4096 Sep  6 05:08 .
drwxr-xr-x 1 root  root          57 Sep  6 05:08 ..
drwxr-xr-x 2 redis redis       4096 Sep  6 05:08 appendonlydir
-rw-r--r-- 1 root  root  2728122568 Sep  6 05:07 dump.rdb
drwx------ 2 root  root       16384 Sep  6 05:06 lost+found

10.233.68.50:6379> get devportal:re:XX:XXXX
(nil)
```
The above key has a value in the source cluster.
Let me inspect what the problem might be, @revathyr13.
If the dump.rdb files are being placed properly, it means the scripts are working fine. Since we move dump.rdb in from the initContainer, we can be sure the redis-server has not started yet.
This might be an issue on the Redis side; I have to revisit the restore-via-dump.rdb docs.
I am replaying the scenario right now and will update.
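One possible explanation (my assumption, not something confirmed in this thread): when `appendonly yes` is set, redis-server loads its dataset from the AOF at startup and ignores dump.rdb, so a dump restored by the initContainer is never read. A workaround sketch is to restore with AOF disabled (`appendonly no` in the external config) and re-enable it once the keys are loaded:

```shell
# Assumed workaround, verify before relying on it:
# after the restored cluster is up with AOF off, re-enable AOF at runtime;
# Redis then rewrites the AOF from the dataset that was loaded from dump.rdb
redis-cli -h <pod-ip> CONFIG SET appendonly yes
```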
@revathyr13
I just replayed the scenario. The keys were loaded, but the cluster was not properly serving all slots, so not all keys were available. I am working on this.
Check this PR: I have added a few manifests that I used, and fixed a bug so that there is no restore to the follower pods: https://github.com/OT-CONTAINER-KIT/redis-operator/pull/609
@shubham-cmyk Thanks for checking. Waiting for further updates.
@shubham-cmyk
Any new updates?
@revathyr13
I have updated the scripts for backup and restore. You may find the example here : https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/example/v1beta2/backup_restore
The restore on operator v0.15.1 is failing for now; I am fixing that.
I have opened an issue: https://github.com/OT-CONTAINER-KIT/redis-operator/issues/625. Let's move the conversation there.
There is a new volume mount defined in version 0.15.2, node-conf-redis-cluster, with a default of 1Mi, but it is not defined in values.yaml. Is it possible to declare it in values.yaml so that we can increase the size?
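The CRD does expose a `nodeConfVolumeClaimTemplate` field under `spec.storage` (it appears in the validation error earlier in this thread), so a values.yaml override might look like the sketch below. The exact key path in the chart is an assumption and should be checked against the chart templates:

```yaml
# Sketch only: key path assumed from the CRD field name, verify against the chart
redisCluster:
  storage:
    nodeConfVolumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Mi   # raised from the 1Mi default
```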