Closed: bboychev closed this 1 year ago.

Error: p[eRWCTOWd]: duplicate IPs: p[eRWCTOWd] and p[KHWaDVhn] share the same "aistore-proxy-0.aistore-proxy.ais.svc.cluster.local:51081"

Each node in a cluster must have its own IP:port. Deployment error. Closing.
Dear AIStore Team,
Thank you very much for your efforts in developing the AIStore product. It looks great! I have been interested in it for some time and am investigating its capabilities. I have successfully deployed two separate large-scale production (K8s) AIStore clusters, version 3.18.7081e29, on Ubuntu 22.04 virtual machines with K8s v1.25.2, following the documentation and ais-k8s. I used `aistorage/ais-operator:0.94` for the purpose, as I think I read somewhere there to use that method. The other docker images are `aistorage/ais-init:latest` and `aistorage/aisnode:3.18`. I am using the latest flannel as the K8s network plugin, if that matters. The deployment went fine, with the exception that I needed to patch the `aistore-proxy` statefulset livenessProbe and readinessProbe like this:

```
kubectl -n ais patch statefulset/aistore-proxy -p '{"spec": {"template": {"spec": {"containers": [{"name": "ais-node","livenessProbe": {"timeoutSeconds": 10},"readinessProbe": {"timeoutSeconds": 10}}]}}}}'
```

I have deployed just one AIS proxy and one AIS target per K8s cluster. I have also deployed the default `ingress-nginx` ingress controller with that ingress:

So, I am trying to make a production-ready deployment, and I am investigating AIStore scaling and clustering capabilities following the available documentation.
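Returning to the probe patch above, here is a minimal, self-contained sketch of why a short probe `timeoutSeconds` can mark a slow-but-healthy endpoint as failed. This is my own illustration, not AIStore code; the 2-second delay and the `/v1/health` path are assumptions for the demo only:

```python
# Toy illustration (my own sketch, not AIStore code): a health endpoint
# that takes ~2 s to answer fails a 1 s probe but passes a 10 s one.
import http.server
import threading
import time
import urllib.request

class SlowHealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # simulate a busy-but-healthy proxy
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
            self.wfile.flush()
        except BrokenPipeError:
            pass  # the probing client already gave up

    def log_message(self, *args):  # keep the demo quiet
        pass

def probe(url, timeout):
    """Return True if the endpoint answers with 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return False

server = http.server.HTTPServer(("127.0.0.1", 0), SlowHealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/v1/health"  # path is arbitrary here

print(probe(url, timeout=1))   # False: the probe gives up before the answer
print(probe(url, timeout=10))  # True: the same endpoint, longer timeout
```

In my case the default probe timeout was apparently too short for the proxy to answer, which is why bumping `timeoutSeconds` to 10 made the pods pass their probes.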
Please correct me if I am wrong, but I see these options to scale up the deployment:

1. Increase storage size: increase the configured disks by size and extend the filesystems. Are there other options to achieve this by adding additional disk(s) (e.g., maybe using Linux LVM)?
2. Remote attach using `ais cluster remote-attach ...`
3. Join a proxy/target node using `ais cluster add-remove-nodes ...`
4. Join a new K8s worker (with additional disks) into the existing K8s clusters.

The two virtual machines
`aisbox-3` and `aisbox-4` are in the same network segment, 192.168.121.0/24, with IP address `192.168.121.185` for the `aisbox-3` VM and `192.168.121.154` for the `aisbox-4` VM. The pod-network (overlay) addresses, I suppose, are `10.244.0.57`/`10.244.0.49` for the `aisbox-3` proxy/target pods and `10.244.0.53`/`10.244.0.50` for the `aisbox-4`
proxy/target pods.

My tests showed the following, hence my follow-up questions regarding scalability:

1. Increase storage size: works fine. I am using LVM, so that is pretty clear, and it will probably work with standard disk devices as well. I suppose I need to increase the PV and PVC capacity, too. Are there any other options to increase the storage size for new objects in an AIStore K8s deployment (I do not need new replica disks)?
2. Remote attach using `ais cluster remote-attach ...`: the command works in general, but I was not able to get/put an object on a remote bucket, so from a functionality point of view it failed; see the attached markdown file. Do we need some special K8s overlay network configuration? ais_prod_k8s_playground_remote-attach_tests.md
3. Join a proxy/target node using `ais cluster add-remove-nodes ...`: failed for me. I tried to attach a proxy node on the `aisbox-3` VM using the one available under `aisbox-4` (through the ingress controller), and it failed; see the attached markdown file. Do we need some special K8s overlay network configuration, or special AIStore intra-cluster control/intra-cluster data configuration? ais_prod_k8s_playground_join_tests.md
4. I have not actually tested joining a new K8s worker (with additional disks), as I am not sure what I need to bring up on it: an AIS proxy, a target, or both. I do not want new replicas of existing objects on that worker, only the possibility to store new objects, so I am a little bit confused about how to implement this.

So, could you please help me understand where the problems in this Prod/K8s setup are? (These results raise major concerns for me about AIStore Prod/K8s clustering and scalability capabilities.)
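Finally, regarding the duplicate-IPs error quoted at the top: if I read it correctly, both of my clusters were deployed from identical manifests, so both proxies advertise the very same cluster-local DNS name, and the receiving cluster rejects the join as a duplicate endpoint. A toy sketch of that kind of check (my own code, not AIStore's; the node-map layout is made up, only the names and port come from the error message):

```python
# Toy sketch (my own code, not AIStore's): two proxies deployed from the
# same statefulset/service manifests advertise the identical
# cluster-local DNS name, so the join looks like a duplicate endpoint.
from collections import Counter

def find_duplicate_endpoints(nodes):
    """Return advertised URLs claimed by more than one node."""
    counts = Counter(node["public_url"] for node in nodes)
    return sorted(url for url, n in counts.items() if n > 1)

# Proxy p[eRWCTOWd] from cluster A plus proxy p[KHWaDVhn] joining from
# cluster B; the hostname and port are taken from the error message above.
nodes = [
    {"id": "eRWCTOWd",
     "public_url": "aistore-proxy-0.aistore-proxy.ais.svc.cluster.local:51081"},
    {"id": "KHWaDVhn",
     "public_url": "aistore-proxy-0.aistore-proxy.ais.svc.cluster.local:51081"},
]
print(find_duplicate_endpoints(nodes))
# -> ['aistore-proxy-0.aistore-proxy.ais.svc.cluster.local:51081']
```

If that reading is right, giving each cluster distinct, externally resolvable hostnames would seem to be a prerequisite for any cross-cluster join, but please correct me if I misunderstand the error.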
Best Regards, bboychev