Closed: faradawn closed this issue 2 years ago.
For the convenience of the Seagate development team, this issue has been mirrored in a private Seagate Jira Server: https://jts.seagate.com/browse/CORTX-32156. Note that community members will not be able to access that Jira server but that is not a problem since all activity in that Jira mirror will be copied into this GitHub issue.
Thanks for opening the bug here and thanks for continuing to push the boundaries of CORTX! The scenario you are trying to work through is not yet fully supported in an available CORTX release. This is due to how the underlying storage is mapped from a k8s Node through PV to PVC to Pod. The good news is that a feature is actively being worked on that will help with some of this and allow the setup to be done manually for this exact use case.
I will create a new issue here to track the updated documentation for this use case, along with the features delivered in branch CORTX-29859_migrate_data_pods_statefulset, which will allow for this specific deployment.
Issue https://github.com/Seagate/cortx-k8s/issues/284 has been created to track the necessary documentation once CORTX-29859 is delivered in a release.
Until that time, you will need to have the same number of Data Pods as Worker Nodes in your k8s cluster.
Hi Rick,
Thanks for informing me that, currently, the number of data pods should be the same as the number of nodes!
May I ask a few questions:
Thanks in advance!
Sorry for not seeing this follow-up question, @faradawn . For now you can see the current use case implementation to support your original scenario via https://github.com/Seagate/cortx-k8s/tree/CORTX-32209_manual_pv_usecase#advanced-deployment-scenarios (which will be delivered via https://github.com/Seagate/cortx-k8s/issues/285 sometime soon).
> On the 8-node cluster, I seemed to accomplish deployments with 1, 2, 4, and 8 data pods. Is this possible?
Take a look at the updates made in v0.9.0. There is a new `container_group_size` parameter which allows you to control how many CVGs are managed per Pod, which explicitly drives how many Data Pods can show up per Worker Node out of the box.
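As a minimal sketch of how this might look, assuming the parameter lives under `solution.common` (verify the exact key path against the v0.9.0 example `solution.yaml` files):

```yaml
solution:
  common:
    # Number of CVGs managed by each Data Pod.
    # With, say, 4 CVGs defined per node and container_group_size: 2,
    # two Data Pods would be created per Worker Node.
    # Values here are illustrative only.
    container_group_size: 2
```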
> On an 8-node cluster, if I specify 8 data pods (each 1 Gi) in solution.yaml, does that mean 8 data pods will be created on each node, so that 64 data pods will be scheduled in total, achieving a storage capacity of 64 Gi?
CORTX will create StatefulSet controllers with the number of replicas equal to the length of the nodes list in solution.yaml. The amount of managed space on each of those subsequent Pods is determined by the structure of the cvgs list in solution.yaml.
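As an illustrative sketch of the two lists involved (node names, device paths, and sizes below are hypothetical, loosely following the v0.6.0 solution.yaml layout):

```yaml
solution:
  # One entry per k8s Worker Node: drives the StatefulSet replica count.
  nodes:
    node1:
      name: worker-node-1   # hypothetical hostname
    node2:
      name: worker-node-2
  # CVG layout: drives the managed capacity of each Data Pod.
  storage:
    cvg1:
      devices:
        metadata:
          device: /dev/sdb  # hypothetical device path
          size: 64Gi
        data:
          d1:
            device: /dev/sdc
            size: 64Gi
```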
So from your solution.yaml above, you have 8 nodes that are each expected to have 14 available block devices (2 for metadata and 12 for data), so you would have the simple multiplication of 8 × 14 × 64 Gi for the raw capacity of what you are deploying.
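The multiplication above can be sketched as a quick sanity check (node count, device count, and the 64 Gi device size are taken from the example, so adjust them to match your actual solution.yaml):

```python
# Rough raw-capacity estimate for the example deployment above.
nodes = 8
devices_per_node = 2 + 12   # 2 metadata + 12 data block devices per node
device_size_gi = 64         # assumed size of each block device

raw_capacity_gi = nodes * devices_per_node * device_size_gi
print(f"Raw capacity: {raw_capacity_gi} Gi")  # Raw capacity: 7168 Gi
```

Note that this is raw capacity before any erasure-coding or replication overhead is applied.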
Dear Rick,
Thanks for responding to the two questions! Got it!
I think the issue is resolved! Appreciate your constant patience and help!
Best, Faradawn
No problem at all. Keep an eye out for the resolution of #285 in the next day or two and you'll have some additional scenarios to play with soon!
Problem
I tried to deploy CORTX with 12 data pods on an 8-node Kubernetes cluster, but encountered an HA-deployment timeout error. I believe the same configuration deployed successfully two days ago, but now I have tried twice and hit the HA timeout error both times. May I ask for some help?
Expected behavior
I should be able to deploy 12 data pods, as 15 disks are available besides the one used for the system and the one for fs-local-volume. In addition, I believe I deployed this configuration successfully once.
How to reproduce
You can follow this deployment script: https://github.com/faradawn/tutorials/blob/main/linux/cortx/kube.sh
CORTX on Kubernetes version
v0.6.0
Deployment information
Kubernetes version: v1.24.0
kubectl version: v1.24.0
Container runtime: CRI-O
Solution configuration file YAML
Logs
First, the HA pods seemed to be running fine. Here is the result of `kubectl get pods --all-namespaces`:
Second, here are all the deployments. The HA deployment also seemed alright.
Finally, here is the error during deployment:
Here is the disk layout:
Additional information
Thanks in advance!