Closed: Clara12062 closed this issue 3 months ago.
It looks like the code needs to be updated so that the cache stays correct.
Do you have the use case so that I can reproduce it locally?
I checked the logic; we already cover bound PVs/PVCs:
```go
err, lvmPVCs, mpPVCs, devicePVCs := algorithm.GetPodUnboundPvcs(pvc, ctx)
if err != nil {
	log.Errorf("failed to get pod unbound pvcs: %s", err.Error())
	return nil, err
}
if len(lvmPVCs)+len(mpPVCs)+len(devicePVCs) == 0 {
	msg := "unexpected schedulering request for all pvcs are bounded"
	log.Info(msg)
	return nil, fmt.Errorf(msg)
}
```
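For context, that bound/unbound split is essentially a claim-phase check; a minimal sketch of the idea (the helper name is hypothetical, this is not the actual `GetPodUnboundPvcs` implementation):

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// filterUnboundPVCs keeps only the claims that are not yet Bound; claims that
// are already Bound have their storage accounted for and are skipped.
// Hypothetical helper for illustration only.
func filterUnboundPVCs(pvcs []*corev1.PersistentVolumeClaim) []*corev1.PersistentVolumeClaim {
	var unbound []*corev1.PersistentVolumeClaim
	for _, pvc := range pvcs {
		if pvc.Status.Phase == corev1.ClaimBound {
			continue
		}
		unbound = append(unbound, pvc)
	}
	return unbound
}
```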
Yep. But there seems to be a timing problem; it does not always occur in my environment. The scheduler.log output is as follows.
The corresponding LV appears to have been created and the nodecache updated, but the PVC is still not in the Bound state. The predicate log contains the following information:
{"phase":"Pending","conditions":[{"type":"PodScheduled","status":"False","lastProbeTime":null,"lastTransitionTime":"2024-07-16T01:20:03Z","reason":"SchedulerError","message":"running PreBind plugin \"VolumeBinding\": Operation cannot be fulfilled on persistentvolumeclaims \"apgpt-log\": the object has been modified; please apply your changes to the latest version and try again"}],"qosClass":"Burstable"}},"Nodes":null,"NodeNames":["xos-862dd221"]}
So at that point, CapacityPredicate does not skip the same PVC when scheduling continues.
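If that is what happens, one mitigation would be to make the predicate idempotent by remembering which claims have already been assumed and skipping them on a repeated pass. A rough sketch, where the record type and its fields are assumptions rather than open-local code:

```go
package sketch

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// assumedPVCs is a hypothetical record of claims whose capacity has already
// been subtracted from the node cache during a previous scheduling pass.
type assumedPVCs struct {
	mu   sync.Mutex
	seen map[string]bool // key: namespace/name
}

// markOnce returns true the first time a claim is seen and false afterwards,
// so the caller can avoid subtracting the same claim's capacity twice.
func (a *assumedPVCs) markOnce(pvc *corev1.PersistentVolumeClaim) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.seen == nil {
		a.seen = make(map[string]bool)
	}
	key := pvc.Namespace + "/" + pvc.Name
	if a.seen[key] {
		return false
	}
	a.seen[key] = true
	return true
}
```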
What's your case? Do you have reproduction steps?
I am reproducing with the following spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test
    image: /nginx:latest
    volumeMounts:
    # website data mount
    - name: config
      mountPath: /usr/share/nginx/html
      subPath: html
  volumes:
  - name: config
    persistentVolumeClaim:
      claimName: html-nginx-lvm-0
```
but did not find any issue.
OK, wait a minute.
```sh
kubectl apply -k k8s-resources/
```
But strangely, it has not reproduced today.
And the code implementation seems to be fine. The problem would only appear if the PVC is not in the Bound state even though the volume has already been created, but I have not found out why the PVC is not in the Bound phase.
Please don't close this issue yet; once I find the reproduction conditions I will add more information.
> but did not find any issue
So far, this phenomenon has not recurred. I am following the logs for clues. Currently I can see that the nodecache capacity is updated in Assume:
```go
func (c *ClusterNodeCache) Assume(units []AllocatedUnit) (err error) {
	// all pass, write cache now
	//TODO(yuzhi.wx) we need to move it out, after all check pass
	for _, u := range units {
		nodeCache := c.GetNodeCache(u.NodeName)
		if nodeCache == nil {
			return fmt.Errorf("node %s not found from cache when assume", u.NodeName)
		}
		volumeType := u.VolumeType
		switch volumeType {
		case pkg.VolumeTypeLVM:
			_, err = c.assumeLVMAllocatedUnit(u, nodeCache)
```
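For reference, the effect of assuming an LVM unit is roughly to reserve the unit's Requested bytes against the VG in the cache; a simplified sketch with assumed field names (the real `assumeLVMAllocatedUnit` does more than this):

```go
package sketch

import "fmt"

// vgCache models a VG entry in the node cache; field names are assumptions.
type vgCache struct {
	Capacity  int64 // total allocatable bytes of the volume group
	Requested int64 // bytes already reserved by assumed or bound volumes
}

// assumeLVMUnit reserves size bytes on the VG, the conceptual effect of the
// assume step above: the cached free space shrinks before the PV exists.
func assumeLVMUnit(vg *vgCache, size int64) error {
	if vg.Requested+size > vg.Capacity {
		return fmt.Errorf("not enough capacity: requested %d, free %d", size, vg.Capacity-vg.Requested)
	}
	vg.Requested += size
	return nil
}
```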
At this point, the logs show that PVC scheduling succeeded.
But in onPodUpdate, if one of a pod's PVCs is Pending, does that affect the other PVCs, so that scheduling runs again and Assume still updates the nodecache capacity?
```go
e.Ctx.ClusterNodeCache.PvcMapping.PutPod(podName, pvcs)
// if a pvcs is pending, remove the selected node in a goroutine
// so that to avoid ErrVolumeBindConflict(means the selected-node(on pvc)
// does not match the newly selected node by scheduler
for _, p := range pvcs {
	if p.Status.Phase != corev1.ClaimPending {
		return
	}
}
```
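For illustration, what the goroutine described in that comment has to do is clear the `volume.kubernetes.io/selected-node` annotation on still-pending claims so the next scheduling pass can pick a node afresh; a minimal client-go sketch (the function name and wiring are assumptions, not the project's implementation):

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const selectedNodeAnn = "volume.kubernetes.io/selected-node"

// clearSelectedNode removes the selected-node annotation from a still-pending
// claim, avoiding a bind conflict when the scheduler later picks another node.
func clearSelectedNode(ctx context.Context, cs kubernetes.Interface, pvc *corev1.PersistentVolumeClaim) error {
	if pvc.Status.Phase != corev1.ClaimPending {
		return nil
	}
	if _, ok := pvc.Annotations[selectedNodeAnn]; !ok {
		return nil
	}
	updated := pvc.DeepCopy()
	delete(updated.Annotations, selectedNodeAnn)
	_, err := cs.CoreV1().PersistentVolumeClaims(updated.Namespace).Update(ctx, updated, metav1.UpdateOptions{})
	return err
}
```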
In effect, a PVC's capacity ends up being subtracted multiple times.
And in UpdateNodeInfo, could `Requested` also be updated?
```go
for _, vg := range unchangedVGs {
	// update the size if the updatedName got extended
	v := cacheNode.VGs[ResourceName(vg)]
	v.Capacity = int64(vgMapInfo[vg].Allocatable)
	cacheNode.VGs[ResourceName(vg)] = v
	log.V(6).Infof("updating existing volume group %q(total:%d,allocatable:%d,used:%d) on node cache %s",
		vg, vgMapInfo[vg].Total, vgMapInfo[vg].Allocatable, vgMapInfo[vg].Total-vgMapInfo[vg].Available, cacheNode.NodeName)
}
```
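If `Requested` were refreshed here as well, it could be derived the same way the log line computes the used amount (Total - Available); a sketch with field names assumed from the snippet above, not a verified change:

```go
package sketch

// vgInfo mirrors the fields referenced above (names assumed).
type vgInfo struct {
	Total       int64
	Allocatable int64
	Available   int64
}

// cachedVG is the cache-side view of a VG (names assumed).
type cachedVG struct {
	Capacity  int64
	Requested int64
}

// refreshVG updates Requested as well as Capacity from the node-reported VG
// state, deriving "used" the same way the log line does: Total - Available.
func refreshVG(cached cachedVG, reported vgInfo) cachedVG {
	cached.Capacity = reported.Allocatable
	cached.Requested = reported.Total - reported.Available
	return cached
}
```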
Which version were you using?
Locally, I did not notice the assume logic being triggered. To confirm whether this happened, you can look for logs about the PVC:
```go
if node == nil {
	log.Infof("scheduling pvc %s without node", utils.GetName(pvc.ObjectMeta))
} else {
	log.Infof("scheduling pvc %s on node %s", utils.GetName(pvc.ObjectMeta), node.Name)
}
```
> which version were you using?

0.6.0
```
I0716 01:20:03.256697 1 routes.go:216] path: /apis/scheduling/:namespace/persistentvolumeclaims/:name, request body:
I0716 01:20:03.256869 1 scheduling.go:41] scheduling pvc apgpt/apgpt-log on node xos-862dd221
I0716 01:20:03.256888 1 util.go:109] got pvc apgpt/apgpt-log as lvm pvc
I0716 01:20:03.256898 1 common.go:426] storage class open-local-lvm has no parameter "vgName" set
I0716 01:20:03.256920 1 cluster.go:167] assume node cache successfully: node = xos-862dd221, vg = xoslocal-open-local-lvm
I0716 01:20:03.256924 1 cluster.go:96] node cache update
I0716 01:20:03.256928 1 scheduling.go:119] allocatedUnits of pvc apgpt/apgpt-log: [{NodeName:xos-862dd221 VolumeType:LVM Requested:214748364800 Allocated:214748364800 VgName:xoslocal-open-local-lvm Device: MountPoint: PVCName:apgpt/apgpt-log}]
...
I0716 01:20:04.980639 1 scheduling.go:41] scheduling pvc apgpt/apgpt-cloud-config on node xos-862dd221
I0716 01:20:04.980688 1 scheduling.go:64] pvc apgpt/apgpt-cloud-config is not eligible for provisioning as related pvcs are still pending
E0716 01:20:04.980713 1 api_routes.go:61] failed to scheduling pvc apgpt/apgpt-cloud-config: pvc apgpt/apgpt-cloud-config is not eligible for provisioning as related pvcs are still pending
I0716 01:20:04.980745 1 routes.go:218] path: /apis/scheduling/:namespace/persistentvolumeclaims/:name, code=500, response body=pvc apgpt/apgpt-cloud-config is not eligible for provisioning as related pvcs are still pending
```
Normally, k8s should not trigger the scheduling process in this case; the only remaining step is to bind the PVC and PV. What's the k8s version?
> what's the k8s version?

1.25.16
Per the log, we can try adding logic to check the PV bind status before assume.
What do you think? Maybe we can check the provisioner log to make sure the logic above actually works in this scenario.
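For example, the pre-assume check could look roughly like this; `corev1.ClaimBound` is the standard claim phase, everything else here is an assumption:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// shouldAssume reports whether capacity still needs to be reserved for the
// claim. A claim that is already Bound to a PV has its space reflected in the
// node's reported VG usage, so assuming it again would double-count it.
func shouldAssume(pvc *corev1.PersistentVolumeClaim) bool {
	if pvc == nil {
		return false
	}
	if pvc.Status.Phase == corev1.ClaimBound && pvc.Spec.VolumeName != "" {
		return false
	}
	return true
}
```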
Problem Description:
In a single-node environment, multiple pods in the same namespace share a PVC. I found that the Extender tries to update the nodecache's VG information in CapacityPredicate during pod scheduling. For the first n pods the capacity information was updated during scheduling, but I suspect the PVC had not been created yet; when the PVC was created, the available capacity in the nodecache was reduced, yet Open-Local still subtracted the shared PVC's capacity from the VG, resulting in inconsistencies between the NLS status and the nodecache's VGs. Even when trying to create again, the Extender reported that there was not enough capacity. I'm not sure whether my usage is correct, so I have a few questions to confirm.
I have three questions:
1. Will pod scheduling be affected if the Extender is not enabled during scheduling when there is only one node in the environment? I have encountered a "not eligible" error when trying to create a PVC.
2. Does open-local support multiple pods on a single node sharing a PVC? Kubernetes' description of RWO volumes says a ReadWriteOnce volume can still be accessed by multiple pods running on the same node. However, I noticed that open-local does not implement the `NodeStageVolume` interface, so I'm not sure if this is possible. There is also an error on the agent side. The most important thing I want to know is whether open-local supports multiple pods on the same node sharing access to a PVC.
3. Even if it does not support that, the capacity in the nodecache should still be consistent with `.status.nodeStorageInfo.volumeGroups` in the NLS (see the sketch after this list).
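A consistency check along the lines of question 3 could compare the cached reservation per VG against what the node reports; a sketch only, with both struct shapes assumed rather than taken from the NLS CRD or the extender's cache types:

```go
package sketch

import "fmt"

// reportedVG is the per-VG state as the node reports it (field names assumed,
// not the NLS CRD schema).
type reportedVG struct {
	Name      string
	Total     int64
	Available int64
}

// cachedVG is the extender's cached view of the same VG (names assumed).
type cachedVG struct {
	Capacity  int64
	Requested int64
}

// checkConsistency flags VGs whose cached reservation no longer matches the
// usage the node actually reports, which is the drift described above.
func checkConsistency(reported []reportedVG, cache map[string]cachedVG) []string {
	var drift []string
	for _, vg := range reported {
		c, ok := cache[vg.Name]
		if !ok {
			continue
		}
		usedOnNode := vg.Total - vg.Available
		if c.Requested != usedOnNode {
			drift = append(drift, fmt.Sprintf("vg %s: cache has %d requested, node reports %d used", vg.Name, c.Requested, usedOnNode))
		}
	}
	return drift
}
```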