What happened:
Several sharegpu instances using a large image were created concurrently, and some of them were deleted while the image was still being pulled; as a result, creation of the remaining sharegpu instances failed.
What you expected to happen:
The remaining sharegpu instances should still be created successfully, even though some instances were deleted while the image was being pulled.
How to reproduce it (as minimally and precisely as possible):
1. Concurrently create several sharegpu instances that use a large image.
2. While the image is still being pulled, delete some of the sharegpu instances.
3. Wait for the image pull to finish; creation of the remaining sharegpu instances fails (a client-go sketch of these steps is shown below).
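For concreteness, here is a minimal reproduction sketch using client-go. The namespace k8s-common-ns, the aliyun.com/gpu-mem resource, and the request of 8 are taken from the logs below; the pod names, image, counts, and sleep duration are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ns := "k8s-common-ns"

	// Step 1: concurrently create several pods that request shared GPU
	// memory and use a large image so the pull takes a while.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			pod := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("sharegpu-test-%d", i)},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "main",
						Image: "registry.example.com/large-image:latest", // placeholder large image
						Resources: corev1.ResourceRequirements{
							Limits: corev1.ResourceList{
								"aliyun.com/gpu-mem": resource.MustParse("8"),
							},
						},
					}},
				},
			}
			if _, err := client.CoreV1().Pods(ns).Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
				fmt.Println("create failed:", err)
			}
		}(i)
	}
	wg.Wait()

	// Step 2: while the image is still being pulled, delete some of the pods.
	time.Sleep(30 * time.Second) // adjust so the pull is still in progress
	for i := 0; i < 5; i++ {
		name := fmt.Sprintf("sharegpu-test-%d", i)
		if err := client.CoreV1().Pods(ns).Delete(context.TODO(), name, metav1.DeleteOptions{}); err != nil {
			fmt.Println("delete failed:", err)
		}
	}
	// Step 3: the remaining pods should come up once the pull finishes,
	// but in this report they fail instead.
}
```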
gpushare scheduler extender log
[ debug ] 2020/06/17 09:54:43 gpushare-predicate.go:17: check if the pod name k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl can be scheduled on node ser-330
[ debug ] 2020/06/17 09:54:43 gpushare-predicate.go:31: The pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in the namespace k8s-common-ns can be scheduled on ser-330
[ debug ] 2020/06/17 09:54:43 routes.go:121: gpusharingBind ExtenderArgs ={k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl k8s-common-ns 90fddd7e-b080-11ea-9b44-0cc47ab32cea ser-330}
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:143: Allocate() ----Begin to allocate GPU for gpu mem for pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns----
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:220: reqGPU for pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns: 8
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:239: Find candidate dev id 1 for pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns successfully.
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:147: Allocate() 1. Allocate GPU ID 1 to pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns.----
[ info ] 2020/06/17 09:54:43 controller.go:286: Need to update pod name k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns and old status is Pending, new status is Pending; its old annotation map[] and new annotation map[ALIYUN_COM_GPU_MEM_IDX:1 ALIYUN_COM_GPU_MEM_POD:8 ALIYUN_COM_GPU_MEM_ASSIGNED:false ALIYUN_COM_GPU_MEM_ASSUME_TIME:1592387683318737367 ALIYUN_COM_GPU_MEM_DEV:24]
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:179: Allocate() 2. Try to bind pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in k8s-common-ns namespace to node with &Binding{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl,GenerateName:,Namespace:,SelfLink:,UID:90fddd7e-b080-11ea-9b44-0cc47ab32cea,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Target:ObjectReference{Kind:Node,Namespace:,Name:ser-330,UID:,APIVersion:,ResourceVersion:,FieldPath:,},}
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:193: Allocate() 3. Try to add pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns to dev 1
[ debug ] 2020/06/17 09:54:43 deviceinfo.go:57: dev.addPod() Pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns with the GPU ID 1 will be added to device map
[ debug ] 2020/06/17 09:54:43 nodeinfo.go:204: Allocate() ----End to allocate GPU for gpu mem for pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns----
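The extender log above shows a two-step assume-and-bind flow: Allocate() picks a device, controller.go stamps the pod with the ALIYUN_COM_GPU_MEM_* annotations (with ALIYUN_COM_GPU_MEM_ASSIGNED still false), and the pod is then bound to the node. The sketch below reconstructs that step from the log alone; assumePod and its parameters are hypothetical names, not the project's actual code.

```go
package main

import (
	"context"
	"strconv"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// assumePod is illustrative only: it mirrors what the controller.go and
// nodeinfo.go lines above appear to do, using annotation keys and values
// taken from the log (IDX 1, POD 8, DEV 24, node ser-330).
func assumePod(client kubernetes.Interface, pod *corev1.Pod, devID, reqMem, totalMem int, node string) error {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["ALIYUN_COM_GPU_MEM_IDX"] = strconv.Itoa(devID)    // chosen GPU index
	pod.Annotations["ALIYUN_COM_GPU_MEM_POD"] = strconv.Itoa(reqMem)   // requested GPU memory
	pod.Annotations["ALIYUN_COM_GPU_MEM_DEV"] = strconv.Itoa(totalMem) // device capacity
	pod.Annotations["ALIYUN_COM_GPU_MEM_ASSIGNED"] = "false"           // flipped to true after Allocate()
	pod.Annotations["ALIYUN_COM_GPU_MEM_ASSUME_TIME"] =
		strconv.FormatInt(time.Now().UnixNano(), 10)
	if _, err := client.CoreV1().Pods(pod.Namespace).Update(context.TODO(), pod, metav1.UpdateOptions{}); err != nil {
		return err
	}
	// Bind the pod to the chosen node, as in the Binding object logged above.
	binding := &corev1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: pod.Name, UID: pod.UID},
		Target:     corev1.ObjectReference{Kind: "Node", Name: node},
	}
	return client.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), binding, metav1.CreateOptions{})
}
```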
gpushare device plugin log
I0617 10:04:50.278017 1 podmanager.go:123] list pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns in node ser-330 and status is Pending
I0617 10:04:50.278039 1 podutils.go:91] Found GPUSharedAssumed assumed pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in namespace k8s-common-ns.
I0617 10:04:50.278046 1 podmanager.go:157] candidate pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns with timestamp 1592387683318737367 is found.
I0617 10:04:50.278056 1 allocate.go:70] Pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns request GPU Memory 8 with timestamp 1592387683318737367
I0617 10:04:50.278064 1 allocate.go:80] Found Assumed GPU shared Pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns with GPU Memory 8
I0617 10:04:50.354408 1 podmanager.go:123] list pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl in ns k8s-common-ns in node ser-330 and status is Pending
I0617 10:04:50.354423 1 podutils.go:96] GPU assigned Flag for pod k8s-deploy-ubhqko-1592387682017-7875f9fc5c-b6pxl exists in namespace k8s-common-ns and its assigned status is true, so it's not GPUSharedAssumed assumed pod.
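The last two lines are the key part: the device plugin only treats a pod as an allocation candidate while ALIYUN_COM_GPU_MEM_ASSIGNED is still false. The check below is reconstructed purely from the podutils.go messages above; isGPUShareAssumedPod is an illustrative name, not the project's source.

```go
package main

import corev1 "k8s.io/api/core/v1"

// isGPUShareAssumedPod mirrors the two podutils.go log lines above: a pod
// counts as "assumed" only if the scheduler extender stamped it with an
// assume time and the assigned flag has not yet been flipped to true.
func isGPUShareAssumedPod(pod *corev1.Pod) bool {
	if _, ok := pod.Annotations["ALIYUN_COM_GPU_MEM_ASSUME_TIME"]; !ok {
		return false // never went through the extender's assume step
	}
	if assigned, ok := pod.Annotations["ALIYUN_COM_GPU_MEM_ASSIGNED"]; ok && assigned == "true" {
		// Branch taken in the final log line: the pod is no longer a
		// candidate, so a later Allocate() call finds nothing to match.
		return false
	}
	return true
}
```

So if deleting sibling pods during the long image pull leaves this flag already set to true by the time kubelet calls Allocate() for a surviving pod, no candidate is found, which could explain the creation failure reported above.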
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- OS (e.g: `cat /etc/os-release`): CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"