NVIDIA / k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
Apache License 2.0

kubelet-plugin Error: Segmentation fault (core dumped) #6

Closed: CoderTH closed this issue 10 months ago

CoderTH commented 1 year ago

OS: CentOS 7.9

[screenshot]

I wanted to run the demo example from the repository, but when I ran the ./install-dra-driver.sh script, the kubelet-plugin pod failed to start. After troubleshooting, I found that LD_LIBRARY_PATH was set incorrectly, similar to issue #4.

[screenshot]

So I corrected the path manually. That error no longer appeared, but the pod still kept restarting and produced no related logs.

[screenshots]

The error persisted:

[screenshots]

To find out what was going on, I changed the container's command to sleep for a while so that I could get into the container and run nvidia-dra-plugin by hand, but the error still occurred.
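The "replace the command with sleep" step can be expressed as a strategic-merge patch. This Go sketch prints such a patch, assuming the plugin runs as a container in a DaemonSet and that `kubectl patch` is used with its default (strategic) patch type. The container name `plugin` is hypothetical; pass the real name as the first argument. `args` is set to `null`, which in a strategic-merge patch should remove the field, so that any original arguments are not appended to `sleep`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Minimal structs mirroring only the DaemonSet fields this patch touches.
type container struct {
	Name    string   `json:"name"`
	Command []string `json:"command"`
	Args    []string `json:"args"` // nil marshals to null, i.e. "remove args"
}
type podSpec struct {
	Containers []container `json:"containers"`
}
type template struct {
	Spec podSpec `json:"spec"`
}
type dsSpec struct {
	Template template `json:"template"`
}
type patch struct {
	Spec dsSpec `json:"spec"`
}

// sleepPatch builds a patch that replaces the named container's command with
// `sleep <seconds>` so the pod stays up and can be inspected with kubectl exec.
// Containers in a strategic-merge patch are matched by name.
func sleepPatch(containerName, seconds string) ([]byte, error) {
	p := patch{Spec: dsSpec{Template: template{Spec: podSpec{
		Containers: []container{{Name: containerName, Command: []string{"sleep", seconds}}},
	}}}}
	return json.Marshal(p)
}

func main() {
	name := "plugin" // hypothetical container name
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	b, err := sleepPatch(name, "3600")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(b))
}
```

Usage, with placeholders for the real names: `kubectl patch daemonset <daemonset> -n <namespace> --patch "$(go run sleeppatch.go <container>)"`, then `kubectl exec -it <pod> -n <namespace> -- sh` once the new pod is running.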

[screenshots]

I still suspect the problem is LD_LIBRARY_PATH. With the original, wrong setting there was at least a log message pointing at the bad path. After I corrected it, there were no logs at all and the error recurred on every restart, so I deliberately set the path back to the wrong value.

[screenshot]

Surprisingly, the pod then ran successfully, and I was able to exec into the container.

[screenshot]

Inside the container, I manually exported LD_LIBRARY_PATH, then ran nvidia-dra-plugin and got the following error:

What is the problem, and what is happening here?
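To confirm that nvidia-dra-plugin is really being killed by SIGSEGV (which a shell reports as "Segmentation fault (core dumped)") rather than exiting with an ordinary error code, one option is to run it under a small wrapper. This is a Unix-only Go sketch for diagnosis, not part of the driver; the program name `exitcheck` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// describeExit runs name with args (passing through stdout/stderr) and
// classifies how it ended: status 0, a non-zero exit code, or termination
// by a signal such as SIGSEGV. Unix-only: it relies on syscall.WaitStatus.
func describeExit(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	err := cmd.Run()
	if err == nil {
		return "exited with status 0", nil
	}
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		return "", err // e.g. binary not found or not executable
	}
	if ws, ok := exitErr.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
		if ws.Signal() == syscall.SIGSEGV {
			return "killed by SIGSEGV (segmentation fault)", nil
		}
		return "killed by signal " + ws.Signal().String(), nil
	}
	return fmt.Sprintf("exited with status %d", exitErr.ExitCode()), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: exitcheck <binary> [args...]")
		os.Exit(2)
	}
	msg, err := describeExit(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not run:", err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```

For example, `./exitcheck nvidia-dra-plugin [flags...]`, run once with each LD_LIBRARY_PATH value, would show whether the crash depends on which library directories are searched.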