Project-HAMi / volcano-vgpu-device-plugin

Device plugin for volcano vgpu, which supports hard resource isolation
Apache License 2.0

The vgpu-memory limit becomes ineffective after connecting to the container via SSH #8

Closed: AshinWu closed this issue 1 month ago

AshinWu commented 1 month ago

For example, when I set my pod's limit to volcano.sh/vgpu-memory: '1024' and then enter the container with kubectl exec, nvidia-smi shows that the GPU memory is indeed 1024 MiB, and the environment variable CUDA_DEVICE_MEMORY_LIMIT_0=1024m is set correctly.

However, when I connect to the container via SSH and run nvidia-smi, it shows the GPU's full memory capacity, so the vgpu-memory: '1024' limit is not applied. The CUDA_DEVICE_MEMORY_LIMIT_0 variable is also missing from the environment. What is going on?
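To confirm the diagnosis, one can compare the variable as seen by the container's main process (PID 1) with what the current shell sees. The sketch below assumes bash inside the container and that /proc/1/environ is readable by the SSH user; env_from_file and compare_env_var are hypothetical helper names, not part of volcano-vgpu-device-plugin.

```shell
# Hypothetical helpers (bash): compare a variable's value in the container's
# main process (PID 1) with its value in the current shell session.

# Print the value of variable $1 from a NUL-separated environ file
# (default /proc/1/environ); return 1 if the variable is not present.
env_from_file() {
  local name="$1" file="${2:-/proc/1/environ}" entry
  while IFS= read -r -d '' entry; do
    if [[ "$entry" == "$name="* ]]; then
      printf '%s\n' "${entry#"$name="}"
      return 0
    fi
  done < "$file"
  return 1
}

# Show the variable as PID 1 sees it and as this session sees it.
compare_env_var() {
  local name="$1" file="${2:-/proc/1/environ}"
  printf 'PID 1:   %s\n' "$(env_from_file "$name" "$file" || echo '<unset>')"
  printf 'session: %s\n' "${!name:-<unset>}"
}
```

If the behavior reported above holds, running compare_env_var CUDA_DEVICE_MEMORY_LIMIT_0 in a kubectl exec shell should print 1024m on both lines, while in an SSH session the session line should read <unset>.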

AshinWu commented 1 month ago

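The cause is that a session started by sshd gets a fresh environment instead of inheriting the variables Kubernetes injected into the container's main process, so CUDA_DEVICE_MEMORY_LIMIT_0 is lost. To resolve this, append the following line to /etc/profile: export $(cat /proc/1/environ | tr '\0' '\n' | xargs). This reads the environment of process 1 (the container's main process) and exports it into the shell. To apply it automatically on every SSH connection, make sure the session sources /etc/profile, for example by adding source /etc/profile to the shell's startup file.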
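The one-line fix above relies on word splitting, so a PID 1 variable whose value contains spaces or quotes can be mangled, and it overwrites every variable the SSH session already set (PATH, HOME, and so on). A more defensive /etc/profile fragment might look like the minimal sketch below. It assumes the login shell is bash and that /proc/1/environ is readable by the SSH user; import_pid1_env is a hypothetical name, and skipping variables that are already set is a design choice of this sketch, not something from the thread.

```shell
# Hypothetical /etc/profile fragment (bash only): copy the container's
# environment from PID 1 into shells started by sshd.
import_pid1_env() {
  local file="${1:-/proc/1/environ}" entry name
  # Nothing to do if the file is missing or unreadable for this user.
  [ -r "$file" ] || return 0
  # Entries are NAME=value pairs separated by NUL bytes; reading with
  # -d '' keeps values containing spaces or quotes intact.
  while IFS= read -r -d '' entry; do
    name="${entry%%=*}"
    # Skip anything that is not a valid shell variable name.
    [[ "$name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Keep values the SSH session already defines (HOME, PATH, SSH_*, ...).
    [ -n "${!name+x}" ] && continue
    export "$entry"
  done < "$file"
}

import_pid1_env
```

Because this variant skips variables that are already defined, a container-specific PATH set on PID 1 will not replace the SSH default; remove that check to get the original overwrite-everything behavior.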