vulkaninfo Segmentation fault (core dumped) #1026

Closed brian2lee closed 6 days ago

brian2lee commented 1 week ago

I was trying out ManiSkill and followed its instructions to install Vulkan, but vulkaninfo fails with Segmentation fault (core dumped). I followed the troubleshooting guide: /usr/share/vulkan/icd.d/nvidia_icd.json and /usr/share/glvnd/egl_vendor.d/10_nvidia.json both exist. I have no idea what's wrong. I posted the same issue on the ManiSkill repository. Environment:

ubuntu 20.04
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   49C    P0             26W /   80W |      15MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |

| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|    0   N/A  N/A      1318      G   /usr/lib/xorg/Xorg                              4MiB |

ldconfig -p | grep libGLX_nvidia
(libc6,x86-64) => /lib/x86_64-linux-gnu/


{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : ""
    }
}


{
    "file_format_version" : "1.0.1",
    "ICD": {
        "library_path": "",
        "api_version" : "1.3.280"
    }
}

And just in case here's my pip3 list:

charles-lunarg commented 1 week ago

I believe this is actually a bug during shutdown of NVIDIA's Vulkan Safety Critical (Vulkan SC) driver. If I install nvidia-driver-560 and run vulkaninfo, I get the output I expect, followed by Segmentation fault (core dumped). What makes me think it's the driver's fault is that I get the same output from vkcube, and the crash occurs after vkDestroyInstance has successfully returned (I did a debug build to confirm this).

This may be the same issue as #1025, as both involve NVIDIA driver 560.
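The symptom described above (normal output, then a crash at exit) can be told apart from a crash during startup by checking how the process ended and whether it printed anything first. The following is a minimal sketch, not part of vulkaninfo or the loader; `classify_run` is a hypothetical helper name, and it assumes a POSIX system where a child killed by a signal is reported with a negative return code.

```python
import signal
import subprocess


def classify_run(cmd):
    """Run cmd; report its return code, the fatal signal (if any), and whether it printed output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    killed_by = None
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was terminated by signal N.
        try:
            killed_by = signal.Signals(-proc.returncode).name
        except ValueError:
            killed_by = f"signal {-proc.returncode}"
    return {
        "returncode": proc.returncode,
        "signal": killed_by,
        "printed_output": bool(proc.stdout.strip()),
    }


# Example (requires vulkaninfo on PATH):
#   print(classify_run(["vulkaninfo"]))
```

A result with `signal` equal to `SIGSEGV` and `printed_output` equal to `True` would be consistent with a crash during teardown, although it does not by itself show which component is at fault.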

brian2lee commented 6 days ago

@charles-lunarg yeah, after I reinstalled my driver as nvidia 470 it somehow works, at least for vulkaninfo. I just got into this, so I'll have to wait and see if any further issues appear.

StoneT2000 commented 5 days ago

@charles-lunarg thanks for helping debug this issue. A few other users of our Vulkan-based rendering software had the same issue and couldn't figure it out, so I pointed them here. Is there a reliable way to debug driver issues with seg faults? E.g., if there is a new NVIDIA driver, should we always recommend that people downgrade it?

charles-lunarg commented 5 days ago

> Is there a reliable way to debug driver issues with seg faults?

It is much easier when the fault occurs inside a driver rather than after main returns, because then you'd have a stack trace pointing to something.
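For Python-based Vulkan applications, one low-effort way to get at least a Python-level stack trace on a segfault is the standard library's `faulthandler`, which can be switched on through the `PYTHONFAULTHANDLER` environment variable without changing the application. The sketch below is an illustration under assumptions: `run_with_faulthandler` is a hypothetical helper, and the trace covers Python frames only, not native frames inside a driver. For a crash that happens after teardown, as described in this thread, it may show little.

```python
import os
import subprocess
import sys


def run_with_faulthandler(args):
    """Run `python <args>` with faulthandler enabled; return (returncode, stderr)."""
    # PYTHONFAULTHANDLER=1 makes the interpreter call faulthandler.enable() at startup,
    # so a fatal signal such as SIGSEGV dumps the Python traceback to stderr.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )
    return proc.returncode, proc.stderr


# Example (my_render_script.py is a placeholder for your own entry point):
#   rc, err = run_with_faulthandler(["my_render_script.py"])
#   print(rc, err)
```

If the trace ends inside a call into the rendering library, that narrows down where the fault was triggered; native-level detail would still need a debugger or a core dump.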

I personally believe this is just teething problems from Vulkan Safety Critical being shipped for the first time: the driver is installed but is erroneously being used by the regular Vulkan-Loader. There may be a way for the loader to detect this situation and prevent loading of the safety critical driver, but since it has already shipped, that's a moot point, and fixing the driver is the best course of action. I have gone ahead and contacted the NVIDIA developer I know who works on drivers.

So for the time being I would recommend downgrading.
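To see which driver manifests are installed on a machine, and therefore which drivers the loader could pick up, the manifest directories can be listed and parsed. This is a minimal sketch: `list_icd_manifests` is a hypothetical helper, and it relies only on the `ICD`, `library_path`, and `api_version` fields that appear in the manifests quoted earlier in this issue. The directories to scan are an assumption; `/usr/share/vulkan/icd.d` is the one mentioned above, and `/etc/vulkan/icd.d` is another common location.

```python
import json
from pathlib import Path


def list_icd_manifests(directories):
    """Return (manifest path, library_path, api_version) for each readable ICD manifest."""
    found = []
    for directory in directories:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # skip locations that do not exist on this system
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, json.JSONDecodeError):
                continue  # skip unreadable or malformed manifests
            icd = data.get("ICD", {}) if isinstance(data, dict) else {}
            found.append((str(manifest), icd.get("library_path"), icd.get("api_version")))
    return found


# Example:
#   for entry in list_icd_manifests(["/usr/share/vulkan/icd.d", "/etc/vulkan/icd.d"]):
#       print(entry)
```

An unexpected extra manifest in the output would be worth mentioning when reporting a crash like this one, since it shows which libraries the loader may be trying to load.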