FlaxEngine / FlaxEngine

Flax Engine – multi-platform 3D game engine
https://flaxengine.com

Editor crashes on Vulkan VK_ERROR_DEVICE_LOST #1799

Open nothingTVatYT opened 1 year ago

nothingTVatYT commented 1 year ago

Issue description: Flax Editor sometimes crashes with VK_ERROR_DEVICE_LOST, seemingly at random. At least once it happened right after exiting play mode.

Flax version: 1.7.6402 (Git master Oct 24th)

Log.txt

mafiesto4 commented 1 year ago

A crash from a Debug build would contain more info.

Usually, VK_ERROR_DEVICE_LOST means that we're doing too much GPU work or invalid GPU work, but the log doesn't say anything here.

nothingTVatYT commented 1 year ago

I'm running a Debug build now, and it didn't happen during several hours of using the Flax Editor until I played around with the CSG functionality. It happened earlier without any CSG brush, so it shouldn't be related exclusively to CSG.

This is the latest master branch as of now (i.e. up to and including commit 2158fa7).

Log.txt

nothingTVatYT commented 12 months ago

Another crash and this time without involving CSG. This time it happened when I tried to enter play mode. Log.txt

mafiesto4 commented 12 months ago

I wonder if it's related to DDGI, as it's pretty GPU-intensive.

nothingTVatYT commented 12 months ago

Would a special debug version help get to the root of the problem? I'm thinking of something like printing a stack trace or other details when the error occurs. I don't know enough about how it's supposed to work to do that myself, but yesterday it happened roughly once an hour, so I should be able to supply special logs. I thought about possible memory leaks, but then it happened once right after I restarted the editor, which makes that unlikely.
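To make the "print details when the error occurs" idea concrete, here is a minimal sketch of a check macro that logs the result code together with the failing expression, file, and line. It is not Flax's actual error handling: `SketchVkResult`, `ResultName`, `DescribeFailure`, and `SKETCH_VK_CHECK` are all hypothetical names, and the enum is a stand-in for `VkResult` so the snippet compiles without the Vulkan SDK. It also does not capture a stack trace; it only records the call site.

```cpp
#include <cstdio>
#include <string>

// Stand-in for VkResult (assumption: used only so this sketch builds
// without Vulkan headers). The numeric values match the Vulkan spec.
enum SketchVkResult : int {
    SKETCH_VK_SUCCESS = 0,
    SKETCH_VK_ERROR_OUT_OF_HOST_MEMORY = -1,
    SKETCH_VK_ERROR_OUT_OF_DEVICE_MEMORY = -2,
    SKETCH_VK_ERROR_DEVICE_LOST = -4,
};

// Hypothetical helper: maps a result code to a readable name.
inline const char* ResultName(int result) {
    switch (result) {
    case SKETCH_VK_SUCCESS: return "VK_SUCCESS";
    case SKETCH_VK_ERROR_OUT_OF_HOST_MEMORY: return "VK_ERROR_OUT_OF_HOST_MEMORY";
    case SKETCH_VK_ERROR_OUT_OF_DEVICE_MEMORY: return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
    case SKETCH_VK_ERROR_DEVICE_LOST: return "VK_ERROR_DEVICE_LOST";
    default: return "VK_<other>";
    }
}

// Hypothetical helper: builds one log line describing where a call failed.
inline std::string DescribeFailure(int result, const char* expr,
                                   const char* file, int line) {
    return std::string(ResultName(result)) + " (" + std::to_string(result) +
           ") from `" + expr + "` at " + file + ":" + std::to_string(line);
}

// Hypothetical macro: evaluates the call exactly once and, for error codes
// (negative values), writes the description to stderr.
#define SKETCH_VK_CHECK(call)                                               \
    do {                                                                    \
        const int sketchResult_ = static_cast<int>(call);                   \
        if (sketchResult_ < 0)                                              \
            std::fprintf(stderr, "%s\n",                                    \
                         DescribeFailure(sketchResult_, #call, __FILE__,    \
                                         __LINE__).c_str());                \
    } while (0)
```

Wrapping each submit or present call in a macro like this would at least show which call first reported the lost device, though the GPU work that actually caused it may have been submitted earlier.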

mafiesto4 commented 12 months ago

Maybe we need to integrate a tool like the NVIDIA Nsight Aftermath SDK (https://developer.nvidia.com/nsight-aftermath) to get full crash info from the GPU driver.

nothingTVatYT commented 12 months ago

Whatever it takes to find the problem. It's quite annoying and disturbing to have these "oh, shit" moments, wondering when I last saved.

nothingTVatYT commented 12 months ago

This is installed on my PC and maybe it can help: https://archlinux.org/packages/extra/x86_64/nvidia-utils/ There is a tool called nvidia-debugdump.

 ╭─me@garuda1 in repo: FlaxEngine on  master [$!] via .NET v7.0.0 took 3ms
 ╰─λ nvidia-debugdump --help
-----------------------------------------------------
|   This is an external build of NvDebugDump        |
|                 Version 01.01                     |
-----------------------------------------------------
Usage: nvidia-debugdump [options]

Options include:

    [-l | --list]: 
        List all NVIDIA GPU devices in this computer

    [-d | --device]: 
        Device ID
        Devices are numbered starting at zero (0). Therefore,
        if you only have one GPU, you may either specify zero
        (0) for Device ID, or simply omit this option.

    [-f | --file]: 
        Input or output file name, such as: out.zip. If you
        omit this parameter, then a default file name of
        "dump.zip" will be used, in cases in which a file
        name is required (such as for dumping or decoding).

    [-v | --verbose]: 
        Print extra information while running the program

    [-z | --debug]: 
        Print LOTS of extra information.
        This is "debug levels of verbosity", and is intended
        mainly for programmers.

    [-h | --help]: 
        Print a detailed usage description and exit.
        This occurs regardless of the presence of any other
        options.

    [-V | --version]: 
        Print version information. May be combined with other
        options.

    [-D | --dumpall]: 
        Dump all components. Connect to the RM, and retrieve
        a complete diagnostic dump for the specified device.
        Note that this is done for ALL DEVICES in your system,
        unless you limit it to one device with the "--device"
        option.

    [-N | --nvlogonly]: 
        Dumps nvlog . Connects to the RM, and retrieves
        nvlog only.

    [-I | --ioctl]: 
        Use ioctl instead of NVML to retrieve nvlog
        from the driver.

Example: nvidia-debugdump --list
   (Lists all of the GPU devices on this computer)

Example: nvidia-debugdump --dumpall
   (Dumps the all components, in all GPU devices,
    to file dump.zip)

Example: nvidia-debugdump --dumpall --device 2 --file RunOne_dump.zip
   (Dumps the second GPU device, to file RunOne_dump.zip)

nothingTVatYT commented 11 months ago

It hadn't happened in a while, but while importing an FBX and adjusting the material the editor crashed again.

Version: git master as of Nov 17th

Log.txt

nothingTVatYT commented 11 months ago

And again. This time I could capture the screen when it happened, which in itself is a piece of art. Maybe you can guess what the renderer was doing at that moment, as it looks like a zoom with motion blur.

Log.txt Screenshot_Flax Editor 1 7 - '_home_me_Documents_flax-projects_test2'_3

nothingTVatYT commented 11 months ago

New log file after adding stack trace: Log.txt

nothingTVatYT commented 11 months ago

New log file with hopefully helpful details: Log.txt

nothingTVatYT commented 8 months ago

It's still there. Log.txt

Here is another log with a debug build. Log.txt