davefojtik / RunPod-Fooocus-API

RunPod serverless worker for Fooocus-API. Standalone or with network volume
GNU General Public License v3.0

Random CUDA errors #36

Open mingekko opened 2 weeks ago

mingekko commented 2 weeks ago

Hello!

About once every 2 weeks, the following error appears for a few hours and then it fixes itself:

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

I can't figure out what this phenomenon is, especially since the error disappears after a few hours and then reappears after 1-2 weeks. Can anyone help me find out what is causing it?
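For reference, PyTorch's standard advice for this particular message is to rerun with synchronous kernel launches so the stack trace points at the call that actually failed. A minimal sketch, assuming the worker's Python entrypoint can be edited (the variable could equally be set in the endpoint's environment settings):

```python
import os

# CUDA_LAUNCH_BLOCKING makes kernel launches synchronous, so the Python stack
# trace points at the failing call instead of a later, unrelated API call.
# It must be set before the first CUDA call, i.e. at the very top of the
# worker entrypoint. Expect slower inference; use it for debugging only.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after the variable is set so it takes effect
```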

This is my configuration: [screenshot of the endpoint configuration]

davefojtik commented 2 weeks ago

That error message is often caused by an incompatible GPU driver on the machine and is usually solved by disabling CUDA malloc. But as far as I know, that is disabled in Fooocus by default.
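For reference, "CUDA malloc" here usually maps to PyTorch's cudaMallocAsync allocator backend, which can be forced one way or the other through PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of pinning the classic allocator, assuming the Fooocus launch scripts in this image don't already set or override the variable (that part is an assumption, not something confirmed here):

```python
import os

# "backend:native" selects PyTorch's classic caching allocator instead of
# cudaMallocAsync. The variable is read when the CUDA context is created,
# so it must be set before the first CUDA call.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "backend:native")

import torch

if torch.cuda.is_available():
    # Confirms which allocator actually ended up active on this worker.
    print("allocator backend:", torch.cuda.get_allocator_backend())
```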

The fact that it appears randomly and "fixes itself after a few hours" suggests a problem with specific workers that are spawned, for example, when your normally used GPUs get low in availability.

I would suggest writing down which GPUs your workers normally use (you can see the GPU model by hovering over the rectangles representing individual workers in your endpoint details), and then checking which GPUs are in use when you encounter the error. Alternatively, you could go through your list of secondary GPU selections and try to find the problematic one right away. But that could be time-consuming if you have many GPU models selected, since you need to change the endpoint settings, purge all the active workers to spawn new ones, and test them.
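If hover-checking individual workers gets tedious, a few lines of logging at worker start-up can record which card each run landed on, so later CUDA errors can be matched against GPU models. A rough sketch, assuming a Python entrypoint where torch is importable (the exact handler wiring in this repo isn't shown here):

```python
import logging
import torch

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("worker-gpu")

def log_gpu_info():
    """Log which GPU model(s) this worker landed on."""
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            log.info("GPU %d: %s", i, torch.cuda.get_device_name(i))
    else:
        log.warning("CUDA not available on this worker")

# Call once at worker start-up, before the request handler loop begins.
log_gpu_info()
```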

I basically use just 4090s at this point (which are also the most cost-effective ones for this task) and have never had such an error yet. So if you find a way to reproduce it frequently, or identify the GPU model that is causing this, definitely let us know.