From a user's perspective, many RAM or VRAM out-of-memory (OOM) crashes are cryptic: the error shown in the main terminal window is non-specific and confusing. One or more of the following should happen instead:
When the worker crashes, it should check RAM and VRAM availability and, if either is below a threshold, tell the user that low system or video memory may be a factor.
When impossible memory conditions exist (e.g., 100 MB of VRAM free when Flux is about to be loaded), the worker should warn that the load is very likely to fail.
Perhaps job popping could be paused as well until the model is shown to have loaded successfully.
The worker should warn when VRAM -> RAM rollover (an option in the NVIDIA drivers, Windows only at the time of writing) kicks in.
It may be possible to detect this directly.
More generally, extremely slow it/s (associated with this condition) should probably be explicitly noted as occurring, and advice given to free up VRAM/RAM.
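
The checks proposed above could look roughly like the following. This is only a sketch of the decision logic, not the worker's actual code: every name (`MemorySnapshot`, `crash_memory_hint`, `preload_warning`, `slow_iteration_warning`) and every threshold is hypothetical. It assumes the caller supplies free-memory figures in bytes, which might come from something like `psutil.virtual_memory().available` and `torch.cuda.mem_get_info()`, and an expected it/s baseline, which the worker would have to estimate somehow.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass(frozen=True)
class MemorySnapshot:
    """Free memory in bytes at one moment; sampling is left to the caller."""
    ram_free: int
    vram_free: int


def crash_memory_hint(
    snap: MemorySnapshot,
    ram_floor: int = 2048 * MIB,   # illustrative threshold, not a tuned value
    vram_floor: int = 1024 * MIB,  # illustrative threshold, not a tuned value
) -> Optional[str]:
    """After a crash, return a hint if RAM or VRAM is below its floor."""
    low = []
    if snap.ram_free < ram_floor:
        low.append(f"RAM ({snap.ram_free // MIB} MiB free)")
    if snap.vram_free < vram_floor:
        low.append(f"VRAM ({snap.vram_free // MIB} MiB free)")
    if not low:
        return None
    return "Low memory may have contributed to this crash: " + ", ".join(low)


def preload_warning(
    model_name: str, required_vram: int, snap: MemorySnapshot
) -> Optional[str]:
    """Before loading a model, warn if free VRAM is below the estimated need."""
    if snap.vram_free >= required_vram:
        return None
    return (
        f"Loading {model_name} needs roughly {required_vram // MIB} MiB of VRAM "
        f"but only {snap.vram_free // MIB} MiB is free; "
        "the load is very likely to fail."
    )


def slow_iteration_warning(
    its_per_second: float, expected_its: float, slow_fraction: float = 0.25
) -> Optional[str]:
    """Warn when measured it/s falls far below the expected baseline."""
    if expected_its <= 0 or its_per_second >= expected_its * slow_fraction:
        return None
    return (
        f"Generation is running at {its_per_second:.2f} it/s "
        f"(expected about {expected_its:.2f}). VRAM may be spilling into RAM; "
        "consider freeing up VRAM/RAM."
    )
```

A caller could also treat a non-`None` result from `preload_warning` as a signal to pause job popping until the model has actually loaded, though how that would hook into the worker's job loop is left open here.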