Open ProbablePrime opened 1 month ago
I don't think moving it under Linux will necessarily fix the stability. Usually it ends up crashing when there's an issue with handling certain assets - which is something we should catch and look at.
We probably need a mechanism that auto-restarts it after a crash, regardless of whether it runs on Windows or Linux.
Having it run under Linux will definitely help with other things (like not needing a Windows worker), but I just wanted to state that I don't think it'll help with stability.
The main thing blocking it from running on Linux is the Compressonator libraries. AMD ships prebuilt DLLs only for Windows, so we'll need to build our own Linux versions: https://github.com/GPUOpen-Tools/compressonator
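For illustration, a platform-independent restart loop could look roughly like the sketch below. This is not the project's actual tooling: the `supervise` function, its parameters, and the exponential backoff policy are all assumptions made for the example.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Run cmd and restart it after each abnormal exit.

    Returns the number of restarts performed. Stops when the process
    exits cleanly (exit code 0) or when max_restarts is exhausted.
    All names and defaults here are hypothetical.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            # Clean shutdown: do not restart.
            return restarts
        if restarts >= max_restarts:
            # Give up so a persistent crash loop becomes visible.
            return restarts
        # Exponential backoff, capped at max_delay seconds.
        delay = min(base_delay * (2 ** restarts), max_delay)
        print(
            f"worker exited with code {exit_code}; restarting in {delay:.0f}s",
            file=sys.stderr,
        )
        sleep(delay)
        restarts += 1
```

It would be called with the worker's command line, for example `supervise(["./asset-worker"])`, where `./asset-worker` is a placeholder path. Capping the number of restarts is a deliberate choice here: an asset that crashes the worker every time should surface as a failure rather than be hidden by endless restarts.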
Apologies for the confusion.
By stability I'm referring to the Windows machine itself. It keeps going offline, I don't know why, and due to how it's hosted it's basically a mess. Killing that Windows machine will make things more reliable.
For the stability of the worker itself, yes of course, but that could be a separate issue. Right now the Windows machine, if it is up, will automatically restart the worker.
We did have a script at one point to automatically restart the AVS under PowerShell ages ago - I can bring that over to shell as well while we attempt to catch things that are crashing it.
I need to take a look at the Compressonator stuff at some point for Sauce - I can bump that as a priority and get that on CI/CD for us if you guys would like.
This already happens: we use the Non-Sucking Service Manager to automatically restart it. (https://discord.com/channels/1040316820650991766/1154514014563483759/1219467220363644959)
Let's get this issue back on track, so yup Compressonator. Please don't re-prioritize anything though. I'll handle it.
Is your feature request related to a problem? Please describe.
Our asset worker currently runs on Windows only. This creates a lot of issues, as we need to run a separate fleet of Windows servers for it.
We're experiencing stability issues with the Windows-based systems that run the worker, and the easiest way to resolve those is to run it on Linux.
Describe the solution you'd like
Describe alternatives you've considered
Fixing our Windows stuff.
I don't know what's wrong but it keeps becoming unhealthy.
Additional Context
No response
Requesters
No response