Is there any way to convert the GGUF model to PyTorch first and only then start the engine or Ray worker? As it stands, the Ray worker already uses 10 GB of RAM, which leaves me 20 GB for the conversion, and Ray crashes during the conversion because it runs out of RAM. I'm using two GPUs.
Report of performance regression
Is there any way to convert the GGUF model first and then start the Ray instance?
Misc discussion on performance
No response
Your current environment (if you think it is necessary)