[Performance]: Memory Usage Fix for gguf.

Abulhanan commented 2 weeks ago

Proposal to improve performance

Is there any way to convert the GGUF model to PyTorch first and only then start the engine or Ray worker? Right now the Ray worker is already using 10 GB of RAM when the conversion starts, which leaves me 20 GB for converting, and Ray crashes during the conversion because it runs out of RAM. I'm using two GPUs.

Report of performance regression

Is there any way to convert the GGUF model first and then start the Ray instance?

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python env.py`

Abulhanan commented 2 weeks ago

Also, how can I convert it manually?

sgsdxzy commented 2 weeks ago

Please read the doc here
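
For readers who want the general shape of the "convert first, then launch" workflow asked about above, here is a minimal sketch. It only illustrates running the two steps as separate, sequential processes so that the conversion never shares RAM with a Ray worker. The function name `convert_then_launch` and both commands are hypothetical placeholders, not the project's actual CLI; take the real conversion and launch commands from the linked doc.

```python
import subprocess
import sys


def convert_then_launch(convert_cmd, launch_cmd):
    """Run convert_cmd to completion, then run launch_cmd.

    Both arguments are argument lists. They are placeholders in this
    sketch; substitute the real commands from the project docs.
    """
    # Step 1: the conversion runs on its own, so no Ray worker is
    # competing with it for RAM. check=True raises CalledProcessError
    # if the conversion exits with a non-zero status.
    subprocess.run(convert_cmd, check=True)

    # Step 2: the conversion process has exited and released its memory.
    # Only now start the engine, which is what brings up Ray.
    return subprocess.run(launch_cmd, check=True).returncode


if __name__ == "__main__":
    # Placeholder commands that only print a message.
    convert = [sys.executable, "-c", "print('converting (placeholder)')"]
    launch = [sys.executable, "-c", "print('starting engine (placeholder)')"]
    convert_then_launch(convert, launch)
```

Because the second command starts only after the first has exited successfully, a failed conversion stops the script before any engine or Ray process is created.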

Abulhanan commented 2 weeks ago

Okay, thank you very much.

PygmalionAI / aphrodite-engine