Paethon closed this issue 1 year ago
Do you know which line the OOM happens at? One possible reason is that it happens during the weight download and weight conversion, so you hit the OOM even before FlexGen starts to work: https://github.com/FMInference/FlexGen/blob/f7b6869f11edc6dc96139fdce4d9521b352c2a6b/flexgen/opt_config.py#L121
In this case, you can rent a machine with more CPU RAM, run the conversion there, and copy the converted weights back.
Or you can implement a memory-efficient version of the weight conversion.
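As a rough illustration of what "memory-efficient" could mean here, the sketch below converts a checkpoint one shard at a time and writes each tensor to its own file, so peak memory is bounded by a single tensor rather than the whole model. This is not FlexGen's code: the function name `convert_shards`, the `.npz` shard format, and the one-`.npy`-file-per-tensor output layout are all assumptions made for the example. Real OPT checkpoints are PyTorch shards, which would need a different loader.

```python
import os
import numpy as np


def convert_shards(shard_paths, out_dir):
    """Hypothetical helper: write each tensor of each shard to its own .npy file.

    Assumes every shard is an .npz archive (a stand-in for real checkpoint shards).
    Only one tensor is held in memory at any time.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for shard_path in shard_paths:
        # np.load on an .npz archive is lazy: an array is read from disk
        # only when it is indexed by name.
        with np.load(shard_path) as shard:
            for name in shard.files:
                out_path = os.path.join(out_dir, name + ".npy")
                np.save(out_path, shard[name])  # the array can be freed right after
                written.append(out_path)
        # The archive is closed here, before the next shard is opened.
    return written
```

The key design choice is never building a dictionary of all weights: each tensor is read, written, and dropped before the next one is touched.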
This should be fixed by #69, which has been merged into the main branch. Could you try it now? Also, this is a duplicate of #11.
Sorry, was not available for a few days. Going to have a look, thanks! :+1:
I just tried running `flexgen.flex_opt` with the following command: `python3 -m flexgen.flex_opt --model facebook/opt-66b --percent 0 0 100 0 100 0 --offload-dir ~/tmp/offload/`, but this only fills the CPU memory until the process is killed by the OS, and the folder `~/tmp/offload/` stays completely empty.

System: