Closed: sudhamjayanthi closed this issue 1 month ago
@sudhamjayanthi Good that you pointed this out; I should explain this more clearly in the README. Currently it works like this:
You first need to download the full non-quantised model (34 GB) and then, when you run mflux-save, you export a local copy of the quantized model. The quantized model is thus never downloaded from an external source.
Once exported, you can choose to keep only the quantized model and point to it as shown here. But to reclaim the 34 GB of disk space, you would have to manually delete the full model from the Hugging Face cache.
Of course, it would be nice if we could host the quantized weights somewhere so that if you only wanted the 4bit version, that was the only thing you had to download.
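For reference, the workflow looks roughly like this (flag names are from memory and may differ between mflux versions, so treat this as a sketch):

```sh
# Export a 4-bit quantized copy of the model to a local folder.
# (This still requires the full ~34 GB weights to be in the Hugging Face cache.)
mflux-save --model schnell --quantize 4 --path ~/models/schnell-4bit

# Generate using the exported quantized weights instead of the full model.
mflux-generate --model schnell --path ~/models/schnell-4bit \
  --prompt "a cat wearing a spacesuit" --steps 2 --seed 42
```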
Yeah, guessed something like that was happening!
I'll push the quantised weights Monday if no one else does it by then, for the disk-poor.
Thanks! Please keep me posted if you do :)
4-bit quantized models seem to already be available: https://huggingface.co/madroid/flux.1-schnell-mflux-4bit and https://huggingface.co/madroid/flux.1-dev-mflux-4bit/
Oh great, didn't know about this! Will add it to the README. Might have to add support for downloading these automatically too.
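In the meantime, one possible way to pull only the 4-bit weights is the standard Hugging Face CLI; I haven't verified this against these particular repos, so take it as a sketch:

```sh
# Download only the pre-quantized weights (roughly 9 GB) instead of the 34 GB original.
huggingface-cli download madroid/flux.1-schnell-mflux-4bit --local-dir ~/models/schnell-4bit

# Point mflux at the local copy (flag names may vary by version).
mflux-generate --model schnell --path ~/models/schnell-4bit \
  --prompt "a cat wearing a spacesuit" --steps 2
```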
Haven't looked deeper into this issue yet, but downloading the 4-bit model requires more than 9 GB for some reason.
I used the following command:
The full-precision model has been downloaded, based on the size of the .cache folder, I think:
And then the quantised version is copied:
So it required 43 GB of free space even for the quantised version. I'm unsure if this is a huggingface or mflux thing.
Just flagging so people trying to download the quantised version with low storage space aren't confused!
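In case it helps others, the cache can be inspected and the full-precision weights removed afterwards with the standard huggingface_hub CLI (not mflux-specific):

```sh
# List what is in the Hugging Face cache and how much space each repo takes.
huggingface-cli scan-cache

# Interactively pick revisions to delete, e.g. the full-precision FLUX weights
# once a quantized copy has been exported.
huggingface-cli delete-cache
```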