Closed: sudhamjayanthi closed this issue 1 month ago
@sudhamjayanthi Good that you pointed this out; I should explain this more clearly in the README. Currently it works like this:
You first need to download the full non-quantised model (34 GB) and then, when you run mflux-save, you export a local copy of the quantized model. The quantized model is thus never downloaded from an external source.
Once exported, you can choose to keep only the quantized model and point to it as shown here. But to reclaim the 34 GB of disk space, you would have to manually delete the full model from the Hugging Face cache.
Of course, it would be nice if we could host the quantized weights somewhere so that if you only wanted the 4bit version, that was the only thing you had to download.
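For reference, the workflow looks roughly like this (flag names are from memory and may differ between mflux versions, so treat this as a sketch):

```sh
# Export a 4-bit quantized copy of the model to a local folder.
# (This still requires the full ~34 GB weights to be in the Hugging Face cache.)
mflux-save --model schnell --quantize 4 --path ~/models/schnell-4bit

# Generate using the exported quantized weights instead of the full model.
mflux-generate --model schnell --path ~/models/schnell-4bit \
  --prompt "a cat wearing a spacesuit" --steps 2 --seed 42
```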
Yeah, guessed something like that was happening!
I'll push the quantised weights Monday if no one else does it by then, for the disk-poor.
Thanks! Please keep me posted if you do :)
4-bit quantized models seem to already be available: https://huggingface.co/madroid/flux.1-schnell-mflux-4bit and https://huggingface.co/madroid/flux.1-dev-mflux-4bit/
Oh great, didn't know about this! Will add it to the README. Might have to add support for downloading these automatically too.
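In the meantime, one possible way to pull only the 4-bit weights is the standard Hugging Face CLI; I haven't verified this against these particular repos, so take it as a sketch:

```sh
# Download only the pre-quantized weights (roughly 9 GB) instead of the 34 GB original.
huggingface-cli download madroid/flux.1-schnell-mflux-4bit --local-dir ~/models/schnell-4bit

# Point mflux at the local copy (flag names may vary by version).
mflux-generate --model schnell --path ~/models/schnell-4bit \
  --prompt "a cat wearing a spacesuit" --steps 2
```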
Haven't looked deeper into this issue yet, but downloading the 4-bit model requires more than 9 GB for some reason.
I used the following command:
The full-precision model has been downloaded, based on the size of the .cache folder, I think:
And then the quantised version is copied:
So it required 43 GB of free space even for the quantised version. I'm unsure if this is a huggingface or mflux thing.
Just flagging so people trying to download the quantised version with low storage space aren't confused!
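In case it helps others, the cache can be inspected and the full-precision weights removed afterwards with the standard huggingface_hub CLI (not mflux-specific):

```sh
# List what is in the Hugging Face cache and how much space each repo takes.
huggingface-cli scan-cache

# Interactively pick revisions to delete, e.g. the full-precision FLUX weights
# once a quantized copy has been exported.
huggingface-cli delete-cache
```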