filipstrand / mflux

An MLX port of FLUX based on the Hugging Face Diffusers implementation.

Load model from drive, note hf cache #19

Closed stefanvarunix closed 1 week ago

stefanvarunix commented 3 weeks ago

Hi, is it possible to load a model from disk (e.g. /name/models/FLUX.1-schnell) instead of from the Hugging Face cache?

And by the way: thanks for mflux! Really great software!

filipstrand commented 3 weeks ago

@stefanvarunix Hi, and great that you are enjoying the project! This is actually how I had it when I first started developing the project, as can be seen here. But after that I got some feedback to do it roughly the way it is done in diffusers and load from the cache, which is more standard and closer to how the Diffusers implementation works.

As it is right now, if you set the HF_HOME variable, you can point this to any other location. For example, if I set

export HF_HOME="/Users/filipstrand/Desktop/"

and I move the model to

/Users/filipstrand/Desktop/hub/models--black-forest-labs--FLUX.1-schnell

then everything works. From what I understand, the model directory (for schnell in this case) needs to be named models--black-forest-labs--FLUX.1-schnell for this to work; naming it simply FLUX.1-schnell, for example, does not work and the logic will start downloading the model again.
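
For reference, here is a small sketch of how that directory name relates to the repo id (the models--&lt;org&gt;--&lt;repo&gt; convention is the Hugging Face hub cache layout; the helper below is only illustrative and not part of mflux):

import os

# Illustrative helper (not part of mflux): the hub cache names a model directory
# "models--<org>--<repo>", i.e. the repo id with "/" replaced by "--".
def hf_cache_dir_name(repo_id: str) -> str:
    return "models--" + repo_id.replace("/", "--")

hf_home = "/Users/filipstrand/Desktop"  # the value exported as HF_HOME above
model_dir = os.path.join(hf_home, "hub", hf_cache_dir_name("black-forest-labs/FLUX.1-schnell"))
print(model_dir)  # /Users/filipstrand/Desktop/hub/models--black-forest-labs--FLUX.1-schnell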

I'll think about this some more, but we could certainly add an option like:

Flux1.from_disk("/name/models/FLUX.1-schnell")

which would bypass the snapshot_download logic and essentially work as it did before, if the user chooses that option. It would be a bit more "raw", though, since you would have to know which model parts are needed, in contrast to the snapshot_download logic, which helps you with any missing parts.
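
To make the idea concrete, here is a minimal sketch of what such an option could do, with hypothetical names that are not an actual mflux API: check that the expected parts exist under the given directory and load from there instead of calling snapshot_download.

from pathlib import Path

# Hypothetical sketch of a from_disk option (illustrative only, not the real mflux API):
# the caller points to a local directory and nothing is downloaded.
def from_disk(model_path: str) -> Path:
    root = Path(model_path)
    expected = ["transformer", "text_encoder", "text_encoder_2", "tokenizer", "tokenizer_2", "vae"]
    missing = [part for part in expected if not (root / part).exists()]
    if missing:
        # Unlike snapshot_download, nothing is fetched automatically, so the user
        # must know which model parts are required.
        raise FileNotFoundError(f"Missing model parts in {root}: {missing}")
    return root

model_root = from_disk("/name/models/FLUX.1-schnell")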

stefanvarunix commented 2 weeks ago

Thanks a lot! That works very well if I copy the previously downloaded model from the local Hugging Face cache (i.e. ~/.cache/huggingface/hub) to the new HF_HOME.

If I instead download the model directly (without the HF download happening during inference) and want to run inference later, it does not work.

Example: I download:

from huggingface_hub import login, snapshot_download

# Log in and download the full repository into a flat local directory
# (not the hub cache layout).
login(token='my_secret_token')
snapshot_download(repo_id="black-forest-labs/FLUX.1-dev",
                  local_dir="/Users/stefan/Desktop/hub/models--black-forest-labs--FLUX.1-dev")

/Users/stefan/Desktop/hub/models--black-forest-labs--FLUX.1-dev contains

...
LICENSE.md
README.md
ae.safetensors
dev_grid.jpg
flux1-dev.safetensors
model_index.json
scheduler
text_encoder
text_encoder_2
tokenizer
tokenizer_2
transformer
vae

This does not work, even if I move all the files into a blobs subfolder.
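
A likely reason is that local_dir produces a flat copy of the repository, while the hub cache expects its own internal layout (blobs, refs and snapshots/&lt;revision&gt; directories). Assuming that is the problem, a pre-download that lands directly in the cache layout, so that pointing HF_HOME at /Users/stefan/Desktop later finds it, could look roughly like this (only a sketch of a possible workaround, using snapshot_download's cache_dir parameter):

from huggingface_hub import login, snapshot_download

login(token='my_secret_token')

# Download into the hub cache layout (models--.../snapshots/<revision>/...) under the
# "hub" folder of the intended HF_HOME, instead of into a flat local_dir.
snapshot_download(repo_id="black-forest-labs/FLUX.1-dev",
                  cache_dir="/Users/stefan/Desktop/hub")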

My visionary idea is: imagine scaling horizontally, i.e. having x Macs that all run inference. Setting up these Macs by simply copying the original model puts less strain on the internet connection than downloading via Hugging Face on each device. It is also easier to maintain: one central machine holds all the models in their original format and distributes them to the other Macs, without each one having to do the initial download via the Hugging Face caching mechanism.

But that's complaining on a high level. It works anyway with the approach you describe.

The more I play around with your software, the more I love it. A perfect mix of functionality and simplicity!

filipstrand commented 2 weeks ago

@stefanvarunix Hi. Again, thanks for your compliments about the project :). I think your idea of running the model from disk without involving the Hugging Face cache functionality is a good one. Recently I added quantization support, and with that I decided to rewrite the weight handling a bit and brought the feature of loading from disk (for both quantized and non-quantized versions) back into the project. You can read about it here. Let me know if this works for you or if there is any problem.
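
As a purely conceptual illustration (the actual mflux interface is described in the linked write-up and may differ), loading weights straight from a local directory with MLX could look roughly like this:

from pathlib import Path
import mlx.core as mx

# Conceptual sketch, not the actual mflux API: read all safetensors files under a
# local model directory into one weight dictionary, with no HF cache involved.
def load_local_weights(model_dir: str) -> dict:
    weights = {}
    for file in sorted(Path(model_dir).rglob("*.safetensors")):
        weights.update(mx.load(str(file)))  # mx.load can read .safetensors files
    return weights

weights = load_local_weights("/name/models/FLUX.1-schnell")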