Open clearsitedesigns opened 2 months ago
Did you get this to work on mps, or did you end up sticking with running on cpu?
I got it working, but it still runs slowly compared to, say, Focus, where I can do 100 images an hour; this was producing one really bad, pixelated image after 20 sampling steps. I tried to run 50 but couldn't get it there. To make this work properly with MPS I think I would have to spend more time, but with processing this slow I'm not sure it's worth it.
Right now I'm testing this on an A100 Colab; I'll let you know the results.
Ah, that is really a shame. I just went through and hunted down the issue in math.py and switched it to float32; trying to run a generation on MPS now... it doesn't seem to be much faster than running on CPU so far.
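For anyone chasing the same fix: the change being described is roughly the one below. A hedged sketch, assuming the culprit is the `rope()` helper in `src/flux/math.py`, which builds its frequency scale in float64, a dtype the MPS backend rejects (the upstream einops `rearrange` is replaced here with a plain `view` to keep the sketch self-contained):

```python
import torch

# Sketch of the float32 switch discussed above (an assumption about the
# exact upstream code, not a verbatim patch).
def rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    assert dim % 2 == 0
    # Upstream uses torch.float64 here, which MPS does not support;
    # float32 is assumed to be precise enough for positional frequencies.
    scale = torch.arange(0, dim, 2, dtype=torch.float32, device=pos.device) / dim
    omega = 1.0 / (theta ** scale)
    # One rotation frequency per (position, dim-pair).
    out = torch.einsum("...n,d->...nd", pos, omega)
    # Pack cos/sin into 2x2 rotation matrices.
    out = torch.stack(
        [torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1
    )
    return out.view(*out.shape[:-1], 2, 2).float()
```

With the float32 scale, the whole function runs on `mps` without the unsupported-dtype error, at the cost of a little precision in the rotation angles.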
The resulting images using CPU were pretty solid, but definitely super slow compared to other locally-run models I've used, for not much gain in the end result. Something like 20 minutes on the first image, probably due to downloading the tensors from HF, and about 4-6 minutes on subsequent runs.
Yeah, I even just ran it on a Colab A100 and the Streamlit demo fails. It downloads the model, but then I get a tensor dtype issue. I know what it is, but I would have to upload a fixed version that casts the tensors to the same type so they don't swap back and forth between float32 and bfloat16:
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
I'll try again after beating my head on the wall.
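For reference, that mat1/mat2 error is just a float32 activation hitting a bfloat16 weight (or vice versa) inside a matmul. A minimal sketch reproducing it on a toy linear layer, plus the kind of cast that makes it go away (illustrative only, not the repo's actual code path):

```python
import torch

# Toy reproduction: bfloat16 weights, float32 input.
layer = torch.nn.Linear(4, 4).to(torch.bfloat16)
x = torch.randn(1, 4)  # float32 by default

try:
    layer(x)  # raises the same "mat1 and mat2 ... dtype" RuntimeError
except RuntimeError as e:
    print("mismatch:", e)

# The fix being discussed: cast everything to one dtype before sampling.
out = layer(x.to(torch.bfloat16))
print(out.dtype)  # torch.bfloat16
```

In the real pipeline the equivalent move is calling `.to(dtype=...)` once on the model (and casting inputs to match) so nothing swaps back and forth mid-forward.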
I think the Streamlit demo has bugs in it that are making things worse. Trying Gradio as well.
@clearsitedesigns @vermi As an alternative to running the official repo, I have recently made a port of the diffusers implementation to Apple's new MLX framework, called MFLUX. Right now we only have support for the Schnell model, but it should run relatively quickly. On an M1 MacBook Pro 32GB it takes between 1 and 3 minutes per image (1024x1024 resolution and 2 time steps), but others have reported much faster times on faster machines with more memory.
Should we expect anything on a Mac M1 to output using the install steps in this repo?
Using:
python demo_gr.py --name flux-dev
# and
python demo_gr.py --name flux-dev --offload
Seems to run indefinitely while generating just a 128x128 image.
Am I expecting too much of my macbook to use flux-dev?
I notice in activity monitor that it appears the GPU is not being used at all.
I have tried @filipstrand 's MFLUX which seems to work fine!
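If anyone else hits the GPU-not-being-used symptom: it's worth first checking whether your PyTorch build can see the Apple GPU at all, since otherwise everything silently runs on CPU. A quick check (assumes PyTorch 1.12+, which introduced the MPS backend; the device-selection line is my own sketch, not the repo's code):

```python
import torch

# True only if this PyTorch build was compiled with MPS support.
print("MPS built:    ", torch.backends.mps.is_built())
# True only if the current machine actually exposes an MPS device.
print("MPS available:", torch.backends.mps.is_available())

# Pick a device the way a demo script might (assumption: illustrative only).
device = "mps" if torch.backends.mps.is_available() else "cpu"
print("Using device: ", device)
```

If `is_built()` is False you likely installed a CPU-only wheel; if it's True but `is_available()` is False, the OS or hardware doesn't expose the GPU to PyTorch.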
@joepagan Thanks for giving MFLUX a try! We have a branch, soon ready to be merged, which adds support for the dev model too. It runs, but in my initial testing generation times are definitely slower compared to the Schnell model.
I tried running this on a few computers. I got it working on my Mac after patching it for MPS, and I also created a Colab to get it up and running. This doesn't seem viable on anything lower than a 4090; the generation speed needs to be faster to be considered ready to use.
For those who are exploring this:
I'll save you the headache of working through the installation. I hit every known challenge getting this to run, so let me see if I can save you some time. At any rate, this is how I got it running locally with Conda.
Step 1: Install Python 3.10+ using Conda if not already installed. If you don't have Conda, you can get it via Anaconda or Miniconda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
Step 2: Create a new Conda environment with Python 3.10+: conda create -n flux-env python=3.10
Step 3: Activate the Conda environment: conda activate flux-env
Step 4: Clone the GitHub repository: git clone https://github.com/black-forest-labs/flux, then cd flux
Step 5: Install the package and dependencies with pip inside the Conda environment: pip install -e '.[all]'
Step 6: Make sure you have requested access to the gated model on Hugging Face: https://huggingface.co/black-forest-labs/FLUX.1-schnell
Step 7: Run the Streamlit app: streamlit run demo_st.py
Adjust the number of samples; generating a single image takes a very long time. I think there is a memory leak somewhere.