black-forest-labs / flux

Official inference repo for FLUX.1 models

Running FLUX.1-schnell on Apple Silicon (MPS) without running into memory limitations? #80

Open nickyreinert opened 1 month ago

nickyreinert commented 1 month ago

I am trying to find the correct setup to run it on an M3 with 36 GB of memory, without success. The error message is either this (running it via a Gradio UI, ref: pictero.com):

UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

or this (running in Jupyter):

Disposing session as kernel process died ExitCode: undefined, Reason: [NO REASON PROVIDED]

This is my pipeline config:

device = "mps"
data_type = torch.float32
torch.backends.cuda.matmul.allow_tf32 = False
variant = None
use_safetensors = True
pipeline = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell", 
            use_safetensors=use_safetensors, 
            torch_dtype=data_type, 
            variant=variant).to(device)
pipeline.requires_safety_checker = False
pipeline.safety_checker = None
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)
manual_seed = 42
generator = torch.manual_seed(manual_seed)
prompt = "A white rabbit   "
negative_prompt = ""
inference_steps = 3
guidance_scale = 5
image = pipeline(
        prompt=prompt,
        # negative_prompt=negative_prompt, # not supported?
        generator=generator,
        num_inference_steps=inference_steps,
        # cross_attention_kwargs=None, # not supported?
        # guidance_scale=guidance_scale # not supported?
).images

image[0]
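(Side note on the Jupyter message: "kernel process died" usually means the kernel was killed after running out of memory. One MPS-specific knob, offered here only as an untested sketch, is PyTorch's high-watermark ratio, which caps how much unified memory the MPS allocator will hand out:)

# Untested sketch: relax the MPS allocator's memory cap.
# PYTORCH_MPS_HIGH_WATERMARK_RATIO must be set before torch touches the MPS backend;
# 0.0 disables the upper limit entirely (PyTorch warns this may destabilize the system).
import os
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch
assert torch.backends.mps.is_available()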
nickyreinert commented 1 month ago

When manually assembling the pipeline, I can narrow the problem down to the transformer:

transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=dtype, revision=revision).to('mps')

Which kind of makes sense: the transformer weights alone are about 24 GB, which most probably triggers the memory exception:

https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main/transformer
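Since those checkpoints are stored in bfloat16, loading them with torch_dtype=torch.float32 (as in the config above) roughly doubles the footprint to ~48 GB. A sketch of a lower-memory load, assuming a PyTorch/macOS combination with bfloat16 support on MPS and using diffusers' attention-slicing helper (untested on this exact machine):

import torch
from diffusers import FluxPipeline

# Load in bfloat16, the dtype the weights ship in, halving memory vs. float32.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("mps")

# Compute attention in slices: slower, but with a smaller peak memory footprint.
pipeline.enable_attention_slicing()

image = pipeline(
    prompt="A white rabbit",
    num_inference_steps=4,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]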

filipstrand commented 1 month ago

@nickyreinert I have recently released MFLUX, which can currently run the Schnell model on Apple Silicon using Apple's new MLX framework. With 36 GB of memory it should work fine (I have personally tested it on my 32 GB machine, but others have gotten it to work with 16 GB as well).
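For reference, a minimal invocation (command and flag names as documented in the MFLUX README at the time of writing; check the project page in case the CLI has changed):

pip install mflux
mflux-generate --model schnell --prompt "A white rabbit" --steps 2 --seed 42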

nickyreinert commented 1 month ago

@filipstrand Working like a charm! For comparison, an M3 Pro with 36 GB takes:

real    1m43.003s
user    0m12.328s
sys     0m48.153s

skfrost19 commented 1 day ago

Is pipeline.safety_checker working for you?