guoreex opened this issue 1 month ago
Could you test with the FP16/FP8 model and the default nodes, without the custom node pack? If it still happens with those, this might be more appropriate for the ComfyUI repo, since the error makes it sound like it's not a problem with this node pack. I could be wrong though.
It also sounds like you can set the environment variable `export TOKENIZERS_PARALLELISM=false`
to possibly fix it? Might be worth testing.
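For reference, a minimal way to try that would be something like this (assuming ComfyUI is launched with `python main.py` from its own folder; adjust to however you start it):

```bash
# Set the variable for this shell session, then start ComfyUI as usual
export TOKENIZERS_PARALLELISM=false
python main.py
```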
Thank you for your reply.
My computer only has 16 GB of RAM, which is not enough to run the FP8 model.
After setting `export TOKENIZERS_PARALLELISM=false`, the error still occurs:
...
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 323.94775390625 True
Requested to load FluxClipModel_
Loading 1 new model
Requested to load Flux
Loading 1 new model
loaded completely 0.0 6456.9610595703125 True
0%| | 0/4 [00:00<?, ?it/s]/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 63700992 bytes
'
/Users/***/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
The error occurs right after generation starts.
Well, at least there's a progress bar now lol, the buffer error is still there though...
I don't have any Apple device to test on, but it looks like there's a similar issue on the PyTorch tracker with a linked PR; not sure if the cause is the same though. Might be worth keeping an eye on and testing on the latest nightly once it gets merged? https://github.com/pytorch/pytorch/issues/136132
Still have the issue using today's nightly build. Anyone else?
M2 MacBook Air, 16GB RAM, Sequoia 15.0
Python version: 3.12.6 (main, Sep 6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
pytorch version: 2.6.0.dev20240923
ComfyUI Revision: 2724 [3a0eeee3] | Released on '2024-09-23'
Requested to load Flux
Loading 1 new model
loaded completely 0.0 7867.7110595703125 True
0%| | 0/20 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 77856768 bytes
M2 Max Mac Studio, 64GB RAM, Sequoia 15.0, Python 3.11.9
Only happens when running GGUF models (FP16 and FP8 work fine):
/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 77856768 bytes
<<Slight correction: flux1-dev-Q8_0.GGUF WORKS!!>> Correcting the correction: Q8 does not work (working test was before Sequoia)
Does Q8 work? What PyTorch version are you using?
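(If you're unsure which build you're on, this one-liner prints it; run it in the same Python environment ComfyUI uses:)

```bash
# Prints the installed torch version and whether the MPS backend is usable
python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"
```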
I just retested Q8 and it does not work :( Working test was before Sequoia. Sorry for the false hope.
This is the only GGUF that I have found to work since the Sequoia update:
https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-F16.gguf
Guys, I've tested torch==2.4.1 and it works for GGUF Q8.
What is the Mac config for your test? Which M chip, and how much RAM?
Can't install pytorch==2.4.1 because it requires python < 3.9
Strange, I use python 3.11.
M1 Max, 32gb
I use 3.11 as well but the install of torch 2.4.1 failed due to python version. Very strange. I'll try again. Thanks.
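For anyone hitting the same version conflict, pinning all three packages explicitly is worth trying. A sketch assuming a plain pip/venv setup (torchvision 0.19.1 and torchaudio 2.4.1 should be the releases matching torch 2.4.1; conda users can pin the same versions):

```bash
# Run inside the Python environment ComfyUI uses; pinning all three keeps pip
# from pulling a torchvision/torchaudio build that expects a newer torch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
```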
Same issue here, Flux GGUFs bail out with a memory allocation error in MPS (Error: buffer is not large enough. Must be 77856768 bytes). It worked on macOS 14.x but not anymore on macOS 15.x; same issue with torch 2.4.1 and 2.6.0.dev20240924 (nightly from last week). As a reference, since I can run the heavier Flux models (M3 Max, 128GB RAM), the direct Flux models work fine. Would love to run GGUFs though, because of the lower RAM usage and the speed.
FINALLY!!! After 6 tries to get pytorch 2.4.1 to install, the install completed successfully. A simple test with a Q5 GGUF model did not abort ComfyUI, but the image generated at an absolutely appalling 45 seconds per iteration.
It works but is not usable.
There's something going on with the nightly builds; for some reason the 2.6 nightlies all break the GGUF code. Running 32GB, what works fine with Q8 on 2.4.1 fails every time with this semaphore error when moved to a nightly.
I can't say if it's Sequoia + the 2.6 nightlies specifically, but I can confirm Sequoia + 2.4.1 + GGUF works fine, and Sequoia + 2.6 + GGUF bails every time.
This is super annoying because the 2.6 nightly finally added support for autocast on MPS.
Thank you bro! By using pytorch 2.4.1, it works again!
@city96 Hello bro. I think this could be added to the readme as a temporary fix guide.
@craii Added it under the installation section w/ a link to this issue thread.
appalling 45 seconds per iteration.
Just so you know, I haven't tested them all, but with Q8_0 on an M3 and torch 2.4.1 I get ~16-17 s/it. Q5 and Q8_4 (I've been playing with custom quants) are 40-50 s/it, which is insane. Not sure why it's so bad, but yeah, Q8_0 loads and runs fastest so far.
Q8 is faster because it can run fully on the GPU; the others use a shift function that has to fall back to running on the CPU.
For example, if ComfyUI isn't hiding it in the terminal, you should see something like this when using the other models (this one was taken from a Q6_K run in InvokeAI):
The operator 'aten::__rshift__.Tensor' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
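If you want to see that fallback in isolation, here's a rough repro sketch (assumes an Apple Silicon Mac and a torch build where `aten::__rshift__.Tensor` is still unimplemented on MPS; the fallback env variable, which ComfyUI sets for you at startup as far as I can tell, is set by hand here):

```bash
# Shifts an MPS tensor by another MPS tensor; without PYTORCH_ENABLE_MPS_FALLBACK=1
# this raises NotImplementedError, with it you get the CPU-fallback warning quoted above
PYTORCH_ENABLE_MPS_FALLBACK=1 python -c "import torch; x = torch.tensor([8, 16], device='mps'); print(x >> torch.tensor([1, 2], device='mps'))"
```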
After PyTorch nightly 2.6.0.dev20241020, the problem has been fixed. I can run GGUF's quantized Flux.1 Dev Q4_0 on my MacBook M1 Pro with 16GB of memory.
M2 Max 64GB: after installing the 241020 nightly, GGUF seems to work again. Thanks for the heads up.
I also managed to get a GGUF working with pytorch 2.6.0.dev20241020 (py3.10_0, pytorch-nightly) on Sequoia 15.0.1, installed via:
conda install pytorch-nightly::pytorch torchvision torchaudio -c pytorch-nightly
or simply:
conda install pytorch torchvision torchaudio -c pytorch-nightly
(https://developer.apple.com/metal/pytorch/)
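For anyone on a plain venv instead of conda, the pip equivalent from that same Apple page should be something like this (the macOS nightly wheels live under the `nightly/cpu` index, if I remember right):

```bash
# Nightly torch build with MPS support on macOS
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```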
M3 24GB works properly on the Q4 schnell model after installing the pytorch dev-20241020 nightly. But it seems to consume much more memory with the same generation parameters (it now takes 25~29GB while only 17~20GB was used before).
Just use 2.4.1, not the nightly, and report the regression to the PyTorch team; they've already fixed some of the other regressions.
I'm not sure if this question is appropriate to ask here; I'm not a professional programmer, so if anyone is willing to offer help and guidance, I would be very grateful.
Two weeks ago I started using the GGUF model and it worked normally. Today I upgraded my MacBook Pro M1 to the latest macOS 15.0 (24A335), and an error prompt appeared when running a GGUF workflow in ComfyUI.
My system information:
Python version: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6]
pytorch version: 2.6.0.dev20240916
ComfyUI Revision: 2701 [7183fd16] | Released on '2024-09-17'
I don't know if this is related to updating the system. Thanks.