TencentARC / PhotoMaker

PhotoMaker
https://photo-maker.github.io/
8.63k stars, 676 forks

Apple MacBook Pro M1 Max. LLVM ERROR: Failed to infer result type(s). #78

Open achiever1984 opened 5 months ago

achiever1984 commented 5 months ago

Hello. I installed everything and it started up; I loaded the example images of Isaac Newton and wrote the prompt: "man img with huge dragon".

After clicking the Submit button, the program aborts with an error.

loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0))): error: input types 'tensor<1x77x1x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      python3.10 gradio_demo/app.py
/Users/vladimirkrutikov/.pyenv/versions/3.10.0/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

cckuailong commented 5 months ago

I think the error is caused by torch_dtype; we should keep its value at torch.float16 or torch.float32.

Please add these lines in gradio_demo/app.py to show the values:
print(sys.platform) -- expected output: darwin
print(torch.backends.mps.is_available()) -- expected output: True
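A minimal sketch of the suggested check, assuming it is pasted near the top of gradio_demo/app.py after its imports (the try/except only guards the case where torch is not installed):

```python
# Hypothetical diagnostic snippet: print the platform and whether the
# MPS backend is available.
import sys

print(sys.platform)  # "darwin" on a Mac

try:
    import torch
    print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent torch
except ImportError:
    print("torch is not installed")
```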

achiever1984 commented 5 months ago

I added the lines:

print(sys.platform)
print(torch.backends.mps.is_available())

after the lines:

if device == "mps":
    torch_dtype = torch.float16
else:
    torch_dtype = torch.bfloat16

Result in console:

darwin
True
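One way to make that branch defensive against the fp16 broadcast error above is to select full precision whenever the device is mps. A small, torch-free sketch of the selection logic (the pick_dtype helper is hypothetical, not part of the demo; it returns dtype names as strings so it can be shown without torch installed):

```python
def pick_dtype(device: str) -> str:
    """Return a dtype name for a given device string.

    Hypothetical helper: mirrors the if/else in gradio_demo/app.py, but
    uses float32 on mps, since float16 triggered the MPSGraph
    'not broadcast compatible' error reported above.
    """
    if device == "mps":
        return "float32"   # instead of float16
    if device == "cuda":
        return "bfloat16"
    return "float32"

print(pick_dtype("mps"))
```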

gcpdev commented 5 months ago

I can confirm the same happens on an Apple M2 Max running macOS Ventura 13.6.2 (22G320).

I changed line #17 in app.py to torch_dtype = torch.float32 and got:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/Users/correa_publi/poc/PhotoMaker/gradio_demo/app.py", line 89, in generate_image
    images = pipe(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/photomaker/pipeline.py", line 375, in __call__
    prompt_embeds = self.id_encoder(id_pixel_values, prompt_embeds, class_tokens_mask)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/photomaker/model.py", line 107, in forward
    updated_prompt_embeds = self.fuse_module(prompt_embeds, id_embeds, class_tokens_mask)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/photomaker/model.py", line 85, in forward
    prompt_embeds.masked_scatter_(class_tokens_mask[:, None], stacked_id_embeds.to(prompt_embeds.dtype))
NotImplementedError: The operator 'aten::masked_scatter_' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Edit: Setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 seems to fix it, but it gets slow (395s to generate the output in my case), as pointed out in the warning.
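For reference, the workaround amounts to exporting the variable before launching the demo (the launch command is shown commented out, since it depends on the local checkout):

```shell
# Route ops unsupported on MPS (such as aten::masked_scatter_) to the CPU,
# as suggested by the PyTorch error message.
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Then start the demo as usual, e.g.:
# python3 gradio_demo/app.py
echo "$PYTORCH_ENABLE_MPS_FALLBACK"
```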

cckuailong commented 5 months ago

Setting PYTORCH_ENABLE_MPS_FALLBACK=1 makes you use the CPU, not the GPU, so generating the image will be slow.

gcpdev commented 5 months ago

Setting PYTORCH_ENABLE_MPS_FALLBACK=1 makes you use the CPU, not the GPU, so generating the image will be slow.

I understand that; I just wanted to confirm that your hypothesis was valid, though the solution might not be that easy.

Vargol commented 5 months ago

Setting PYTORCH_ENABLE_MPS_FALLBACK=1 makes you use the CPU, not the GPU, so generating the image will be slow.

To make things clear for anyone unaware: PYTORCH_ENABLE_MPS_FALLBACK makes the functions that are not supported in PyTorch for MPS run on the CPU; the rest of the functions, which are supported, still run on the GPU.
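Worth noting, as far as I know, the flag is read when torch initializes, so it has to be in the environment before the first `import torch` if set from inside the script rather than the shell. A sketch:

```python
# Set the fallback flag before torch is imported; setting it afterwards
# has no effect, since torch reads it during initialization.
import os

os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

try:
    import torch  # unsupported MPS ops now fall back to CPU instead of raising
except ImportError:
    pass  # torch not installed in this environment
```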

Vargol commented 5 months ago

Edit: Setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 seems to fix it, but it gets slow (395s to generate the output in my case), as pointed out in the warning.

How much memory have you got? Asking as I've been playing with this on Colab and it uses ~13 GB VRAM and 5 GB system RAM, so if you've got a 16 GB Mac it's likely to be swapping.

It could be the FALLBACK that's making it slow, but that depends on how much 'aten::masked_scatter_' and any other unsupported functions are used.

cckuailong commented 5 months ago

Setting PYTORCH_ENABLE_MPS_FALLBACK=1 makes you use the CPU, not the GPU, so generating the image will be slow.

To make things clear for anyone unaware: PYTORCH_ENABLE_MPS_FALLBACK makes the functions that are not supported in PyTorch for MPS run on the CPU; the rest of the functions, which are supported, still run on the GPU.

Nice answer. Thank you!

gcpdev commented 5 months ago

Edit: Setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 seems to fix it, but it gets slow (395s to generate the output in my case), as pointed out in the warning.

How much memory have you got? Asking as I've been playing with this on Colab and it uses ~13 GB VRAM and 5 GB system RAM, so if you've got a 16 GB Mac it's likely to be swapping.

It could be the FALLBACK that's making it slow, but that depends on how much 'aten::masked_scatter_' and any other unsupported functions are used.

I've got 32 GB of memory in my system, but when I run the process with the MPS_FALLBACK it consumes more than 10 GB of RAM and starts swapping, so this might be the reason why it slows down considerably...
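To check whether swapping is the culprit, the process's peak resident memory can be read from the standard library without extra dependencies (Unix only; note that ru_maxrss is reported in bytes on macOS but kilobytes on Linux):

```python
# Print the peak resident memory of the current process (Unix only),
# e.g. right after the generation call in the demo.
import resource
import sys

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
unit = "bytes" if sys.platform == "darwin" else "KiB"
print(f"peak RSS so far: {peak} {unit}")
```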

ewatch commented 5 months ago

I could fix this with the following approach:

conda install pytorch torchvision torchaudio -c pytorch-nightly

Afterwards a library was missing named "chardet" so I did:

conda install chardet

Running on macOS Sonoma 14.3 Apple M2 Max 32 GB

and using the following variables:

device = "mps"
torch_dtype = torch.float16

Afterwards some deprecation warnings are printed when starting the gradio_demo ... however, it runs:

UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(