JingyaHuang opened this issue 8 months ago
Also tried with a PyTorch 2.1.2 setup; it's not working either.
Hi Jingya, I'm trying to reproduce the problem. I installed optimum and optimum-neuron with
pip install "optimum[neuronx, diffusers]"
based on https://huggingface.co/docs/optimum-neuron/tutorials/stable_diffusion.
However, this seems to install v0.0.3, which doesn't include NeuronStableDiffusionXLPipeline. I also tried downgrading to 0.0.2, which has another problem. Is this expected with these versions, and is there a way to get 0.0.21? Thanks.
The installation with the neuronx extra is what we are going to fix with the 0.0.21 optimum-neuron release. For now, to install the latest optimum-neuron release (0.0.20), could you try:
pip install optimum==1.18.0
pip install optimum-neuron==0.0.20
Or the 0.0.21 dev version can be installed from source:
pip install git+https://github.com/huggingface/optimum-neuron
Then you can install diffusers with pip install diffusers.
Thanks Jingya, I updated optimum-neuron and diffusers and now I can reproduce the issue.
Hi Jingya, I found that the issue can be prevented if we set inline_weights_to_neff=True when tracing the UNet. Would that be a sufficient workaround for now? I will also look into the root cause but that may take some time.
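For reference, a minimal sketch of that workaround (assuming the `inline_weights_to_neff` argument is forwarded through `from_pretrained` during export, and that the model ID, output path, and input shapes below are illustrative; this needs an inf2/trn1 instance with the Neuron SDK to actually run):

```python
# Hypothetical sketch: export SDXL with weights inlined into the NEFF
# (inline_weights_to_neff=True) to avoid the BIR verification failure.
# Requires a Neuron device (inf2/trn1) and optimum-neuron installed.
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Static input shapes are required for Neuron compilation (values illustrative).
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}

pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    inline_weights_to_neff=True,  # workaround: inline weights into the NEFF
    **input_shapes,
)
pipe.save_pretrained("sdxl_neuron/")  # illustrative output directory
```

The trade-off, as discussed below, is that inlining the weights defeats the compilation cache.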
Hi @aws-bhegedus, thanks for investigating it!
Optimum Neuron could force setting inline_weights_to_neff=True for SDXL models for now. But given that our caching mechanism relies on the neff/weights separation, we won't be able to cache and load SDXL models (whose compilation takes time).
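As context for why the separation matters: with `inline_weights_to_neff=False`, the compiled NEFF and the model weights are stored separately, which is what allows compiled artifacts to be cached and reloaded without recompiling. A hedged sketch of that flow (model ID, paths, and shapes are illustrative; requires a Neuron instance):

```python
# Illustrative only: with separated weights (inline_weights_to_neff=False),
# compiled SDXL artifacts can be saved once and reloaded without recompiling.
from optimum.neuron import NeuronStableDiffusionXLPipeline

pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    inline_weights_to_neff=False,  # keep weights separate so caching works
    batch_size=1, height=1024, width=1024,  # static shapes, illustrative
)
pipe.save_pretrained("sdxl_neuron_cached/")

# Later runs load the precompiled artifacts directly, skipping the
# hour-plus compilation of all SDXL components.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained("sdxl_neuron_cached/")
```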
Thanks Jingya, we will have a fix in a future release so that caching can be enabled. Does this problem occur only for the tiny random SDXL model? I'm curious about SDXL-base, which I believe is larger and takes longer to compile, so it may be a bigger problem.
Thanks @aws-bhegedus, that will be awesome!
tiny-random-stable-diffusion-xl is a smaller version (fewer layers) of the SDXL models in the pipeline, with random weights, that we built to shorten testing time. If compilation fails for the tiny version, it's very unlikely to work for the larger pretrained checkpoint. And since compiling all SDXL components takes more than an hour, the lack of caching could be a little discouraging for first-time users.
The issue still exists with the latest Neuron SDK 2.19.1:
```
***** Compiling unet *****
Using Neuron: --auto-cast matmul
Using Neuron: --auto-cast-type bf16
...
[NLA001] Unhandled exception with message: === BIR verification failed ===
Reason: Pattern accesses 48 (> 32) partitions starting at partition 32
Instruction: I-29178
Opcode: GenericCopy
Instruction Source: (|V2<48 x 1> $29178:29178)0:
Output index: 0
Argument AP:
Access Pattern: [[1,48],[1,1],[1,1]]
SymbolicAP
Memory Location: {concatenate.3_set}@SB - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
An error occured when trying to trace unet with the error message: neuronx-cc failed with 70.
The export is failed and unet neuron model won't be stored.
***** Compiling vae_encoder *****
Using Neuron: --auto-cast matmul
Using Neuron: --auto-cast-type bf16
.
Compiler status PASS
[Compilation Time] 8.55 seconds.
***** Compiling vae_decoder *****
Using Neuron: --auto-cast matmul
Using Neuron: --auto-cast-type bf16
.
Compiler status PASS
[Compilation Time] 8.27 seconds.
[Total compilation Time] 38.63 seconds.
Traceback (most recent call last):
  File "test_non_inline.py", line 11, in <module>
    stable_diffusion = NeuronStableDiffusionXLPipeline.from_pretrained(
  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.19.1/lib/python3.8/site-packages/optimum/modeling_base.py", line 402, in from_pretrained
    return from_pretrained_method(
  File "/home/ubuntu/optimum-neuron/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/optimum-neuron/optimum/neuron/modeling_diffusion.py", line 714, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/home/ubuntu/optimum-neuron/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/optimum-neuron/optimum/neuron/modeling_diffusion.py", line 954, in _export
    return cls._from_pretrained(
  File "/home/ubuntu/optimum-neuron/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/optimum-neuron/optimum/neuron/modeling_diffusion.py", line 670, in _from_pretrained
    data_parallel_mode = cls.set_default_dp_mode(configs["unet"])
KeyError: 'unet'
```
Thanks for testing @JingyaHuang. I was able to reproduce the issue again and we are looking into it.
Hi team, when trying to bump Optimum Neuron to the latest Neuron SDK 2.18 release, we noticed that compilation of the unet for SDXL models fails with the latest compiler. Here are more details about the regression:
Error log
```
=== BIR verification failed ===
Reason: Pattern accesses 48 (> 32) partitions starting at partition 32
Instruction: I-36948
Opcode: GenericCopy
Output index: 0
Argument AP:
Access Pattern: [[1,48],[1,1],[1,1]]
SymbolicAP
Memory Location: {concatenate.3_set}@SB
2024-04-03T09:11:19Z
2024-04-03T09:11:19Z Diagnostic information:
2024-04-03T09:11:19Z NeuronX Compiler version 2.13.66.0+6dfecc895
2024-04-03T09:11:19Z
2024-04-03T09:11:19Z Python version 3.8.10
2024-04-03T09:11:19Z HWM version 2.13.66.0+6dfecc895
2024-04-03T09:11:19Z NumPy version 1.24.4
2024-04-03T09:11:19Z
2024-04-03T09:11:19Z Running on AMI ami-09cd747c78a9add63
2024-04-03T09:11:19Z Running in region use1-az6
2024-04-03T09:11:19Z
2024-04-03T09:11:19Z Diagnostic logs stored in /home/ubuntu/optimum-neuron/log-neuron-cc.txt
An error occured when trying to trace unet with the error message: neuronx-cc failed with 70.
The export is failed and unet neuron model won't be stored.
```
The test above works as expected with Neuron SDK 2.17.1.