aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Update hf_pretrained_sd15_512_inference.ipynb #29

Closed plienhar closed 10 months ago

plienhar commented 10 months ago

This pull request updates the notebook that demonstrates how to compile and run the HuggingFace Stable Diffusion 1.5 (512x512) model for accelerated inference on Neuron (located at torch-neuronx/inference/hf_pretrained_sd15_512_inference.ipynb).

The main purpose of this update is to provide more context on the challenges of compiling and running the SD 1.5 pipeline on Neuron. Legacy code has also been refactored. The original and updated notebooks have comparable latency: ~2.4s/image with num_inference_steps=50.
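
For reference, a per-image latency figure like the one above can be measured with a small timing helper. This is only a minimal sketch, not the notebook's benchmarking code: `measure_latency` and `fake_pipeline` are hypothetical names, and `fake_pipeline` stands in for a call to the compiled pipeline with num_inference_steps=50.

```python
import time


def measure_latency(fn, n_warmup=1, n_runs=5):
    """Return the mean wall-clock seconds per call of fn (hypothetical helper)."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs


# Stand-in for the real pipeline call (assumption: one call produces one image).
calls = []


def fake_pipeline():
    calls.append(1)


mean_s = measure_latency(fake_pipeline, n_warmup=1, n_runs=5)
print(f"{mean_s:.6f} s/image")
```

Excluding the warm-up call matters because the first invocation typically includes one-time setup cost that would otherwise skew the mean.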

Main updates:

Notice: The get_attention_scores function we monkey-patch into the attention processor is key to performance. Without the monkey-patch, latency increases dramatically. I was not able to fully work out on my own why the provided implementation yields such large performance gains. Given its importance, additional context on the chosen implementation would be greatly beneficial.
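
To make the mechanism concrete, here is a minimal sketch of monkey-patching a method named get_attention_scores onto a class. It illustrates only the patching pattern: the `Attention` class below is a hypothetical pure-Python stand-in (not the real library class), and `custom_get_attention_scores` is a plain scaled dot-product softmax, not the notebook's implementation, so it says nothing about where the performance gain comes from.

```python
import math


class Attention:
    """Hypothetical stand-in for a library attention module."""

    def __init__(self, scale):
        self.scale = scale

    def get_attention_scores(self, query, key):
        raise NotImplementedError("placeholder for the library's default method")


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def custom_get_attention_scores(self, query, key):
    """Replacement method: softmax(scale * Q K^T), computed row by row."""
    scores = [
        [self.scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in key]
        for q_row in query
    ]
    return [softmax(row) for row in scores]


# Monkey-patch: rebind the attribute on the class so every instance,
# existing or new, picks up the replacement method.
Attention.get_attention_scores = custom_get_attention_scores

attn = Attention(scale=0.5)
identity = [[1.0, 0.0], [0.0, 1.0]]
probs = attn.get_attention_scores(identity, identity)
```

Patching at the class level, as opposed to on a single instance, means the replacement applies to every attention module of that class without touching the library source.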