superresolution.py with fp16 (half precision)

I've been trying to use the x4-upscaler model (https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) at half/fp16 precision with a script based on https://github.com/Stability-AI/stablediffusion/blob/main/scripts/gradio/superresolution.py. I'm using the weights from the fp16 branch of the model, and have made some changes to the script to try to get it to use half precision: added model.half() and replaced/added torch.float16 as the image and autocast data type. I'm also using the DDPM sampler instead of DDIM.
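For reference, the kind of change described above looks roughly like the sketch below. It uses a small toy network as a stand-in for the upscaler (the names `toy_model`, `image`, and `param_dtypes` are mine, not from the script), and it only runs the forward pass when a CUDA device is available.

```python
import torch
import torch.nn as nn

# Toy stand-in for the upscaler; the real model is built from the config and checkpoint.
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.SiLU(),
    nn.Conv2d(8, 3, kernel_size=3, padding=1),
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Cast every parameter and buffer to fp16, as model.half() does in the modified script.
toy_model = toy_model.to(device).half()
param_dtypes = {p.dtype for p in toy_model.parameters()}

if device == "cuda":
    # The input image must also be fp16, and autocast is told to use fp16.
    image = torch.rand(1, 3, 64, 64, device=device, dtype=torch.float16)
    with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
        out = toy_model(image)
    print(out.dtype, torch.isnan(out).any().item())
```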

The script runs without errors but produces a black image. Debug output shows that the result is all NaNs, and that the NaNs first appear during decoder upsampling, in block 1 (i.e. the second block).
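The debug code is along the lines of the sketch below: forward hooks on every submodule, recording the first one whose output contains a NaN. This is a simplified illustration rather than my exact code; `find_first_nan`, `BadBlock`, and the toy model are hypothetical, and `BadBlock` produces NaNs on purpose so the example runs on CPU.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def find_first_nan(model, inputs):
    """Return the name of the first submodule whose output contains a NaN, or None."""
    first = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # Only record the first offender; later modules just propagate the NaNs.
            if not first and isinstance(output, torch.Tensor) and torch.isnan(output).any():
                first.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        for handle in handles:
            handle.remove()
    return first[0] if first else None


class BadBlock(nn.Module):
    """Hypothetical block that deliberately emits NaNs, standing in for the failing layer."""

    def forward(self, x):
        return x * float("nan")


toy_decoder = nn.Sequential(OrderedDict([
    ("block0", nn.Conv2d(3, 3, kernel_size=3, padding=1)),
    ("block1", BadBlock()),
    ("block2", nn.Conv2d(3, 3, kernel_size=3, padding=1)),
]))

first_nan = find_first_nan(toy_decoder, torch.rand(1, 3, 8, 8))
print(first_nan)
```

Child hooks fire before their parent's hook, so the name reported is the innermost module that first produced a NaN.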

Any ideas? I did see a note in the model config that seems to indicate this is a known issue (https://github.com/Stability-AI/stablediffusion/blob/main/configs/stable-diffusion/x4-upscaling.yaml#L56). Is there no way to get the upscaler to work without full precision?
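Since the NaNs show up in the decoder, one idea that might be worth trying is to keep only the decoder in fp32 and run everything else in fp16. The sketch below shows the general pattern on a toy module. It is untested against the real upscaler, and `ToyPipeline`, `denoiser`, and `decoder` are hypothetical names, not attributes of the actual model.

```python
import torch
import torch.nn as nn


class ToyPipeline(nn.Module):
    """Hypothetical two-stage model: a denoiser followed by an image decoder."""

    def __init__(self):
        super().__init__()
        self.denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(4, 3, kernel_size=3, padding=1)


device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = ToyPipeline().to(device).half()  # everything to fp16 first
pipe.decoder.float()                    # then cast only the decoder back to fp32

# Pretend these are the fp16 latents produced by the sampler.
latents = torch.randn(1, 4, 16, 16, device=device).half()

# Autocast must be disabled here, otherwise it would run the decoder's
# convolutions in fp16 again even though the weights are fp32.
with torch.no_grad(), torch.autocast(device_type=device, enabled=False):
    image = pipe.decoder(latents.float())

print(pipe.denoiser.weight.dtype, pipe.decoder.weight.dtype, image.dtype)
```

The trade-off is that the decoder's weights and activations take fp32 memory, which may matter at high output resolutions.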

GPU: Nvidia L4
CUDA runtime: 11.8
PyTorch: 2.0.0 or 2.0.1 (?)
xformers: 0.0.19

Stability-AI/stablediffusion, issue #301