ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

[Issue]: `migraphx.quantize_int8()` not performing quantization on GPU #3585

Closed Squidwarder closed 1 week ago

Squidwarder commented 2 weeks ago

Problem Description

I was trying to quantize the UNet portion of SDXL using examples/diffusion/python_stable_diffusion_xl/txt2img.py. I did obtain an output from migraphx.quantize_int8(), but despite setting target=migraphx.get_target("gpu"), the quantization process didn't appear to use the GPU at all (the entire process took around 40 minutes for a calibration set of 2 images).

This task was run on HPCFund.

Operating System

Rocky Linux

CPU

AMD EPYC 7763 64-Core Processor

GPU

AMD Instinct MI250

Other

No response

ROCm Version

ROCm 6.0.0

Steps to Reproduce

import migraphx as mgx
...

if use_fp16:
    print("Parsing unetxl to int8!!!")
    mgx.quantize_int8(model, mgx.get_target("gpu"), calibration_data)

model.compile(mgx.get_target("gpu"),
              exhaustive_tune=exhaustive_tune,
              offload_copy=offload_copy)

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

pfultz2 commented 1 week ago

(the entire process took around 40 minutes for a calibration set of 2 images).

How do you know it didn't run on the GPU? Does this include the compilation? To quantize, it first compiles the model for the GPU (which can sometimes take almost 20 minutes) to capture the calibration data, then adds the quantization and compiles the model again, which could be why it takes 40 minutes.
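One way to confirm this attribution is to time the two phases separately. Below is a minimal sketch with a hypothetical `timed` context manager; the stand-in `quantize_step`/`compile_step` functions only sleep so the sketch runs anywhere, and in the real script their bodies would be the `mgx.quantize_int8(...)` and `model.compile(...)` calls from above:

```python
import time
from contextlib import contextmanager

results = {}

@contextmanager
def timed(label):
    # Record wall-clock seconds for one phase so quantize vs. compile
    # cost can be separated.
    start = time.perf_counter()
    yield
    results[label] = time.perf_counter() - start

# Stand-ins for the real calls; in the actual script these would be
#   mgx.quantize_int8(model, mgx.get_target("gpu"), calibration_data)
#   model.compile(mgx.get_target("gpu"), ...)
def quantize_step():
    time.sleep(0.01)

def compile_step():
    time.sleep(0.02)

with timed("quantize_int8"):
    quantize_step()
with timed("compile"):
    compile_step()

for label, seconds in results.items():
    print(f"{label}: {seconds:.3f}s")
```

If most of the 40 minutes lands in the two compile phases rather than the calibration pass itself, that matches the explanation above.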

Squidwarder commented 1 week ago

When I ran the program with the code above, I checked the status of the GPUs using watch rocm-smi. While the model was quantizing and compiling, I didn't observe any noticeable spike in GPU activity (the GPU columns all stayed close to 0%), whereas I saw heavy CPU usage in htop. So I think the call mgx.quantize_int8(model, mgx.get_target("gpu"), calibration_data) didn't actually use the GPUs despite the target argument.
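One caveat with `watch rocm-smi` is its coarse refresh interval (2 s by default), which can miss very short bursts of GPU work. A sketch of a faster background sampler, with a hypothetical `read_util` callable standing in for parsing `rocm-smi` output (or using a library such as pyrsmi) so it runs without a GPU:

```python
import threading
import time

def sample_utilization(read_util, samples, stop, interval=0.01):
    # Poll much faster than `watch` (2 s default) so short GPU spikes
    # during quantization are not missed between refreshes.
    while not stop.is_set():
        samples.append(read_util())
        time.sleep(interval)

# Hypothetical reader: a canned sequence stands in for real
# utilization readings; the real version would query rocm-smi.
ticks = iter([12, 87, 3])
def read_util():
    return next(ticks, 0)

samples = []
stop = threading.Event()
t = threading.Thread(target=sample_utilization,
                     args=(read_util, samples, stop))
t.start()
time.sleep(0.2)   # the workload (e.g. quantize_int8) would run here
stop.set()
t.join()
print(f"peak utilization seen: {max(samples)}%")
```

With a 2 s refresh, a sub-second burst of kernel launches averages out to roughly 0% in the display even though the GPU did run.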

tcgu-amd commented 1 week ago

Hi @Squidwarder, can you run with AMD_LOG_LEVEL=3? It can be hard to tell from htop and rocm-smi whether the CPU or GPU is being used, because compilation probably takes the majority of the runtime.
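For reference, a sketch of such a run; the script name is the one from the issue, and the "ShaderName" grep pattern is an assumption about what the HIP runtime's kernel-launch log lines contain, not a documented interface:

```shell
# AMD_LOG_LEVEL=3 makes the ROCm/HIP runtime print info-level logs
# (kernel launches, memory copies) to stderr, so GPU activity shows
# up even when rocm-smi's sampling misses short spikes.
export AMD_LOG_LEVEL=3

# "txt2img.py" is the script from the issue; the redirect captures the
# verbose log, and `|| true` keeps this sketch harmless if it fails.
python txt2img.py 2> amd_log.txt || true

# Kernel-launch lines are a rough signal that the GPU was used.
grep -c "ShaderName" amd_log.txt || true
```

Redirecting stderr to a file matters here: at level 3 the runtime log easily runs to thousands of lines.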

Squidwarder commented 1 week ago

Hello, after running with AMD_LOG_LEVEL=3 I did in fact notice a couple of very short spikes during the ONNX -> MXR conversion process, so you're most likely right that compiling took up most of the time compared to quantize_int8(). Thank you for the clarification; I think this issue can be marked as resolved.

The log itself is more than 1,900 lines, so I won't include it here.