pip install git+https://github.com/pytorch/ao.git
python cli_demo_quantization.py --prompt "A girl riding a bike." --model_path THUDM/CogVideoX-5b --quantization_scheme fp8 --dtype bfloat16
Expected behavior
The demo no longer works with torchao installed from the current git tree:
Traceback (most recent call last):
  File "/home/x/CogVideo/inference/cli_demo_quantization.py", line 26, in <module>
    from torchao.float8.inference import ActivationCasting, QuantConfig, quantize_to_float8
ImportError: cannot import name 'ActivationCasting' from 'torchao.float8.inference'
I'm not up to date with all the recent changes in torchao, but for now I think it is best to use a version of torchao from before these modifications.
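To confirm whether a given torchao install (for example, one pinned to an older revision) still exposes the names the demo imports, a small probe like the one below may help. This is only a minimal sketch: the helper name missing_float8_names is hypothetical, and the three names it looks for are taken from the import line in the traceback above. It does not perform any quantization.

```python
import importlib

# Names that cli_demo_quantization.py tries to import (taken from the traceback).
LEGACY_NAMES = ("ActivationCasting", "QuantConfig", "quantize_to_float8")


def missing_float8_names(module_name: str = "torchao.float8.inference") -> list[str]:
    """Return the legacy names that cannot be found in the given module.

    If the module itself cannot be imported (for example because torchao
    is not installed), every name is reported as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(LEGACY_NAMES)
    return [name for name in LEGACY_NAMES if not hasattr(module, name)]


if __name__ == "__main__":
    missing = missing_float8_names()
    if missing:
        print("Missing from torchao.float8.inference:", ", ".join(missing))
    else:
        print("All legacy float8 inference names are available.")
```

An empty result suggests the installed torchao still matches what the demo script expects; a non-empty one means the script's import on line 26 will fail in the same way.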
System Info
torch 2.4.1 / diffusers 0.30.2 / Ubuntu 22.04.4 LTS / CUDA driver 12.6
Information
Reproduction
It seems ActivationCasting was moved or removed in this commit: https://github.com/pytorch/ao/commit/848e123e37df7e7033f26619b02562525404c2b5
Is there any up-to-date example of how to do inference with quantization?