amd / RyzenAI-SW

MIT License

Error during YOLOv8s quantization with Ryzen AI quantizer (RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn) #122

Open Siva50005 opened 9 hours ago

Siva50005 commented 9 hours ago

I encountered an issue while trying to quantize the YOLOv8s model using the Ryzen AI quantizer. Below are the details of the error:

Error Message:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
[VAIQ_WARN][QUANTIZER_TORCH_CUDA_UNAVAILABLE]: CUDA (HIP) is not available, change device to CPU.
...
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
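For readers unfamiliar with this message: PyTorch autograd raises it when backward() is called on a tensor that has no autograd history. Examples include a result computed under torch.no_grad() or one computed only from tensors with requires_grad=False. The minimal sketch below reproduces the same message in plain PyTorch. The helper name reproduce_no_grad_error is hypothetical. The sketch only illustrates what the error means and makes no claim about which code path inside the quantizer or ultralytics raises it here.

```python
import torch


def reproduce_no_grad_error() -> str:
    """Call backward() on a tensor without autograd history and return the error text."""
    weight = torch.ones(3, requires_grad=True)
    with torch.no_grad():
        # Operations under no_grad are not recorded, so `loss` ends up with
        # requires_grad=False and grad_fn=None even though `weight` requires grad.
        loss = (weight * 2.0).sum()
    try:
        loss.backward()
    except RuntimeError as err:
        return str(err)
    return ""


message = reproduce_no_grad_error()
print(message)
```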

Environment:

Steps to Reproduce:

  1. Attempt to quantize the YOLOv8s model using the Ryzen AI quantizer.
  2. Use the command:
    python new_quant.py

new_quant.py:

import os

import torch
from PIL import Image
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from pytorch_nndct.apis import torch_quantizer
from ultralytics import YOLO

# Custom Dataset for Calibration
class CalibrationDataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
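        # Keep only image files, sorted so the calibration order is deterministic
        self.image_paths = sorted(
            os.path.join(image_dir, f) for f in os.listdir(image_dir)
            if f.lower().endswith(('.jpg', '.jpeg', '.png'))
        )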
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, 0  # Return a dummy label

# Define your image transformations
transform = transforms.Compose([
    transforms.Resize((640, 640)),  # Adjust size as needed
    transforms.ToTensor(),
])

# Create DataLoader for Calibration Dataset
image_dir = '/workspace/pytorch_example_2/datasets/coco/images/val2017'  # Replace with your calibration dataset path
calib_dataset = CalibrationDataset(image_dir, transform=transform)
calib_loader = DataLoader(calib_dataset, batch_size=1, shuffle=False)

# Load pre-trained model
model = YOLO("yolov8s.pt")

# Generate a quantizer and convert the model
batch_size = 1  # Adjust based on your requirements
dummy_input = torch.randn([batch_size, 3, 640, 640])  # Adjust input size as needed
quant_mode = 'calib'  # 'calib' collects quantization statistics; 'test' evaluates and exports

quantizer = torch_quantizer(quant_mode, model, dummy_input, quant_config_file=None)
quant_model = quantizer.quant_model

# Calibrate the model
def calibrate_model(model, calib_loader):
    model.eval()
    with torch.no_grad():  # Disable gradient calculation
        for images, _ in calib_loader:
            model(images)

calibrate_model(quant_model, calib_loader)

# Evaluate the quantized model
def evaluate(model, val_loader, loss_fn):
    model.eval()
    acc1_gen, acc5_gen, loss_gen = 0, 0, 0
    with torch.no_grad():  # Disable gradient calculation
        for images, labels in val_loader:
            outputs = model(images)
            loss = loss_fn(outputs, labels)
            # Calculate accuracy and loss
            # acc1_gen, acc5_gen, loss_gen = ... (implement your evaluation logic here)
    return acc1_gen, acc5_gen, loss_gen

# Assuming val_loader and loss_fn are defined elsewhere
# acc1_gen, acc5_gen, loss_gen = evaluate(quant_model, val_loader, loss_fn)

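# Export the results. The quantization config is exported from the 'calib' run;
# the deployable ONNX model is exported from a second run with quant_mode = 'test'.
if quant_mode == 'calib':
    quantizer.export_quant_config()
elif quant_mode == 'test':
    quantizer.export_onnx_model()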
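One detail in the script above may be worth checking. This is an observation, not a confirmed diagnosis. YOLO("yolov8s.pt") returns the ultralytics wrapper object rather than the plain detection network, which is available as its .model attribute. The wrapper defines its own train() method for launching a training run. Meanwhile, torch.nn.Module.eval() is implemented as self.train(False). The sketch below uses a hypothetical TrainingWrapper class, not the ultralytics code, to show that calling .eval() on a module that overrides train() dispatches to the override instead of simply switching to inference mode.

```python
import torch.nn as nn


class TrainingWrapper(nn.Module):
    """Hypothetical wrapper whose train() is repurposed to start a training job."""

    def __init__(self) -> None:
        super().__init__()
        self.model = nn.Conv2d(3, 8, kernel_size=3)  # the actual network
        self.training_runs_started = 0

    def train(self, *args, **kwargs):
        # Stand-in for a high-level "fit the model" entry point.
        # nn.Module.eval() is implemented as self.train(False), so it lands here too.
        self.training_runs_started += 1
        return self


wrapper = TrainingWrapper()
wrapper.eval()  # runs the override; wrapper.training is never set to False
print(wrapper.training_runs_started, wrapper.training)

# Calling eval() on the inner nn.Module uses the standard implementation.
inner = wrapper.model.eval()
print(inner.training)
```

Applied to the script, the analogous (untested) change would be to pass YOLO("yolov8s.pt").model to torch_quantizer. This thread does not confirm whether the quantizer then traces the YOLOv8 detection head cleanly.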

Additional Information:

Expected Behavior:

The quantization process should complete successfully without raising a RuntimeError.

fanz-xlnx commented 8 hours ago

Hi @Siva50005, Thanks for your valuable feedback.

Quantizing YOLOv8 requires a quantization-aware training (QAT) process to maintain accuracy, and QAT is not covered in the tutorial. QAT is computationally demanding and time-consuming even with a GPU. The runtime error is not expected on a CPU-only setup, but running QAT on CPU would take several weeks, which is not a realistic use case from our point of view.

In my humble opinion, the issue could be caused by a virtualized environment such as WSL. Do you have a native Linux environment to run the quantizer? We provide the Docker image below to simplify installation for a quick validation: ryzen-ai-pytorch-docker
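QAT is far more expensive than calibration because it inserts quantize-dequantize ("fake quantization") steps into the forward pass and then keeps training the network. Every step therefore needs a forward and a backward pass over training data. The sketch below illustrates the idea generically, assuming symmetric per-tensor int8 fake quantization with a straight-through estimator (STE) on a toy linear-regression model. The names fake_quant_ste and QATLinear are hypothetical. This is not the Vitis AI or Ryzen AI QAT implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quant_ste(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Forward pass sees the quantized value; backward treats rounding as identity.
    return x + (x_q - x).detach()


class QATLinear(nn.Linear):
    """Linear layer whose input and weight are fake-quantized during training."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(fake_quant_ste(x), fake_quant_ste(self.weight), self.bias)


torch.manual_seed(0)
model = QATLinear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(64, 4)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])

initial_loss = F.mse_loss(model(inputs), targets).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = F.mse_loss(model(inputs), targets)
    loss.backward()  # gradients reach the float weights through the STE
    optimizer.step()
final_loss = loss.item()
print(initial_loss, final_loss)
```

A real QAT run repeats this loop over a full training set for a network the size of YOLOv8s, which is where the cost described above comes from.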

Siva50005 commented 6 hours ago


Thank you for your prompt response. I agree that the QAT process can be computationally expensive, especially when maintaining accuracy. However, the runtime error I encountered was unexpected, and I appreciate your point regarding the use of virtual environments like WSL potentially being a factor.

I used the Docker image provided in the PyTorch quantization tutorial (docker pull xilinx/vitis-ai-pytorch-cpu:latest).

Currently, I am running the quantizer within a WSL environment on my system, so this may be contributing to the issue. I will try running the provided Docker image directly in a native Windows or Linux environment and will reach out if I run into any further issues. Thank you.

Siva50005 commented 5 hours ago


I have been stuck quantizing this YOLOv8 model for the past couple of days. If possible, could you provide some concrete steps to quantize it without QAT, which is very time-consuming? Post-training quantization (PTQ) would probably be enough.

Thanks in advance

Best regards
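In its basic form, PTQ only needs a calibration pass. The model runs on a set of representative inputs while the quantizer records value ranges to choose scales, with no labels, gradients, or weight updates. The stdlib-only sketch below shows a generic symmetric max-abs calibration for a single tensor. The function names are hypothetical. The optional power-of-two rounding only illustrates a constraint some toolchains apply to scales and is not a statement of what the Ryzen AI quantizer does.

```python
import math
import random
from typing import Iterable, List


def calibrate_scale(batches: Iterable[List[float]], num_bits: int = 8,
                    power_of_two: bool = False) -> float:
    """Pick a symmetric quantization scale from the max |x| seen during calibration."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = 0.0
    for batch in batches:  # observe only: no labels, no gradients, no weight updates
        max_abs = max(max_abs, max(abs(v) for v in batch))
    scale = max(max_abs, 1e-8) / qmax
    if power_of_two:
        # Round the scale up to a power of two so max_abs still fits in range.
        scale = 2.0 ** math.ceil(math.log2(scale))
    return scale


def quantize(values: List[float], scale: float, num_bits: int = 8) -> List[int]:
    """Map floats to clamped signed integer codes."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [min(qmax, max(qmin, round(v / scale))) for v in values]


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


random.seed(0)
calib_batches = [[random.gauss(0.0, 1.0) for _ in range(256)] for _ in range(16)]
scale = calibrate_scale(calib_batches)
pow2_scale = calibrate_scale(calib_batches, power_of_two=True)

sample = calib_batches[0]
codes = quantize(sample, scale)
max_error = max(abs(a - b) for a, b in zip(sample, dequantize(codes, scale)))
print(scale, pow2_scale, max_error)
```

Roughly speaking, the 'calib' mode in the script above plays the role of this observation loop, except that ranges are collected for many tensors across the network rather than for one list of values.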

fanz-xlnx commented 3 hours ago

Hi @Siva50005, I hope the script below helps: https://github.com/amd/RyzenAI-SW/blob/1.1/tutorial/yolov8_e2e/run_ptq.sh