Error during YOLOv8s quantization with Ryzen AI quantizer (RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn)

Siva50005 commented 2 months ago

I encountered an issue while trying to quantize the YOLOv8s model using the Ryzen AI quantizer. Below are the details of the error:

Error Message:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
[VAIQ_WARN][QUANTIZER_TORCH_CUDA_UNAVAILABLE]: CUDA (HIP) is not available, change device to CPU.
...
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Environment:

OS: Linux (WSL2)
Python version: 3.8.6
PyTorch version: 1.13.1
VAIQ version: 3.5.0+60df3f1+torch1.13.1
CUDA: Not available
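
An environment summary like the one above can be gathered with a short script. This is only a sketch using the standard library; the distribution names `torch` and `ultralytics` are assumptions about how the packages are registered with pip, and the VAIQ package is left out because its distribution name is not stated in this report.

```python
import platform
from importlib import metadata


def describe_environment(packages=("torch", "ultralytics")):
    """Return a dict with the OS, the Python version and package versions."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Looks up the installed distribution by its pip name.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it with the same interpreter that runs the quantization script matters, since a different interpreter may see a different set of packages.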

Steps to Reproduce:

Attempt to quantize the YOLOv8s model using the Ryzen AI quantizer.
Use the command:
```
python new_quant.py
```

new_quant.py:

import torch
import torch.nn as nn

import os
from PIL import Image
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from pytorch_nndct.apis import torch_quantizer, dump_xmodel
from ultralytics import YOLO

# Custom Dataset for Calibration
class CalibrationDataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.image_paths = [os.path.join(image_dir, img) for img in os.listdir(image_dir)]
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, 0  # Return a dummy label

# Define your image transformations
transform = transforms.Compose([
    transforms.Resize((640, 640)),  # Adjust size as needed
    transforms.ToTensor(),
])

# Create DataLoader for Calibration Dataset
image_dir = '/workspace/pytorch_example_2/datasets/coco/images/val2017'  # Replace with your calibration dataset path
calib_dataset = CalibrationDataset(image_dir, transform=transform)
calib_loader = DataLoader(calib_dataset, batch_size=1, shuffle=False)

# Load pre-trained model
model = YOLO("yolov8s.pt")

# Generate a quantizer and convert the model
batch_size = 1  # Adjust based on your requirements
input = torch.randn([batch_size, 3, 640, 640])  # Adjust input size as needed
quant_mode = 'calib'  # Use 'calib' for calibration, 'test' for testing

quantizer = torch_quantizer(quant_mode, model, (input), quant_config_file=None)
quant_model = quantizer.quant_model

# Calibrate the model
def calibrate_model(model, calib_loader):
    model.eval()
    with torch.no_grad():  # Disable gradient calculation
        for images, _ in calib_loader:
            model(images)

calibrate_model(quant_model, calib_loader)

# Evaluate the quantized model
def evaluate(model, val_loader, loss_fn):
    model.eval()
    acc1_gen, acc5_gen, loss_gen = 0, 0, 0
    with torch.no_grad():  # Disable gradient calculation
        for images, labels in val_loader:
            outputs = model(images)
            loss = loss_fn(outputs, labels)
            # Calculate accuracy and loss
            # acc1_gen, acc5_gen, loss_gen = ... (implement your evaluation logic here)
    return acc1_gen, acc5_gen, loss_gen

# Assuming val_loader and loss_fn are defined elsewhere
# acc1_gen, acc5_gen, loss_gen = evaluate(quant_model, val_loader, loss_fn)

# Export the quantization results and model
quantizer.export_quant_config()
quantizer.export_onnx_model()

Additional Information:

I don't have CUDA installed, so I am running on the CPU.
The quantization seems to start but fails during the training phase with the error mentioned above.
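
One possible explanation for training being triggered at all, not confirmed anywhere in this thread: in PyTorch, `nn.Module.eval()` is implemented as `self.train(False)`. If the object passed to the quantizer is a high-level wrapper that is itself a module and redefines `train()` to launch a training run (some ultralytics releases appear to structure the `YOLO` object this way), then any call to `eval()` on the wrapper would start training instead of switching modes. The sketch below reproduces only that mechanism with stand-in classes; `Module`, `Network` and `Wrapper` are hypothetical names, not the real libraries.

```python
class Module:
    """Stand-in for torch.nn.Module: eval() is implemented as train(False)."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


class Network(Module):
    """Stand-in for the plain network that a quantizer expects."""


class Wrapper(Module):
    """Hypothetical high-level wrapper whose train() starts a training run."""

    def __init__(self):
        super().__init__()
        self.model = Network()
        self.training_runs = 0

    def train(self, *args, **kwargs):
        # Overrides the mode switch: any call, including the one made
        # by eval(), counts as launching training.
        self.training_runs += 1
        return self


wrapper = Wrapper()
wrapper.eval()        # intended as "switch to inference mode"
wrapper.model.eval()  # the inner network only toggles its flag
print(wrapper.training_runs, wrapper.model.training)
```

If this is what happens here, passing the underlying network (the `model` attribute of the ultralytics `YOLO` object) to `torch_quantizer` instead of the wrapper might avoid the training path. That is an assumption to verify, not a tested fix.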

Expected Behavior:

The quantization process should complete successfully without raising a RuntimeError.

fanz-xlnx commented 2 months ago

Hi @Siva50005, Thanks for your valuable feedback.

Quantizing YOLOv8 requires a QAT (quantization-aware training) process to maintain accuracy, and that process is excluded from the tutorial. QAT is computationally demanding and time-consuming even with a GPU enabled. The runtime error is not expected in CPU-only mode, but running QAT on a CPU would take the user several weeks, which is not a realistic use case from our point of view.
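
For context on the distinction: post-training quantization (PTQ) derives its quantization parameters from a calibration pass over sample data and leaves the weights untouched, whereas QAT retrains the model with quantization simulated. The following is a minimal sketch of the calibration idea, assuming symmetric per-tensor int8 quantization over a list of floats. It is illustrative only and is not the algorithm used by the Ryzen AI quantizer; all function names are hypothetical.

```python
def calibrate_scale(samples, num_bits=8):
    """Derive a symmetric per-tensor scale from calibration values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in samples)
    return max_abs / qmax if max_abs else 1.0


def quantize(value, scale, num_bits=8):
    """Map a float to a signed integer, clamping to the representable range."""
    qmax = 2 ** (num_bits - 1) - 1
    q = round(value / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q, scale):
    """Map the integer back to an approximate float."""
    return q * scale


calibration = [-1.5, -0.2, 0.0, 0.7, 2.54]
scale = calibrate_scale(calibration)
restored = [dequantize(quantize(v, scale), scale) for v in calibration]
```

Values inside the calibrated range come back with an error of at most half a quantization step, while values outside it are clamped. That clamping and rounding error is what QAT lets the model adapt to, at the cost of a training run.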

In my humble opinion, the issue could result from a virtualized environment such as WSL. Do you have a native Linux environment in which to run the quantizer? We have provided the Docker image below to simplify installation for a quick validation: ryzen-ai-pytorch-docker

Siva50005 commented 2 months ago

Thank you for your prompt response. I agree that the QAT process can be computationally expensive, especially when maintaining accuracy. However, the runtime error I encountered was unexpected, and I appreciate your point regarding the use of virtual environments like WSL potentially being a factor.

I used the Docker image provided in the PyTorch quantization tutorial, pulled with: docker pull xilinx/vitis-ai-pytorch-cpu:latest

Currently, I am running the quantizer within a WSL environment on my system, and it’s possible that this is contributing to the issue. I will try running the provided Docker image directly in a Windows/Linux environment. I’ll reach out if I run into any further issues. Thank you.

Siva50005 commented 2 months ago

I have been stuck on quantizing this YOLOv8 model for the past couple of days. If possible, can you provide some concrete steps to quantize it without QAT, which is really time-consuming? PTQ would probably do.

Thanks in advance

Best regards

fanz-xlnx commented 2 months ago

Hi @Siva50005, I hope the script below helps. https://github.com/amd/RyzenAI-SW/blob/1.1/tutorial/yolov8_e2e/run_ptq.sh

Siva50005 commented 2 months ago

I tried this in the Docker container installed on the AMD Ryzen machine. I ran into some dependency issues, which I was able to resolve, but when I ran the command bash run_ptq.sh, I got this error:

[Calib mode]
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.11) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Usage: yolo [OPTIONS] COMMAND [ARGS]...
Try 'yolo -h' for help.

Error: No such command 'detect'.
[Test mode] testing
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.11) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Usage: yolo [OPTIONS] COMMAND [ARGS]...
Try 'yolo -h' for help.

Error: No such command 'detect'.

Can you help me resolve this issue? This is urgent.

fanz-xlnx commented 2 months ago

You may try installing the dependencies below. https://github.com/amd/RyzenAI-SW/blob/1.1/tutorial/yolov8_e2e/env_setup.sh These are the dependencies for the ultralytics YOLOv8 code.

Siva50005 commented 2 months ago

I had already installed them before I ran bash run_ptq.sh, but I still face the issue. This is the output I got when I ran yolo -h:

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.11) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Usage: yolo [OPTIONS] COMMAND [ARGS]...

  Manage infrastructure and services on AWS for multiple accounts/stages.

  (Or, "yolo everything into prod".)

Options:
  -h, --help  Show this message and exit.

Commands:
  build-lambda           Build Lambda function packages.
  clear-config           Clear cached configuration for `yolo`.
  deploy-baseline-infra  DEPRECATED: Use `yolo deploy-infra` instead.
  deploy-infra           Deploy infrastructure from templates.
  deploy-lambda          Deploy Lambda functions for services.
  deploy-s3              Deploy a built S3 application.
  ensure-parameters      Ensure that all required parameters are defined...
  list-accounts          List AWS accounts.
  list-builds            List the pushed builds for a service/stage.
  list-lambda-builds     DEPRECATED: Use `yolo list-builds` instead.
  list-s3-builds         DEPRECATED: Use `yolo list-builds` instead.
  login                  Login with and cache Rackspace credentials.
  push                   Push a local build, ready it for deployment.
  push-lambda            DEPRECATED: Use `yolo push` instead.
  put-parameters         Securely store service/stage parameters.
  run                    Run a script with AWS account credentials.
  shell                  Launch a Python shell with AWS credentials.
  show-config            Show currently cached configuration.
  show-outputs           Show infrastructure stack outputs.
  show-parameters        Show centralized config for a service/stage.
  show-service           Show service configuration for a given stage.
  status                 Show infrastructure deployments status.
  upload-s3              DEPRECATED: Use `yolo push` instead.
  use-profile            Make Yolo use an AWS CLI named profile.

When I checked the installed packages with pip list, I couldn't find the ultralytics library. Should I install it manually, or is the issue caused by something else?
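
The help text above describes an AWS deployment tool rather than the Ultralytics CLI, which suggests that the `yolo` command on the PATH comes from a different package. A minimal sketch for checking which installed distribution declares a `yolo` console script, using only the standard library; the helper name `find_console_script_owners` is hypothetical.

```python
from importlib import metadata


def find_console_script_owners(script_name):
    """List (distribution, version, target) for every installed
    distribution that declares a console script with this name."""
    owners = []
    for dist in metadata.distributions():
        for entry_point in dist.entry_points:
            if (entry_point.group == "console_scripts"
                    and entry_point.name == script_name):
                owners.append(
                    (dist.metadata["Name"], dist.version, entry_point.value)
                )
    return owners


if __name__ == "__main__":
    for name, version, target in find_console_script_owners("yolo"):
        print(f"{name} {version} -> {target}")
```

This only inspects the environment of the interpreter that runs it, so it should be run with the same Python that the `yolo` command uses. An empty result would mean that no distribution visible to that interpreter declares the script.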

Siva50005 commented 2 months ago

Also, I have a question. The README here: https://github.com/amd/RyzenAI-SW/tree/1.1/tutorial/yolov8_e2e says that the supported CPUs are AMD Ryzen 7040U and 7040HS series mobile processors.

I am trying to do this on an AMD Ryzen 9 7940HS. That won't be a problem, right?

fanz-xlnx commented 2 months ago

That's the README for the previous release (v1.1). The quantization flow was removed from the latest release due to a GPL license issue.

However, in the latest release, if you can run inference with an FP32 YOLOv8 model on the CPU, the PTQ script should work within the provided Docker image.

Siva50005 commented 2 months ago

I'm already trying to run the ptq script inside the docker image. But it still isn't working.

fanz-xlnx commented 2 months ago

If you still face the issue stated above, please refer to the link below to verify a floating-point YOLOv8 model first, before moving on to the quantization stage. https://github.com/ultralytics/ultralytics
