VALI/src/TC/src/CudaUtils.cpp:32 CUDA error: CUDA_ERROR_NOT_INITIALIZED initialization error

tianyan01 commented 1 month ago

Hi, I use VALI in a torch training task. Here's my test code (the video path needs to be changed):

import PyNvCodec as nvc
import numpy as np
import torch
import os
import random
from torch.utils.data.distributed import DistributedSampler
from torch.utils.data import DataLoader

def pynv_read_video(pyDec, frame_indice, gpu_id):
    # GPU-accelerated converter
    pyCvt = nvc.PySurfaceConverter(
        pyDec.Format(),
        nvc.PixelFormat.RGB,
        gpu_id=gpu_id
    )

    # Colorspace conversion context
    cc_ctx = nvc.ColorspaceConversionContext(
        pyDec.ColorSpace(),
        pyDec.ColorRange()
    )

    # GPU-accelerated Surface downloader
    pyDwn = nvc.PySurfaceDownloader(
        gpu_id=gpu_id
    )

    # Raw decoded Surface
    surf_src = nvc.Surface.Make(
        format=pyDec.Format(),
        width=pyDec.Width(),
        height=pyDec.Height(),
        gpu_id=gpu_id
    )

    # Raw Surface, converted to RGB
    surf_dst = nvc.Surface.Make(
        format=nvc.PixelFormat.RGB,
        width=pyDec.Width(),
        height=pyDec.Height(),
        gpu_id=gpu_id
    )

    # Numpy array which contains decoded RGB Surface
    frame = np.ndarray(
        dtype=np.uint8,
        shape=surf_dst.HostSize())

    video = []
    for idx in frame_indice:
        seek_ctx = nvc.SeekContext(seek_frame=idx)
        success, details = pyDec.DecodeSingleSurface(surf_src, seek_ctx=seek_ctx)

        # Convert to RGB
        success, details = pyCvt.Run(surf_src, surf_dst, cc_ctx)

        # Copy pixels to numpy ndarray
        pyDwn.Run(surf_dst, frame)

        res_frame = np.reshape(
            frame,
            (pyDec.Height(), pyDec.Width(), 3))

        t = torch.Tensor(res_frame)
        video.append(t)
    video = torch.stack(video)
    return video

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, path):
        """ init """
        self.rank = int(os.environ["LOCAL_RANK"])
        self.samples_num = 10
        self.path = path

    def read_video(self, num_frames):
        """ read video """
        # GPU-accelerated decoder
        pyDec = nvc.PyDecoder(
            self.path,
            {},
            self.rank,
        )
        total_frames = pyDec.NumFrames()
        # video_fps = pyDec.AvgFramerate()

        frame_indice = np.linspace(0, total_frames - 1, num_frames, dtype=int)
        video = pynv_read_video(pyDec, frame_indice, self.rank)
        return video

    def __getitem__(self, index):
        """ get item """
        try:
            video = self.read_video(10)
            return video
        except Exception as e:
            print(f"Error {e}")

    def __len__(self):
        """len"""
        return self.samples_num

def seed_worker(worker_id):
    """ seed worker """
    worker_seed = 1024
    np.random.seed(worker_seed)
    torch.manual_seed(worker_seed)
    random.seed(worker_seed)

url = "/path/to/test.mp4"
my_dataset = MyDataset(url)
sampler = DistributedSampler(
    my_dataset, 
    num_replicas=8, 
    rank=my_dataset.rank, 
    shuffle=False
)
my_dataloader = DataLoader(
    my_dataset,
    batch_size=1,
    sampler=sampler,
    worker_init_fn=seed_worker,
    drop_last=False,
    pin_memory=True,
    num_workers=8,
)
dataloader_iter = iter(my_dataloader)
video = next(dataloader_iter)
print(my_dataset.rank, video.shape)

It gives me an error: VALI/src/TC/src/CudaUtils.cpp:32 CUDA error: CUDA_ERROR_NOT_INITIALIZED initialization error. How can I fix it? Thanks!

RomanArzumanyan commented 1 month ago

Hi @tianyan01

Can you run the samples or unit tests?

tianyan01 commented 1 month ago

> Hi @tianyan01
>
> Can you run samples or unit tests ?

I can run the unit tests. But when I wrap it in torch's Dataset and DataLoader, it doesn't run.

RomanArzumanyan commented 1 month ago

@tianyan01

Most probably something is going on with the script, not with VALI itself. E.g. torch may do something to the CUDA runtime. The only thing I can recommend is to simplify your app step by step until it runs, so you can isolate the culprit.
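One hypothesis worth ruling out while simplifying (an assumption, not something confirmed in this thread): on Linux, DataLoader workers are forked by default, and the CUDA runtime does not support being used in a forked child once the parent process has initialized it. A minimal sketch of the step-by-step isolation, where `isolation_configs` is a hypothetical helper and only `num_workers` and `multiprocessing_context` are real DataLoader arguments:

```python
import multiprocessing as mp


def isolation_configs(num_workers=8):
    """Return DataLoader keyword sets to try, simplest first."""
    configs = [
        # Step 1: no worker processes; decoding runs in the main process.
        {"num_workers": 0},
    ]
    # Step 2: workers started with "spawn" instead of the platform default.
    if "spawn" in mp.get_all_start_methods():
        configs.append(
            {"num_workers": num_workers, "multiprocessing_context": "spawn"}
        )
    # Step 3: the setup from the original script (default start method).
    configs.append({"num_workers": num_workers})
    return configs


if __name__ == "__main__":
    for kwargs in isolation_configs():
        print(kwargs)
        # Hypothetical usage with the dataset from the report:
        # loader = DataLoader(my_dataset, batch_size=1, **kwargs)
        # next(iter(loader))
```

If step 1 works and step 3 fails, the worker start method is a likely suspect. Note that with "spawn" the dataset must be picklable and the script's top-level code needs an `if __name__ == "__main__":` guard.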

In order to find and fix a VALI bug I need an MVP, not the whole user script.

P.S. Please don't use the PyDecoder.Seek method to shuffle frames. Seek is a costly operation. Just decode Surfaces one by one and shuffle your video list container.
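A rough sketch of the decode-then-shuffle approach follows. `collect_frames` and `fake_decoder` are hypothetical names. `decode_next` stands in for whatever decodes the next frame in order, for example a wrapper around `pyDec.DecodeSingleSurface(surf_src)` plus the convert and download steps, assuming the call works without a `seek_ctx`. The wrapper should return a copy of each frame, since the script above reuses one numpy buffer.

```python
import itertools
import random


def collect_frames(decode_next, total_frames, wanted):
    """Decode frames in order, keeping only the wanted indices (no seeking)."""
    wanted_set = set(wanted)
    if not wanted_set:
        return {}
    last = min(max(wanted_set), total_frames - 1)
    kept = {}
    for idx in range(last + 1):
        frame = decode_next()  # sequential decode of frame number idx
        if idx in wanted_set:
            kept[idx] = frame
    return kept


def fake_decoder():
    """Stand-in for a real decoder: yields 'frame-0', 'frame-1', ..."""
    counter = itertools.count()
    return lambda: f"frame-{next(counter)}"


if __name__ == "__main__":
    frames = collect_frames(fake_decoder(), total_frames=100, wanted=[0, 33, 66, 99])
    order = list(frames)
    random.shuffle(order)  # shuffle the container, not the decoder position
    video = [frames[i] for i in order]
    print(video)
```

The trade-off is that every frame up to the last wanted index gets decoded, but each decode is sequential, which avoids the cost of repeated seeks.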

tianyan01 commented 1 month ago

@RomanArzumanyan Thanks!

RomanArzumanyan / VALI

VALI/src/TC/src/CudaUtils.cpp:32 CUDA error: CUDA_ERROR_NOT_INITIALIZED initialization error #75