EasonXiao-888 / GrootVL

The official implementation of GrootVL: Tree Topology is All You Need in State Space Model

CUDA kernel launch failed: an illegal memory access was encountered #6

Open Sycamorers opened 2 months ago

Sycamorers commented 2 months ago

Hello,

Thanks for your amazing work!

I was trying to apply tree_scanning to some other tasks, but I ran into a CUDA error that I could not solve. In the paper, you mention that you used 8 GPUs for all training, but I'm currently using a single 4090 with 24 GB of memory (I've also tried another machine with two GPUs, one 3090 and one 4090, and it didn't work either). I'm not sure whether this module consumes a lot of memory by default (intuitively, it should not). The error is:

CUDA kernel launch failed: an illegal memory access was encountered
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I was using CUDA 12.1 + torch 2.3, and I also tested CUDA 11.8 + torch 2.3; neither worked.

I would really appreciate it if you have any ideas about what might be wrong and could provide some hints or solutions. Thanks in advance!

EasonXiao-888 commented 2 months ago

@Sycamorers Thanks for your interest! In general, this error indicates that the model and the input data may not be on the same GPU. By the way, you can check whether the tree scanning algorithm is set up correctly with the following demo code:

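import torch

from classification.models.grootv import GrootV

# Build a small GrootV classifier and move it to the GPU.
model = GrootV(
    num_classes=10,
    channels=80,
    depths=[2, 2, 9, 2],
    layer_scale=None,
    post_norm=False,
    mlp_ratio=4.0,
    with_cp=False,
    drop_path_rate=0.1,
).cuda()

# Random batch of 8 RGB images (64x64), placed on the same GPU as the model.
x = torch.rand(8, 3, 64, 64).cuda()

# If this forward pass completes, the tree scanning setup is working.
x = model(x)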
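
A quick way to test the device-mismatch hypothesis in your own pipeline is to list the devices that the model's parameters, buffers, and inputs live on before calling the model. The helper below is a minimal sketch and is not part of GrootVL: the name check_same_device is hypothetical, and it only assumes that the model exposes .parameters() and .buffers() (as torch.nn.Module does) and that each input has a .device attribute (as torch.Tensor does).

```python
def check_same_device(model, **named_inputs):
    """Return the set of devices in use, or raise if there is more than one.

    Hypothetical helper, not part of GrootVL.
    model: any object with .parameters() and .buffers(), e.g. a torch.nn.Module.
    named_inputs: keyword arguments whose values have a .device attribute,
                  e.g. torch.Tensor instances.
    """
    # Map each device (as a string) to the names of the things found on it.
    found = {}
    for p in model.parameters():
        found.setdefault(str(p.device), set()).add("parameters")
    for b in model.buffers():
        found.setdefault(str(b.device), set()).add("buffers")
    for name, tensor in named_inputs.items():
        found.setdefault(str(tensor.device), set()).add(f"input '{name}'")

    if len(found) > 1:
        details = "; ".join(
            f"{device}: {', '.join(sorted(items))}"
            for device, items in sorted(found.items())
        )
        raise RuntimeError(f"Model and inputs span several devices ({details})")
    return set(found)
```

With the demo above, calling check_same_device(model, x=x) just before model(x) should return a single-element set. In a larger pipeline, a RuntimeError from this helper names which inputs or parameters sit on a different device. Passing the check does not rule out other causes of an illegal memory access.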

Sycamorers commented 2 months ago

Thanks a lot for your prompt response! This actually worked perfectly! I will probably need to go back to my pipeline to see what is wrong! I really appreciate your help!
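
For anyone tracing the same error in their own pipeline: CUDA kernels are launched asynchronously, so the Python stack trace for an illegal memory access can point at a later, unrelated call. Setting the CUDA_LAUNCH_BLOCKING=1 environment variable makes kernel launches synchronous, so the error is more likely to surface at the call that triggered it. A minimal sketch follows; the helper name run_with_blocking_launches is hypothetical and not part of GrootVL or PyTorch.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(script_path, *args):
    """Run a Python script in a child process with CUDA_LAUNCH_BLOCKING=1.

    Hypothetical helper. The variable is set in the child's environment
    before the interpreter starts, so it is in place before any CUDA
    initialization happens in that process.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(
        [sys.executable, script_path, *args],
        env=env,
        capture_output=True,  # collect stdout/stderr for inspection
        text=True,
    )
```

For example, result = run_with_blocking_launches("train.py") followed by print(result.stderr) would show the traceback from the synchronous run, where "train.py" stands in for whatever script starts your pipeline. Expect the run to be slower than usual, since launches no longer overlap.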