NTU-MedAI / FOTF-CPI


RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. #1

Open 2776222856 opened 5 days ago

2776222856 commented 5 days ago

When I run train.py to train on the Davis dataset, with input_dim_drug in the config file set to 212 as the author instructed, I get the following runtime error: RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

What's going on, please?

NTU-MedAI commented 5 days ago

Hello! I have received your email, thank you!

NTU-MedAI commented 3 days ago

We are glad that you are interested in our work. Regarding your issue, we have not been able to reproduce the error on multiple devices. Here are a few suggestions that we hope will be helpful to you:

1. Check for out-of-bounds data:

This error often occurs when there is an out-of-bounds value or illegal operation on tensors. For example, in classification problems, if the labels exceed the number of classes, this could trigger the error.

Make sure your dataset labels or input tensors are within the valid range.

2. Check tensor sizes and types:

Ensure that the tensors you’re passing to CUDA operations have the correct shapes, types, and sizes. For example, in a classification task with class-index labels, the labels should be a LongTensor whose batch dimension matches that of the predictions.
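
A minimal sketch of the range check from suggestion 1, in plain Python so it can run before anything is moved to the GPU. It assumes that input_dim_drug acts as an upper bound on integer IDs (for example, the size of an embedding table), which this thread does not confirm. The helper name `find_out_of_range` and the sample IDs are hypothetical and are not part of this repository.

```python
def find_out_of_range(values, upper_bound):
    """Return (position, value) pairs for entries outside [0, upper_bound)."""
    return [(i, v) for i, v in enumerate(values) if not 0 <= v < upper_bound]


# Assumption: 212 (the input_dim_drug value from this issue) bounds the IDs.
vocab_size = 212

# Made-up sample IDs; with a PyTorch tensor you could pass
# tensor.flatten().tolist() instead of a hand-written list.
token_ids = [0, 5, 211, 212, -1]

bad = find_out_of_range(token_ids, vocab_size)
print(bad)  # [(3, 212), (4, -1)]
```

If a check like this reports any entries, the IDs and the configured dimension disagree. Rerunning with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, should also make the stack trace point closer to the operation that failed.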

------------------ Original message ------------------ From: "NTU-MedAI/FOTF-CPI" @.>; Sent: Sunday, October 6, 2024, 11:56 PM @.>; @.***>; Subject: [NTU-MedAI/FOTF-CPI] RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. (Issue #1)
