Running inference throws a CUDA exception

Hey,

Cross-reference to my issue on onnxruntime: https://github.com/microsoft/onnxruntime/issues/19076

When I run inference with the superpoint_lightglue_end2end model, I get a CUDA memcpy exception. I am using CUDA 11.8 with cuDNN 8.9.

Verbose log: verbose_log.txt

According to the verbose log, it always seems to happen at the kernel with idx 2478. Has anyone experienced something similar? What could be the cause?

<unknown> 0x00007fffd9970935
<unknown> 0x00007fffd9a5d86a
<unknown> 0x00007fffd9b914cb
<unknown> 0x00007fffd9b91d61
<unknown> 0x00007fffd9cb9130
<unknown> 0x00007fffd9931a33
<unknown> 0x00007fffd9931f41
<unknown> 0x00007fffd9932ea8
<unknown> 0x00007fffd9b000d1
<unknown> 0x00007fffdb644459
<unknown> 0x00007fffdb6176fd
cudaMemcpyAsync 0x00007fffdb6696a5
onnxruntime::GPUDataTransfer::CopyTensorAsync(onnxruntime::Tensor const&, onnxruntime::Tensor&, onnxruntime::Stream&) const 0x00007fff9fd1b0dd
onnxruntime::IDataTransfer::CopyTensors(std::vector<onnxruntime::IDataTransfer::SrcDstPair, std::allocator<onnxruntime::IDataTransfer::SrcDstPair> > const&) const 0x00007ffff6dbbe63
onnxruntime::ProviderHostImpl::IDataTransfer__CopyTensors(onnxruntime::IDataTransfer const*, std::vector<onnxruntime::IDataTransfer::SrcDstPair, std::allocator<onnxruntime::IDataTransfer::SrcDstPair> > const&) 0x00007ffff66406a8
onnxruntime::IDataTransfer::CopyTensors(std::vector<onnxruntime::IDataTransfer::SrcDstPair, std::allocator<onnxruntime::IDataTransfer::SrcDstPair> > const&) const 0x00007fff9ff35bc7
onnxruntime::DataTransferManager::CopyTensors(std::vector<onnxruntime::IDataTransfer::SrcDstPair, std::allocator<onnxruntime::IDataTransfer::SrcDstPair> > const&) const 0x00007ffff6dbf95d
onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection*, bool, onnxruntime::Stream*) 0x00007ffff6e65802
onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) 0x00007ffff6e66e8b
onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) 0x00007ffff6e671f3
onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) [clone .localalias] 0x00007ffff668ac8a
onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) 0x00007ffff668bab2
OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) 0x00007ffff6613fff
Ort::detail::SessionImpl::Run onnxruntime_cxx_inline.h:967
spear::ort::Inference::run Inference.h:314
main superpoint_lightglue_main.cpp:67
__libc_start_call_main 0x00007ffff5c29d90
__libc_start_main_impl 0x00007ffff5c29e40
_start 0x0000555555558d55
