When running the following CUDA code (a minimal reproducer) with chipStar, the program segfaulted on exit:
Reproducer:
```cpp
#include <cstdio>
#include <cuda_runtime.h>

class Test {
public:
    float *output;
    Test();
    ~Test();
};

// Constructor
Test::Test()
{
    output = NULL;
    cudaMalloc(&output, sizeof(float) * 100);
}

// Destructor
Test::~Test()
{
    printf("destructor called\n");
    cudaFree(output);
}

// Global object: constructed before main() runs and destroyed after main()
// returns, so the cudaFree() above executes during program teardown.
Test test = Test();

int main() {
    printf("testing\n");
    return 0;
}
```
gdb Backtrace:
After investigating, it seems that the destructor of the global variable `test`, which contains the `cudaFree` call, is called after the backend has been uninitialized (the backend was `0x0` right before the segfault). As the thapi trace below shows, `__hipUnregisterFatBinary` was executed before `hipFree`:
CUDA was also run and traced to see whether it behaves the same way (even though it does not segfault). There, `cuMemFree` seems to succeed (whereas chipStar's `hipFree` failed), and only after `cuMemFree` has completed successfully is the error `CUDA_ERROR_DEINITIALIZED` generated, in `cuDevicePrimaryCtxRelease`. Please see the trace below: