google-code-export / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0
2 stars 2 forks source link

global device_vector of custom struct throws exception at system closedown #455

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have the following code with a custom myStruct that defines a default 
constructor (lets say because theres a initialization ctor too), and a global 
device_vector of myStruct:
---------------------------------------------------------------

#include <thrust/device_vector.h>

struct myStruct
{
  float x;

  __host__ __device__
  myStruct()
  {
  }
};

thrust::device_vector<myStruct> vec;

int main()
{
  vec.resize(16);
}

---------------------------------------------------------------

At the end of main, thrust throws a system_error.
This does not happen if either vec is not defined global, but inside main or 
myStruct does not explicitely define a default constructor.

---
Platform: Win7, VS2010, Cuda 4.1 RC with Thrust, GTX580, compute_10, sm_20

Original issue reported on code.google.com by euphrat1...@gmail.com on 4 Feb 2012 at 11:06

GoogleCodeExporter commented 9 years ago
Pretty sure this is because device_vector's destructor either calls cudaFree or 
launches a kernel to delete myStruct after CUDART has already torn itself down. 
 Not sure what we can do here.  The workaround is to avoid device_vectors in 
the global scope.

Original comment by jaredhoberock on 4 Feb 2012 at 11:09

GoogleCodeExporter commented 9 years ago
O.k, I'll do so.
But what difference does it make (i.e. why does it work) when there is no 
explicit ctor?

Original comment by euphrat1...@gmail.com on 4 Feb 2012 at 11:51

GoogleCodeExporter commented 9 years ago
I haven't investigated. Can you find the origin of the exception?

Original comment by jaredhoberock on 4 Feb 2012 at 11:53

GoogleCodeExporter commented 9 years ago
IIRC, we don't bother calling destructors on a device_vector's elements if the 
type is POD.  I think we classify your type with a user constructor as non-POD, 
so we'd call its destructors (even though in this example they have no effect).

So in your case, we're probably trying to launch a kernel to call destructors 
after CUDART no longer exists.  This causes a cuda error which we notice 
probably when we call cudaFree, and then we throw the exception.

Original comment by jaredhoberock on 5 Feb 2012 at 12:01

GoogleCodeExporter commented 9 years ago
The exception occurs somewhere within the function doexit() which is called 
from crt0dat.c on system exit:

void __cdecl exit (
        int code
        )
{
        doexit(code, 0, 0); /* full term, kill process */
}

Original comment by euphrat1...@gmail.com on 5 Feb 2012 at 12:13

GoogleCodeExporter commented 9 years ago
Does the destruction of cudart free all allocated memory on the device? If so, 
thrust could just safely ignore all destructors at this point. If not, the user 
should be made aware of freeing his global stuff by hand (if its not possible 
to do it automatigically).

Original comment by euphrat1...@gmail.com on 5 Feb 2012 at 12:17

GoogleCodeExporter commented 9 years ago
Yes, cudart should free all resources when it goes out of scope.

I'm not sure what the right thing to do is.  Perhaps it's possible to ask 
cudart if it's dead, and then avoid calling into it, or launching kernels, but 
I'd dread having to guard every single interaction with cudart.  This one might 
be cudart's bug.

Original comment by jaredhoberock on 5 Feb 2012 at 12:21

GoogleCodeExporter commented 9 years ago
I see. If so, I have to point out again that I'm working on CUDA's 4.1 RC. The 
actual version 4.1 was released days ago and I haven't had the time to update 
yet. Maybe someone with the 4.1 version of cudart could try out the code posted 
above and see whether this issue still exists ...

Original comment by euphrat1...@gmail.com on 5 Feb 2012 at 12:26

GoogleCodeExporter commented 9 years ago
We've spoken to the compiler team regarding this issue, and they've 
acknowledged that this is a known limitation in nvcc.  We're going to work 
towards a general solution in the compiler.

Original comment by jaredhoberock on 6 Feb 2012 at 9:37