mrakgr opened this issue 5 months ago
As an aside, if you (the one writing the placeholder text) want discriminated union types and pattern matching, do check out Spiral. No need to bother with std::variant. What Spiral has is much better, and it compiles to CUDA C++ directly with full interop with its libraries.
Probably, by the time you guys get to this, I'll have implemented all the classes for the C++ backend manually, so I won't need this then, but I guess opening this issue is more a way of resolving my feelings about what I want. Right now, the new backend is a weird mix of C and C++. I'll try going full C++ and see where that gets me. I hope CUDA supports virtual functions.
@mrakgr CUDA does support virtual functions, but under certain restrictions.
I have some experimental code that exposes more of <memory> within libcu++, like std::unique_ptr. However, that currently does not include std::shared_ptr, and I am also highly skeptical of std::unique_ptr being the right thing.
The reason is that memory safety is not trivial across heterogeneous boundaries, and we really want to make sure that we get the design right. As an example, neither std::shared_ptr nor std::unique_ptr takes an allocator or memory resource that specifies where the memory is allocated: is it on the host? Is it shared memory? That may be appropriate for the standard, which assumes homogeneous memory systems, but it is a bad default for our use case.
We are currently in the process of designing a cuda::vector that addresses these problems, and I believe once we are happy with the design we should be able to easily adapt it for all the smart pointers.
I did them like this in Spiral. That said, I haven't run into a use case for them apart from implementing a ref counting backend yet. They are intended for single-threaded use, and since Spiral compiles to Python on the host, where they're allocated isn't something I needed to think about.
Is this a duplicate?
Area
libcu++
Is your feature request related to a problem? Please describe.
I asked around, but got a negative answer. My own search on this repo only showed shared_ptr being used in host code.

Basically, my problem is that I am working on a reference counting CUDA backend for Spiral, and I am changing my mind about whether the ref counting work should be done by the Spiral compiler itself. If I had a shared_ptr class in kernel code, I could compile recursive union types and various other data types so they use it. Right now, it'd be very easy to break the ref counting passes using macros, while shared_ptr would mesh well with those.

The intended purpose of this class would be specifically for data not being shared between threads. In other words, for single-threaded code.
One other motivation behind having this is to lower the compilation times taken by the CUDA compiler. Previously, I created the NL Holdem game directly on the GPU, and I suspect that making use of too many value types makes the compilation times increase exponentially.
Describe the solution you'd like
shared_ptr in device code seems like a good solution.

Describe alternatives you've considered
Currently I have my own ref counting pass in Spiral that was made for a C backend. Something like that would be the only choice in C, and it makes sense there. It is also designed to play well with tail recursion. But even though the CUDA compiler does support tail recursion, I had to rewrite the inner loop for the Leduc game into an imperative one, as the tail-recursive one kept stack overflowing, so that advantage doesn't matter there. Another issue with having an inbuilt ref counting pass is that the heap-allocated types wouldn't be interoperable with C++ libraries. Again, this wouldn't matter with C, as the language is too inexpressive to have libraries worth using, but C++ is different.
Additional context
No response