harrism closed this issue 2 months ago.
I recall discussing this with @ericniebler and the TL;DR is that an `any_resource` is copyable only if the resource it holds is also copyable.
I believe attempting to copy an `any_resource` that holds a non-copyable upstream will cause an exception to be thrown.
not quite. we want our containers to own their memory resources just as STL containers own their allocators. also, we want our containers to be movable and copyable, and so the memory resource must also be movable and copyable.
which brings us to the question of: what does it mean to copy a memory resource? it doesn't mean somehow trying to clone its state. from the type- and value-theoretic point of view, a memory resource doesn't have any observable state; to wit, the `resource` concept doesn't have any accessors that let you inspect its state. from that perspective, memory resources are immutable objects. when you have immutable objects, you get referential transparency and value semantics for free.
this is a round-about way of saying that we can have copyable memory resources by simply ref-counting them. we should provide such a ref-counting resource wrapper to make this easy. that is what you would pass to the container's constructor.
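a minimal sketch of what i mean, assuming a simplified allocate/deallocate interface (hypothetical code, not the actual cccl types):

```cpp
// hypothetical sketch: copies share one underlying resource via shared_ptr,
// so the wrapper is copyable even when the wrapped resource is stateful
#include <cstddef>
#include <memory>
#include <utility>

template <class Resource>
class refcounted_resource {
  std::shared_ptr<Resource> impl_; // copying bumps a refcount, not the resource

public:
  // constructs the wrapped resource in place
  // (a real implementation would constrain this so it can't hijack the copy ctor)
  template <class... Args>
  explicit refcounted_resource(Args&&... args)
      : impl_(std::make_shared<Resource>(std::forward<Args>(args)...)) {}

  void* allocate(std::size_t bytes) { return impl_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) { impl_->deallocate(ptr, bytes); }
};
```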
but note, you don't always need to dynamically allocate and ref-count your memory resource. if you understand the lifetimes of the container and of the memory resource you'd like it to use, you can construct the container with a `resource_ref` that points to the resource. referential transparency makes this use sound as well.
besides, our most commonly used memory resources are actually empty, trivially-copyable, "flyweight" objects that delegate to the CUDA runtime.
all of this presupposes that the memory resource is thread-safe. if it's not, you'll have to wrap it in something that makes it thread-safe, or else be damn sure that it's not getting accessed concurrently.
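e.g., something like this hypothetical adaptor (illustrative, not an existing cccl type):

```cpp
// hypothetical adaptor: serializes calls to a non-thread-safe upstream
#include <cstddef>
#include <mutex>

template <class Upstream>
class synchronized_resource {
  Upstream& upstream_; // non-owning; the caller controls the upstream's lifetime
  std::mutex mutex_;

public:
  explicit synchronized_resource(Upstream& up) : upstream_(up) {}

  void* allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    return upstream_.allocate(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    upstream_.deallocate(ptr, bytes);
  }
};
```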
EDIT: this is making me think that we should require that `allocate` and `deallocate` are callable on `const`-qualified memory resources. attn: @pciolkosz @miscco
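to illustrate the const-qualification point (illustrative only; `malloc_resource` is a made-up example):

```cpp
// illustrative only: const-qualified allocate/deallocate let a resource be
// used through const references, matching the "no observable state" view above
#include <cstddef>
#include <cstdlib>

struct malloc_resource {
  void* allocate(std::size_t bytes) const { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t /*bytes*/) const { std::free(ptr); }
};

void use(const malloc_resource& mr) { // compiles because the members are const
  void* p = mr.allocate(256);
  mr.deallocate(p, 256);
}
```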
@ericniebler I agree with everything you said, but want to clarify this point:
our most commonly used memory resources are actually empty, trivially-copyable, "flyweight" objects that delegate to the CUDA runtime.
This shouldn't be the use case we optimize for. The resource we want most people to use is a memory pool, which means it has state and won't be copyable.
we want our containers to own their memory resources just as STL containers own their allocators.
Who is "we", and why do they want this? Maybe I don't understand what "we" mean by "own". In RMM, our containers do not own their MRs. They hold a reference to their MRs.
Agree with @jrhemstad, in RAPIDS, our most commonly used memory resources are definitely not flyweight objects.
if the containers store the memory resource by reference, that's the only option you have. if you store it by value, then you can pass a value-ish memory resource, or a reference-ish resource, or a ref-counted resource. it gives the user more control. rmm would pass a `resource_ref` when constructing a container to store the resource by reference.
So what actually happens when one creates an `any_resource` from an instance of a non-copyable, non-movable memory resource class?
I would like to reiterate what the goals for the new CUDA APIs are.
Point 1 means that for any container we need to bind the lifetime of the underlying memory resource to the container. Otherwise users will frequently fail to account for it and will run into segfaults.
That is the reason we require `cudax::uninitialized_buffer` and `cudax::vector` to own their memory resources through `any_resource`. At the same time we want to be able to copy vectors, which again goes into the "easy to use by default".
That means that an `any_resource` must be able to copy its underlying resource. There is simply no generic reason why that would be disallowed.
Point 2 means that we give the user the ability to make the safe default as performant as possible. You can have a look at my tests for `cudax::vector`. There I use a caching resource to avoid frequent small allocations on device. It's not an especially large resource, but copying it would be rather stupid. For that reason the resource is not passed by value, but through a `resource_ref`. As we discussed two weeks ago, this allows the user to safely share a resource, as the container would copy the `resource_ref`, which is free, and not the resource, which might be expensive.
This gives us the best of both worlds, without artificially constraining us into a corner.
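To make the cost difference concrete: a `resource_ref` is essentially just a pointer to the resource plus a type-erased dispatch pointer, so copying it copies two pointers and never touches the resource itself. A simplified sketch (not the actual implementation):

```cpp
// simplified sketch: copying this never duplicates the underlying resource
#include <cstddef>

class resource_ref_sketch {
  void* object_;                          // the referenced resource
  void* (*allocate_)(void*, std::size_t); // type-erased dispatch

public:
  template <class Resource>
  explicit resource_ref_sketch(Resource* r)
      : object_(r)
      , allocate_([](void* obj, std::size_t n) {
          return static_cast<Resource*>(obj)->allocate(n);
        }) {}

  void* allocate(std::size_t bytes) { return allocate_(object_, bytes); }
};
```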
Perhaps I am not understanding something, but how does the requirement for copyable containers work with passing `resource_ref`s to their constructors?
Suppose I make a `vector` whose allocation I want to control. I know the lifetime, and that it doesn't need to be managed with anything fancy, so I use a `resource_ref`:
```cpp
{
  auto resource = some_internally_stateful_resource();
  auto vec = cuda::std::vector<...>(size, resource_ref{resource});
  // ... do something with vec ...
  // ~vec runs, ~resource runs, fine
}
```
Now suppose that in "do something", I make a call into a third-party library, passing `vec`, that, for whatever reason, takes a copy of `vec` whose lifetime outlives the block. How do I arrange that this is safe to do, or if not safe that I (preferably) get a compile-time error?
Is the answer that any function that is going to copy the containers also has to take the memory resource, such that the lifetime requirements are explicit?
Or is it that, in this scenario, my third-party library should somehow reject vectors that do not have a ref-counted memory resource inside them as their allocator?
Or some third thing I haven't thought of?
Now suppose that in "do something", I make a call into a third-party library, passing `vec`, that, for whatever reason, takes a copy of `vec` whose lifetime outlives the block. How do I arrange that this is safe to do, or if not safe that I (preferably) get a compile-time error?
This is not safe to do and will result in a runtime error. What the user should have done is pass in the resource as-is:
```cpp
{
  auto vec = cuda::std::vector<...>(some_internally_stateful_resource(), size);
  // ... do something with vec ...
  // ~vec runs, ~resource runs, fine
}
```
That way, the resource would at least still be alive. On the other hand, if the user really wants to pass in an existing resource, they need to do something more fancy: either move the construction out of the block, or (and we should add this) use a "shared" resource that is ref-counted.
I have opened a PR for such a simple shared resource https://github.com/NVIDIA/cccl/pull/2398
This is not safe to do and will result in a runtime error.
What kind of runtime error? use-after-free, or something concretely debuggable?
if the user really wants to pass in a resource they need to do something more fancy
I think often the user does not even know they wanted to do this. It requires knowing, potentially, the details of every third-party API call they make, along with its transitive dependencies.
I agree that the ref-counted resource solves this problem, thanks! But I am concerned that unless the constructor of allocated objects always reaches for this version, the API will have many sharp edges.
But I am concerned that unless the constructor of allocated objects always reaches for this version, the API will have many sharp edges.
I agree with this comment about sharp edges. It will be easy to forget to create a shared resource, especially when taking some global resource and wrapping it with a resource adaptor and then passing that to functions.
However, I think the common case is going to be something like we have in RMM:
```cpp
auto vec = cuda::std::vector<...>(get_current_device_shared_resource(), size);
```
Where `get_current_device_shared_resource()` returns an instance of `shared_resource`, the ref-counted type-erased resource wrapper.
(In fact, RAPIDS will want a wrapper for `cuda::std::vector` that defaults to getting that resource.)
We are definitely tension field of security vs performance.
I really do not want every resource to contain a `shared_ptr` that does ref-counting. At the same time, nothing here is new. If a user wants to share a resource, they need to make sure the resource is alive for as long as it is used.
We give them one better tool to achieve that, but I do not believe there is a golden solution that will satisfy all needs.
We are definitely tension field of security vs performance.
I don't know what this phrase means. :)
We need to balance security concerns with performance considerations. Using ref-counted resources everywhere is going to have a considerable performance impact, so I want to make sure that we only use them when needed.
I am not recommending refcounted resources everywhere. However I think shared ownership is needed whenever adaptors are used or in central resource management. Especially when we get into C++/Python interop.
I believe we are all aligned on what we're trying to achieve and I believe the current design has everything needed to make everyone happy. TL;DR: `cudax::vector` and other data containers will store their memory resource by value as an `any_resource`:

```cpp
cudax::vector<T> buff{..., mr};
```

- If `mr` is a `resource_ref`, then the vector has no control over the lifetime of the `mr` and it is up to the caller to ensure `mr` outlives `buff`.
- If `mr` is just an instance of a concrete resource (or an `any_resource`), it will be copied and `buff` will have exclusive ownership of its copy of the resource. `mr` does not need to outlive `buff` because `buff` has its own copy.
- If `mr` is a `cudax::shared_resource`, it will still be copied into the vector's `any_resource` member, and that copy will simply increment the refcount of the `shared_resource`. No user intervention is required to ensure `mr` outlives `buff`, as `buff` maintains shared ownership of the `mr`.
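The three cases can be pictured with standard-library analogues (a hypothetical sketch; the real types are the cudax ones named above):

```cpp
// hypothetical analogues of the three ownership modes described above
#include <functional>
#include <memory>

struct pool_resource { /* stateful pool details elided */ };

struct container_by_ref   { std::reference_wrapper<pool_resource> mr; }; // like resource_ref
struct container_by_value { pool_resource mr; };                         // like any_resource
struct container_shared   { std::shared_ptr<pool_resource> mr; };        // like shared_resource

int main() {
  pool_resource pool;
  container_by_ref   a{pool}; // caller must keep pool alive for a's lifetime
  container_by_value b{pool}; // b owns its own copy of pool
  container_shared   c{std::make_shared<pool_resource>()}; // shared ownership
  container_shared   c2 = c;  // copying just bumps the refcount
}
```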
Good summary. Thank you!
One more thing: we need a way to create a `shared_resource` from a non-copyable, non-movable resource. E.g. `make_shared_resource<Resource>(ctor_args...)`.
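The shape of such a factory, sketched with `std::shared_ptr` standing in for the proposed ref-counted wrapper (hypothetical implementation):

```cpp
// hypothetical sketch: construct a non-copyable, non-movable resource in place;
// the resulting handle is copyable even though the resource itself is not
#include <memory>
#include <utility>

template <class Resource, class... Args>
std::shared_ptr<Resource> make_shared_resource(Args&&... args) {
  // make_shared constructs Resource in place, so Resource never has to be
  // copied or moved; copies of the returned handle share one instance
  return std::make_shared<Resource>(std::forward<Args>(args)...);
}
```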
A question. Often our functions take `resource_ref` arguments, because they are lightweight. But in the following, a function takes a `resource_ref`, adapts it, and constructs and returns a vector using the adapted MR:
```cpp
cudax::vector<int> foo(..., resource_ref mr) {
  ...
  auto prefetch_mr = make_shared_resource<rmm::prefetch_resource_adaptor>(mr, args...);
  return cudax::vector<int>(size, prefetch_mr);
}
```
By using a `shared_resource` for the prefetch adaptor, we ensure that the adaptor's lifetime exceeds that of the returned vector. However, because the upstream for the adaptor is `mr`, which is a ref, we have no guarantee that the upstream outlives the adaptor or the vector.
So does the recommendation now become to accept `any_resource` or `shared_resource` when objects allocated using that resource will outlive the function call? The guidance now starts to get complicated. Presumably passing `any_resource` or `shared_resource` by value is much more expensive than passing `resource_ref`.
So does the recommendation now become to accept `any_resource` or `shared_resource` when objects allocated using that resource will outlive the function call?
This is a very good point, and I think you are correct. However, I don't think the pattern you described is representative of any libcudf examples, is it? I wouldn't expect we'd ever adapt an incoming memory resource on our own and return memory that was allocated with the adapted resource.
Presumably passing `any_resource` or `shared_resource` by value is much more expensive than passing `resource_ref`.
You may be pleasantly surprised. `any_resource` is just a `resource_ref` in disguise. It extends `resource_ref`'s vtable and adds additional functions for the copy ctor/dtor/etc. The size of the object is no different; it just requires initializing a few extra function pointers in the vtable.
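roughly this layout (a simplified sketch, not the actual cccl code):

```cpp
// simplified sketch: any_resource's vtable extends resource_ref's with
// entries for copying and destroying the owned object
#include <cstddef>

struct ref_vtable {              // what resource_ref dispatches through
  void* (*allocate)(void*, std::size_t);
  void (*deallocate)(void*, void*, std::size_t);
};

struct any_vtable : ref_vtable { // extra entries needed for ownership
  void* (*copy)(const void*);
  void (*destroy)(void*);
};

template <class R>
inline constexpr any_vtable vtable_for{
    {[](void* r, std::size_t n) { return static_cast<R*>(r)->allocate(n); },
     [](void* r, void* p, std::size_t n) { static_cast<R*>(r)->deallocate(p, n); }},
    [](const void* r) -> void* { return new R(*static_cast<const R*>(r)); },
    [](void* r) { delete static_cast<R*>(r); }};

// both resource_ref and any_resource store {object pointer, vtable pointer},
// so the objects are the same size; any_resource just fills in more entries
```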
it just requires initializing a few extra function pointers in the vtable.
Plus atomically incrementing a reference counter if it's a shared resource.
However, I don't think the pattern you described is representative of any libcudf examples, is it?
No, but I feel like I've seen something like it in RAFT? I am probably wrong.
Prefetching specifically is something we're doing in libcudf now, but I don't think we adapt the MR in the middle of a function like that.
i think we have resolved this issue. i'm closing it "not planned". @harrism feel free to reopen if you still feel that changes are needed here.
I don't think `any_resource` should support copy construction or copy assignment. What does it mean to copy a memory resource? A memory resource may own outstanding allocations to a LOT of memory. In some cases (pool memory resources), it may have allocated nearly all of the memory on a device, or in the system. What does it mean to copy that pool memory resource?
What if the copy is used to deallocate? Presumably the implementation of the pool doesn't know how to update the free lists of the original.
What is the intended use of copying `any_resource`? CC @wence-