Open bernhardmgruber opened 4 months ago
The implementation takes about 32 lines.
I think this would be a violation of principle of least surprise to have some types get value initialized and some types get default initialized.
You are correct that there is a need for the functionality you describe, but it should be a different type. Design and implementation of such a type is underway. See https://github.com/NVIDIA/cccl/issues/1502
I think this would be a violation of principle of least surprise to have some types get value initialized and some types get default initialized.
I kind of disagree. thrust::device_vector<T>
s appearing in random locations should retain the same promises as they did before. The escape hatch I am describing is intended for weakening the invariants in a local scope, from the point of construction of the vector to the point of writing to it. It is of course the expert programmer's responsibility to ensure this and there is potential for misuse.
I imagine the use of this facility like this. Old behavior:
thrust::device_vector<T> v(100); // first write
// full guarantees are given here
thrust::fill(v.begin(). v.end(), 42);// second write (overwrites)
// full guarantees are given here
v.resize(10000); // ordinary value initialization
// full guarantees are given here
New behavior:
thrust::device_vector<T> v(100, thrust::i_know_what_I_am_doing::default_init); // just default init
// partial guarantees are given here, no reading of primitive types allowed
thrust::fill(v.begin(). v.end(), 42);// first write
// full guarantees are given here
v.resize(10000); // ordinary value initialization
// full guarantees are given here
This behavior is btw. consistent with std::array<T, N> a;
, T a[N]
, T t;
, new T[n]
. All of these operations construct non-trivially constructible class types but leave primitive types uninitialized. There is precedence.
This behavior is not equivalent to raw allocation using malloc
or cudaMalloc
(untyped and uninitialized), memory resources (same), or cuda::experimental::uninitialized_buffer<T>
etc.
I further considered whether initialization should be a property of the allocator, but this will lead exactly to the problem you described. Suddenly a vector behaves totally different on all of it's operations.
You are correct that there is a need for the functionality you describe, but it should be a different type. Design and implementation of such a type is underway. See #1502
I also disagree. I don't want to have yet another vector type that only differs in the kind of initialization performed during construction, but is otherwise identical to any other thrust vectors. There is no use to make this difference in the type system:
void f(thrust::device_vector<T>& vec) { ... }
void f(thrust::default_initdevice_vector<T>& vec) { ... }
In both functions, I want vec
to have the exact same behavior and offer the exact same guarantees.
AFAIU, the memory resources tracked in #1502 offer faciliities for uninitialized (and potentially untyped) memory, which is a different story and not the intent of my feature request.
This behavior is btw. consistent with std::array<T, N> a;, T a[N], T t;, new T[n]. All of these operations construct non-trivially constructible class types but leave primitive types uninitialized. There is precedence.
All of those are distinct types from std::vector
, and there is no precedent for the functionality you're describing with a std::vector
.
Therefore, this tells me the precedence is to create a new type distinct from thrust::device_vector
.
I am with @jrhemstad here. All examples above are static arrays where they follow the classical C initialization rules.
However, thrust::{device, host}_vector
are containers that by default value initialize their elements.
That said, we are currently in the process of working on cuda::vector
which will have the same issue. Currently we are leaning towards allowing a vector(size_t, uninitialized_t)
constructor that would not initialize the elements at all, which is slightly different than default construction
To add another example, boost::container::vector<T>
has exactly the feature I would like to have in thrust: https://www.boost.org/doc/libs/1_85_0/doc/html/container/extended_functionality.html#container.extended_functionality.default_initialialization
Currently we are leaning towards allowing a
vector(size_t, uninitialized_t)
constructor that would not initialize the elements at all, which is slightly different than default construction
This is quite dangerous to the inexperienced programmer, since you cannot, generically speaking, assign to elements of such a vector, but have to construct them first. I think that a tag to default initialize is the far safer option here, since it still omits initialization for primitive data types, but runs constructors when they exist.
I believe the important issue is that you want to be able to completely avoid launching any operation to initialize a memory range that you will be overwriting soon.
In my world those types where default initialization is actually a win are those types where assignment is also trivial, so the "danger" is not materialized
We had a big discussion about this matter today and here is my summary:
We agreed on the following:
thrust::device_vector
, but we can also live with a reduced API set and stick with thrust::device_vector
where necessary (e.g. as destination for a remove_if
followed by erase
)thrust::device_vector
, we would extend the constructor and resize function with an overload taking a tag type (e.g. default_initialized_t
)cudaMalloc
is undesirableAlternatives:
The proposed cuda::experimental::uninitialized_buffer<T>
was the most strongly endorsed alternative, mostly because of consistent behavior. The memory is always uninitialized independently of the element type. If T
has a non-trivial constructor, the constructor is not run and accessing elements without placment new results in UB. This is fine, users need to know what they are doing. thrust::device_vector
retains the guarantee to always be value initialized. No garbage values (default constructed ints or floats) exist. Both types, uninitialized_buffer<T>
and device_vector<T>
, maintain their behavior independently of their element type T
. Furthermore, both types upheld their invariants immideately after construction.
By contrast, thrust::device_vector<T> v(n, default_init_t)
would perform no initialization when T
is trivially default-constructible, and would compile and run a kernel if T
is not trivially default-constructible. Thus, the construction behavior is less visible in the type. Furthermore, the aforementioned constructur, when T
is trivially default-constructible, leaves the vector v
in a less usable state, since it cannot be read from (e.g., default constructed int
s contain garbage values). A second step has to be performed (i.e., assigning to each element), before the usual guarantees of thrust::device_vector
(any element can be fully read) are fully upheld. The two step initialization was generally perceived as unfulfilling. However, thrust::device_vector<T>
guarantees that a non-trivial constructor will always be run, thus preferring safety over consistent behavior/performance.
Rapid's udevice_vector<T>
was brought up for comparison, but determined to be safer, since it static_assert
s if T
is not trivially default-constructible, whereas cuda::experimental::uninitialized_buffer<T>
will silently omit construction. It's the programmer's responsibility to handle element construction themselves.
It was noted that if T
is a trivial type, cuda::experimental::uninitialized_buffer<T>
and thrust::device_vector<T> v(n, default_init_t)
behave the same, so the discussion shifted to what to with non-trivially-constructible T
s. Those types are less common, but examples would be complex<T>
or a 4x4 matrix type that initializes to an identity matrix. It was argued that these types are often also trivially destructible and trivially copy-constructible, so it would not matter if the constructor is run or not, since values can be overwritten with memcpy (as permitted by trivial copy-constructability) anyway.
One person argued that the compile time of thrust::device_vectors
is rather high, so we would want something simpler than thrust::device_vector<T>
for the unit tests anyway.
Another argued that having a dedicated type to thrust::device_vector
may require API duplication, since we would want to, after construction, use that type the same way and for the same things as thrust::device_vector
(e.g. within Catch2).
Since we found use cases for both, default initialized and uninitialized memory, maybe both features are justified. If we provide a tag for thrust::device_vector
for default initialization, we should also provide a tag for no initialization
Further design alternatives discussed:
thrust::device_vector
to perform only default-initialization instead of value-initialization. This has the drawback that we cannot choose differently at each construction or resize how to initialize, since the behavior is fixed to the type. Also, we are uncertain how well custom allocators are supported in Thrust's existing interfaces.cuda::vector<T>
with an appropriate memory resourcethrust::device_vector
from a cuda::experimental::uninitialized_buffer<T>
I was recently reminded about this by a WG21 reflector discussion, pointing to this paper https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1072r10.html#tag, concluding that "LEWG should explore tags for allocator-aware containers". It also lists Boost as setting precedence for such a non-standard extension.
Interestingly, it also lists thrust::detail::temporary_array<T>
as example of omitting value-initialization if is_trivially_copy_constructible<T>
allows. This leaves me wondering why we are not using thrust::detail::temporary_array<T>
EVERYWHERE?
Is this a duplicate?
Area
Thrust
Is your feature request related to a problem? Please describe.
Thrust vectors perform value-initialization, like
std::vector
. That is, non-trivially constructible class types will have their constuctors run and all primitive types are zero-initialized. This pessimizes use cases where we want to create vectors which are inteded to be overwritten entirely before any read occurs. While smart compilers can optimize unnecessary initialization away forstd::vector
, this is much more difficult for e.g.thrust::device_vector
, since it's constructor has to allocate device memory and run a kernel for initialization.Describe the solution you'd like
I would like to have a constructor overload on all thrust vector's that performs default initialization. That is, non-trivially constructible class types will have their constuctors run but all primitive data types are left uninitialized. Since thrust vectors are often used with primitive types, there is great potential to save unnecessary kernel runs and discarded writes for vectors used to provide storage for outputs from thrust algorithms.
Note that I am not proposing to skip initialization. Default constructed elements contained in thrust vector do have their lifetimes begun and assigning/writing to them is defined behavior. Only reading from default initialized and only primitive elements is undefined behavior (before C++26) / erroneous behavior (since C++26) though, so this facility requires certain care.
Describe alternatives you've considered
Calling raw
cudaMalloc
and wrapping the result in aunique_ptr
.Additional context
No response