Open Quuxplusone opened 6 months ago
old version's clang with __is_trivially_relocatable(T) does not work.
Unless standard adds the std::is_trivially_relocatable_v
The containers are still ongoing process. There are a lot of things to do.
see next branch
@trcrsired: The next
branch seems the same as main
in this respect. I'm talking about for example here:
https://github.com/cppfastio/fast_io/blob/next/include/fast_io_dsal/impl/vector.h#L1201-L1203
IIUC, this is saying "To construct a vector
from an iter/sentinel pair, if the iterator is contiguous, and the iterator's value_type
is the same as the vector's, and the value_type
is trivially relocatable, then you can just memcpy the bytes." I admit I'm unsure of the stuff after the word then, but I think that's what you're doing.
And that's wrong, because (if Clang actually implemented P1144 is_trivially_relocatable
) you might have:
std::shared_ptr<int> source[] = {
std::make_shared<int>(1),
std::make_shared<int>(2),
std::make_shared<int>(3),
};
static_assert(std::is_trivially_relocatable_v<std::shared_ptr<int>>); // and therefore...
static_assert(fast_io::freestanding::is_trivially_relocatable_v<std::shared_ptr<int>>);
auto v = fast_io::containers::vector<std::shared_ptr<int>>(source, source+3);
// oops! I claim this will bitwise-copy from source
Does that make sense?
And regarding "old version's clang with is_trivially_relocatable(T) does not work" — I agree! The PR opened this week would fix Clang's builtin, though, so that it returned the proper answer with zero false positives. (It would still have false negatives, e.g. `is_trivially_relocatable(std::shared_ptrwould remain false for now; we need the other half of P1144, the
[[clang::trivially_relocatable]]` half, to make that work.)
Btw, you can test this code with my P1144 compiler; P1144 provides a feature-test macro. You'd commit something like this to your headers:
template <typename T>
struct is_trivially_relocatable
{
#if defined(__clang__) && defined(__cpp_impl_trivially_relocatable)
inline static constexpr bool value = __is_trivially_relocatable(T);
#else
inline static constexpr bool value = ::std::is_trivially_copyable_v<T>;
#endif
};
And then you could plug the test case above into Godbolt and see that the optimization inside the constructor should be gated on trivial copyability, not relocatability.
template <bool move = false, typename Iter, typename Sentinel>
constexpr void constructor(Iter first, Sentinel last) noexcept(noexcept(this->emplace_back(*first)) || move)
{
- if constexpr (::fast_io::freestanding::is_trivially_relocatable_v<value_type> &&
+ if constexpr (::std::is_trivially_copyable_v<value_type> &&
::std::same_as<::std::iter_value_t<Iter>, value_type> &&
::std::contiguous_iterator<Iter>)
The constructors in the library need some cleanups. It is a completely mistake for too many duplications.
The vector constructors need completely rewrite. It wasn't me who wrote the code, sorry.
What about now? Is it fixed in next branch?
Almost. The bits I called out at first are probably fixed; but grepping for is_trivially_relocatable
, I see a few more problems. I'm pretty sure you want this:
--- a/include/fast_io_dsal/impl/vector.h
+++ b/include/fast_io_dsal/impl/vector.h
@@ -531,7 +531,7 @@ public:
}
constexpr void clear() noexcept
{
- if constexpr (!::fast_io::freestanding::is_trivially_relocatable_v<value_type>)
+ if constexpr (!::fast_io::freestanding::is_trivially_destructible_v<value_type>)
{
for (auto old_i{imp.begin_ptr}, old_e{imp.curr_ptr}; old_i != old_e; ++old_i)
{
Also, you could use is_trivially_relocatable_v
to optimize erase
and insert
; but they're not optimized at all currently. ...er, maybe they are? I see code in move_backward_construct
that is correctly gated on is_trivially_relocatable_v
, but at least along that codepath,
if (iter != currptr) [[likely]]
{
::fast_io::containers::details::move_backward_construct(iter, currptr, currptrp1);
iter->~value_type();
}
seems to be doing a double-destroy of *iter
. That is, I think
vector<std::shared_ptr<int>> v = { std::make_shared<int>(1), std::make_shared<int>(2), std::make_shared<int>(3) };
v.insert(v.begin()+1, std::make_shared<int>(4));
will accidentally destroy the element with value 2
and leave a bogus (dangling) shared_ptr at v[2]
.
Has it fixed? I have added into the test.
Has it fixed? I have added into the test.
Drive-by comment on the diffs in next
branch since yesterday: array<T,N>::max_size()
should always return N
, just like size()
does.
Your move_destroy_construct
function could productively be renamed uninitialized_relocate(first, last, dfirst)
, and I believe both of its memcpy
should be memmove
to account for the overlap that you'd expect when you switch to using it inside vector::erase_common
:
constexpr pointer erase_common(pointer it) noexcept
{
auto lastele{imp.curr_ptr};
- ::std::move(it + 1, lastele, it);
- --lastele;
- lastele->~value_type();
+ if constexpr (::fast_io::freestanding::is_trivially_relocatable_v<value_type>) {
+ it->~value_type();
+ ::fast_io::containers::details::move_destroy_construct(it + 1, lastele, it);
+ --lastele;
+ } else {
+ ::std::move(it + 1, lastele, it);
+ --lastele;
+ lastele->~value_type();
+ }
imp.curr_ptr = lastele;
return it;
}
(note that move_destroy_construct
really means uninitialized_relocate
, and you might consider using it unconditionally here, if you're willing for e.g. erasing from a vector<tuple<int&>>
to act like erasing from a list<tuple<int&>>
instead of doing a bunch of assign-throughs. See P2959R0, P3055.
Your move_backward_construct
function (which you invariably follow by an explicit destructor call, meaning that it's really doing part of a series of relocates) could productively be replaced with uninitialized_relocate_backward(first, last, dlast)
with the explicit destructor call eliminated.
Gentle reminder that I'd really appreciate your :+1: on https://github.com/llvm/llvm-project/pull/84621 ; your support will make a difference.
I do not know whether no one is proposing an is_zero_default_initialization
Do you have any interests on helping me maintain the code? I think you have much more knowledge on how to correctly write containers than me.
proposing an is_zero_default_initialization
Yes, Giuseppe D'Angelo wrote P2782 "A type trait to detect if value initialization can be achieved by zero-filling" (January 2023) and it got a 7–8–6–2–0 vote of interest by EWG, but I don't know if there's been any movement since June 2023.
Do you have any interest in helping me maintain the code?
No, not in general. I don't mind giving comments here and there, but I don't have the time to take on any kind of long-term obligation.
Speaking of comments... it looks like you committed a change to rename your move_destroy_construct
to uninitialized_relocate
, but then somewhere along the line changed it further into uninitialized_move
, which is wrong. ("move
" would say that it move-constructs (leaving the source range intact), like the STL algorithm std::uninitialized_move
; but what the function is actually doing is relocating (destroying the source range), like P1144's std::uninitialized_relocate
.)
It seems like you need at least the following diff applied again:
diff --git a/include/fast_io_dsal/impl/freestanding.h b/include/fast_io_dsal/impl/freestanding.h
index 778028ce..d6ca55f7 100644
--- a/include/fast_io_dsal/impl/freestanding.h
+++ b/include/fast_io_dsal/impl/freestanding.h
@@ -43,7 +43,7 @@ constexpr Iter2 uninitialized_relocate(Iter1 first, Iter1 last, Iter2 dest) noex
if (!__builtin_is_constant_evaluated())
#endif
{
- return reinterpret_cast<Iter2>(::fast_io::freestanding::nonoverlapped_bytes_copy(reinterpret_cast<::std::byte const *>(first), reinterpret_cast<::std::byte const *>(last), reinterpret_cast<::std::byte *>(dest)));
+ return reinterpret_cast<Iter2>(::fast_io::freestanding::bytes_copy(reinterpret_cast<::std::byte const *>(first), reinterpret_cast<::std::byte const *>(last), reinterpret_cast<::std::byte *>(dest)));
}
}
// we do not allow move constructor to throw EH.
@@ -85,8 +85,8 @@ constexpr Iter2 uninitialized_move(Iter1 first, Iter1 last, Iter2 dest) noexcept
using iter1valuetype = ::std::iter_value_t<Iter1>;
using iter2valuetype = ::std::iter_value_t<Iter2>;
if constexpr (::std::is_pointer_v<Iter1> && ::std::is_pointer_v<Iter2> &&
- (::fast_io::freestanding::is_trivially_relocatable_v<iter1valuetype> &&
- ::fast_io::freestanding::is_trivially_relocatable_v<iter2valuetype> &&
+ (::fast_io::freestanding::is_trivially_move_constructible_v<iter1valuetype> &&
+ ::fast_io::freestanding::is_trivially_move_constructible_v<iter2valuetype> &&
(::std::same_as<iter1valuetype, iter2valuetype> ||
((::std::integral<iter1valuetype> || ::std::same_as<iter1valuetype, ::std::byte>) &&
(::std::integral<iter2valuetype> || ::std::same_as<iter2valuetype, ::std::byte>) &&
diff --git a/include/fast_io_dsal/impl/vector.h b/include/fast_io_dsal/impl/vector.h
index a5dd0841..fcf0f294 100644
--- a/include/fast_io_dsal/impl/vector.h
+++ b/include/fast_io_dsal/impl/vector.h
@@ -1026,19 +1026,20 @@ private:
{
it->~value_type();
}
- ::fast_io::freestanding::uninitialized_move(it + 1, lastele, it);
+ ::fast_io::freestanding::uninitialized_relocate(it + 1, lastele, it);
imp.curr_ptr = lastele;
return it;
}
constexpr pointer erase_iters_common(pointer first, pointer last) noexcept
{
+ if (first == last) return first;
auto currptr{imp.curr_ptr};
if constexpr (!::std::is_trivially_destructible_v<value_type>)
{
::std::destroy(first, last);
}
- imp.curr_ptr = ::fast_io::freestanding::uninitialized_move(last, currptr, first);
+ imp.curr_ptr = ::fast_io::freestanding::uninitialized_relocate(last, currptr, first);
return first;
}
Notice the consistent naming and logic here: First we create a "window" where there are no objects, by destroying the objects there; then we relocate the objects above the window down into the window (leaving only destroyed objects at the end of the range, which we then snip off). If we merely moved the objects above the window down into the window, we'd end up with a bunch of objects in a moved-from state which we'd still need to destroy before we could snip them off.
(In fact, erase_common
could just call erase_iters_common(it, it+1)
!)
You can join our discord if you have more suggestions. I am still confused on what uninitialized_relocate is supposed to do
I am still confused on what uninitialized_relocate is supposed to do
@trcrsired: Ah. See [uninitialized.relocate]/1. uninitialized_relocate
simply relocates a range of objects from source to destination. Before the operation you have objects at source but no objects at dest; after the operation you have objects at dest but no objects at source.
Folly's Captain Picard story is another good short intro to relocation (although beware that Folly uses the adjective IsRelocatable
to mean specifically "is trivially relocatable, using memcpy").
I am still confused on what uninitialized_relocate is supposed to do
@trcrsired: Ah. See [uninitialized.relocate]/1.
uninitialized_relocate
simply relocates a range of objects from source to destination. Before the operation you have objects at source but no objects at dest; after the operation you have objects at dest but no objects at source. Folly's Captain Picard story is another good short intro to relocation (although beware that Folly uses the adjectiveIsRelocatable
to mean specifically "is trivially relocatable, using memcpy").
Why are not these uninitialized functions constexpr? Is it a bug in the standard?
Why are not these uninitialized functions constexpr?
These functions (uninitialized_move
etc.) were introduced in C++17. Prior to C++20, it was impossible to manually construct or destroy objects at constexpr time. And since C++20, I guess nobody's bothered to bring a proposal to make them constexpr. Presumably that's because everyone's satisfied with the status quo — nobody really needs them to work at constexpr time.
Why are not these uninitialized functions constexpr?
These functions (
uninitialized_move
etc.) were introduced in C++17. Prior to C++20, it was impossible to manually construct or destroy objects at constexpr time. And since C++20, I guess nobody's bothered to bring a proposal to make them constexpr. Presumably that's because everyone's satisfied with the status quo — nobody really needs them to work at constexpr time.
Okay, I'll review and reconsider how to implement these changes. I'm also focusing on fixing other issues, such as ensuring all trivially relocatable types share the same code path and making some adjustments to allocator interfaces.
The problem with C++ containers, like std::vector<T>
, std::deque<T>
, or any other, is that even though the allocation code is nearly identical, using types like std::vector<int>
, std::vector<double>
, or std::vector<std::vector<std::vector<int>>>
duplicates the code path for allocation. This duplication arises because the allocator is allocator<T>
instead of just allocator
, which could not allocate bytes and allow the use of the same allocator for different types.
Furthermore, there are other issues, such as calling operator new
instead of using malloc
and throwing std::bad_alloc
instead of terminating, which I consider a problematic approach.
In summary, the C++ standard containers' interface faces significant challenges, and I don't believe they can be meaningfully fixed.
The IO interface is even more problematic. C++20's format
reintroduces the format string, which I believe is a historical mistake. std::format
cannot prevent format string vulnerabilities.
Hi. i was busily working on my paper so i was not working on this. Now is it fixed?
Yep, I just looked at the current next
(after #721, i.e. 98121f646b122706652924efac24fc661935fc78), and erase
looks fine to me now. I didn't look at anything else this time.
You should consider adding some regression test cases, if you haven't already; e.g. the one I sketched in https://github.com/cppfastio/fast_io/issues/655#issuecomment-1996241311 .
Yep, I just looked at the current
next
(after #721, i.e. 98121f6), anderase
looks fine to me now. I didn't look at anything else this time. You should consider adding some regression test cases, if you haven't already; e.g. the one I sketched in #655 (comment) .
Also i added allocate_at_least and it correctly allocates 24 or 16 bytes on dlmalloc based allocator with malloc_usable_size. It should be more allocator friendly.
I will improve a lot on allocators in the future. I think there is a lot more i can do. Like making allocators mutex aware, grow and shrink, and providing a thread-local but fragmentation ignored memory pool, this should greatly improve allocation performance.
I just took another close look at the code in the master
branch, in order to update my blog post "Who uses P2786 and P1144 trivial relocation?" with a fast_io
entry. As a result, I can confidently say that this code in master
is wrong:
constexpr void assign(size_type n, value_type const& value) noexcept
requires(::fast_io::freestanding::is_trivially_relocatable_v<value_type>)
{
if (n > static_cast<std::size_t>(imp.end_ptr - imp.begin_ptr))
grow_to_size_impl(n);
::fast_io::freestanding::fill_n(imp.begin_ptr, n, value);
imp.curr_ptr = imp.begin_ptr + n;
}
This is fill_n
(assign)'ing over n
elements, and then dropping the tail elements on the floor without destroying them. That's fine if is_trivially_destructible_v<value_type>
— that is what you should gate on! It is absolutely a memory leak for types that aren't trivially destructible (including e.g. unique_ptr
, which is trivially relocatable but not trivially destructible).
I just took another close look at the code in the
master
branch, in order to update my blog post "Who uses P2786 and P1144 trivial relocation?" with afast_io
entry. As a result, I can confidently say that this code inmaster
is wrong:constexpr void assign(size_type n, value_type const& value) noexcept requires(::fast_io::freestanding::is_trivially_relocatable_v<value_type>) { if (n > static_cast<std::size_t>(imp.end_ptr - imp.begin_ptr)) grow_to_size_impl(n); ::fast_io::freestanding::fill_n(imp.begin_ptr, n, value); imp.curr_ptr = imp.begin_ptr + n; }
This is
fill_n
(assign)'ing overn
elements, and then dropping the tail elements on the floor without destroying them. That's fine ifis_trivially_destructible_v<value_type>
— that is what you should gate on! It is absolutely a memory leak for types that aren't trivially destructible (including e.g.unique_ptr
, which is trivially relocatable but not trivially destructible).
Master branch hasn't been maintained for a very long time. That vector was mainly internally used for implementing filesystem recursive path. Not going to take too much of them. The code isn't correct yeah. i know. That code wasn't written by me either. We've basically given up maintaining the master branch for a very long time.
There is a lot of issues I got wrong in the master and that is why I started the next branch.
I have moved the implementation from next to main for just this vector. It still needed to be finished for the container support, and it is still an ongoing process. Please do not judge us for this unfinished implementation.
Whatever. I would follow what the standard says about semantics. I do not have a strong opinion.
In the next branch and the updated master, I just use allocate_at_least instead of reallocate. There is no reallocate any more.
I noticed that you have at least two places where a memcpy optimization is gated only by your
is_trivially_relocatable_v
trait, even though the operation being optimized is a copy construction, not a relocation. (That is, there's no destruction of the source object/range.) This is the case forvector(initializer_list)
andvector::assign(InputIt, InputIt)
at least.These places should be guarded either by
is_trivially_copyable_v
(for full Standard-compliant-UB-resistance) or byis_trivially_copy_constructible_v
(for good-enough-in-practice). However, it is wrong to guard them withis_trivially_relocatable_v
. Consider: We wantshared_ptr
to beis_trivially_relocatable
, right? But initializing avector<shared_ptr<T>>
from aninitializer_list<shared_ptr<T>>
definitely can't be done with memcpy; the copy constructor bumps the ref count, and memcpy doesn't.So your code happens to work today, merely because your
is_trivially_relocatable_v
is defined in terms ofstd::is_trivially_copyable_v
; but you're gating optimizations on it that make it impossible for you ever to upgrade to e.g. Clang's__is_trivially_relocatable(T)
(which, some day, will betrue
forshared_ptr
). You should gate those optimizations explicitly onis_trivially_copyable
instead.Also, FYI, this week there is a PR out to fix the behavior of Clang's
__is_trivially_relocatable(T)
on Windows and bring it into alignment with P1144. I encourage you to go review that PR, and comment on it if you approve of its direction! (Really. It needs your support.) The PR in question is https://github.com/llvm/llvm-project/pull/84621