cppfastio / fast_io

Freestanding fast input/output for C++20
MIT License
630 stars 52 forks source link

vector::assign is incorrectly gated on is_trivially_relocatable #655

Open Quuxplusone opened 6 months ago

Quuxplusone commented 6 months ago

I noticed that you have at least two places where a memcpy optimization is gated only by your is_trivially_relocatable_v trait, even though the operation being optimized is a copy construction, not a relocation. (That is, there's no destruction of the source object/range.) This is the case for vector(initializer_list) and vector::assign(InputIt, InputIt) at least.

These places should be guarded either by is_trivially_copyable_v (for full Standard-compliant-UB-resistance) or by is_trivially_copy_constructible_v (for good-enough-in-practice). However, it is wrong to guard them with is_trivially_relocatable_v. Consider: We want shared_ptr to be is_trivially_relocatable, right? But initializing a vector<shared_ptr<T>> from an initializer_list<shared_ptr<T>> definitely can't be done with memcpy; the copy constructor bumps the ref count, and memcpy doesn't.

So your code happens to work today, merely because your is_trivially_relocatable_v is defined in terms of std::is_trivially_copyable_v; but you're gating optimizations on it that make it impossible for you ever to upgrade to e.g. Clang's __is_trivially_relocatable(T) (which, some day, will be true for shared_ptr). You should gate those optimizations explicitly on is_trivially_copyable instead.

Also, FYI, this week there is a PR out to fix the behavior of Clang's __is_trivially_relocatable(T) on Windows and bring it into alignment with P1144. I encourage you to go review that PR, and comment on it if you approve of its direction! (Really. It needs your support.) The PR in question is https://github.com/llvm/llvm-project/pull/84621

trcrsired commented 6 months ago

old version's clang with __is_trivially_relocatable(T) does not work.

Unless standard adds the std::is_trivially_relocatable_v into the standard, we cannot fix it.

The containers are still ongoing process. There are a lot of things to do.

trcrsired commented 6 months ago

see next branch

Quuxplusone commented 6 months ago

@trcrsired: The next branch seems the same as main in this respect. I'm talking about for example here: https://github.com/cppfastio/fast_io/blob/next/include/fast_io_dsal/impl/vector.h#L1201-L1203 IIUC, this is saying "To construct a vector from an iter/sentinel pair, if the iterator is contiguous, and the iterator's value_type is the same as the vector's, and the value_type is trivially relocatable, then you can just memcpy the bytes." I admit I'm unsure of the stuff after the word then, but I think that's what you're doing. And that's wrong, because (if Clang actually implemented P1144 is_trivially_relocatable) you might have:

std::shared_ptr<int> source[] = {
  std::make_shared<int>(1),
  std::make_shared<int>(2),
  std::make_shared<int>(3),
};
static_assert(std::is_trivially_relocatable_v<std::shared_ptr<int>>); // and therefore...
static_assert(fast_io::freestanding::is_trivially_relocatable_v<std::shared_ptr<int>>);
auto v = fast_io::containers::vector<std::shared_ptr<int>>(source, source+3);
  // oops! I claim this will bitwise-copy from source

Does that make sense?

And regarding "old version's clang with is_trivially_relocatable(T) does not work" — I agree! The PR opened this week would fix Clang's builtin, though, so that it returned the proper answer with zero false positives. (It would still have false negatives, e.g. `is_trivially_relocatable(std::shared_ptr)would remain false for now; we need the other half of P1144, the[[clang::trivially_relocatable]]` half, to make that work.)

Btw, you can test this code with my P1144 compiler; P1144 provides a feature-test macro. You'd commit something like this to your headers:

template <typename T>
struct is_trivially_relocatable
{
#if defined(__clang__) && defined(__cpp_impl_trivially_relocatable)
        inline static constexpr bool value = __is_trivially_relocatable(T);
#else
        inline static constexpr bool value = ::std::is_trivially_copyable_v<T>;
#endif
};

And then you could plug the test case above into Godbolt and see that the optimization inside the constructor should be gated on trivial copyability, not relocatability.

    template <bool move = false, typename Iter, typename Sentinel>
    constexpr void constructor(Iter first, Sentinel last) noexcept(noexcept(this->emplace_back(*first)) || move)
    {
-       if constexpr (::fast_io::freestanding::is_trivially_relocatable_v<value_type> &&
+       if constexpr (::std::is_trivially_copyable_v<value_type> &&
                      ::std::same_as<::std::iter_value_t<Iter>, value_type> &&
                      ::std::contiguous_iterator<Iter>)
trcrsired commented 6 months ago

The constructors in the library need some cleanups. It is a completely mistake for too many duplications.

trcrsired commented 6 months ago

The vector constructors need completely rewrite. It wasn't me who wrote the code, sorry.

trcrsired commented 6 months ago

What about now? Is it fixed in next branch?

Quuxplusone commented 6 months ago

Almost. The bits I called out at first are probably fixed; but grepping for is_trivially_relocatable, I see a few more problems. I'm pretty sure you want this:

--- a/include/fast_io_dsal/impl/vector.h
+++ b/include/fast_io_dsal/impl/vector.h
@@ -531,7 +531,7 @@ public:
        }
        constexpr void clear() noexcept
        {
-               if constexpr (!::fast_io::freestanding::is_trivially_relocatable_v<value_type>)
+               if constexpr (!::fast_io::freestanding::is_trivially_destructible_v<value_type>)
                {
                        for (auto old_i{imp.begin_ptr}, old_e{imp.curr_ptr}; old_i != old_e; ++old_i)
                        {

Also, you could use is_trivially_relocatable_v to optimize erase and insert; but they're not optimized at all currently. ...er, maybe they are? I see code in move_backward_construct that is correctly gated on is_trivially_relocatable_v, but at least along that codepath,

                if (iter != currptr) [[likely]]
                {
                        ::fast_io::containers::details::move_backward_construct(iter, currptr, currptrp1);
                        iter->~value_type();
                }

seems to be doing a double-destroy of *iter. That is, I think

vector<std::shared_ptr<int>> v = { std::make_shared<int>(1), std::make_shared<int>(2), std::make_shared<int>(3) };
v.insert(v.begin()+1, std::make_shared<int>(4));

will accidentally destroy the element with value 2 and leave a bogus (dangling) shared_ptr at v[2].

trcrsired commented 6 months ago

Has it fixed? I have added into the test.

Quuxplusone commented 6 months ago

Has it fixed? I have added into the test.

Drive-by comment on the diffs in next branch since yesterday: array<T,N>::max_size() should always return N, just like size() does.

Your move_destroy_construct function could productively be renamed uninitialized_relocate(first, last, dfirst), and I believe both of its memcpy should be memmove to account for the overlap that you'd expect when you switch to using it inside vector::erase_common:

        constexpr pointer erase_common(pointer it) noexcept
        {
                auto lastele{imp.curr_ptr};
-               ::std::move(it + 1, lastele, it);
-               --lastele;
-               lastele->~value_type();
+               if constexpr (::fast_io::freestanding::is_trivially_relocatable_v<value_type>) {
+                       it->~value_type();
+                       ::fast_io::containers::details::move_destroy_construct(it + 1, lastele, it);
+                       --lastele;
+               } else {
+                       ::std::move(it + 1, lastele, it);
+                       --lastele;
+                       lastele->~value_type();
+               }
                imp.curr_ptr = lastele;
                return it;
        }

(note that move_destroy_construct really means uninitialized_relocate, and you might consider using it unconditionally here, if you're willing for e.g. erasing from a vector<tuple<int&>> to act like erasing from a list<tuple<int&>> instead of doing a bunch of assign-throughs. See P2959R0, P3055.

Your move_backward_construct function (which you invariably follow by an explicit destructor call, meaning that it's really doing part of a series of relocates) could productively be replaced with uninitialized_relocate_backward(first, last, dlast) with the explicit destructor call eliminated.

Gentle reminder that I'd really appreciate your :+1: on https://github.com/llvm/llvm-project/pull/84621 ; your support will make a difference.

trcrsired commented 6 months ago

I do not know whether no one is proposing an is_zero_default_initialization which allows the vector to skip initialization for types like int

trcrsired commented 6 months ago

Do you have any interests on helping me maintain the code? I think you have much more knowledge on how to correctly write containers than me.

Quuxplusone commented 6 months ago

proposing an is_zero_default_initialization

Yes, Giuseppe D'Angelo wrote P2782 "A type trait to detect if value initialization can be achieved by zero-filling" (January 2023) and it got a 7–8–6–2–0 vote of interest by EWG, but I don't know if there's been any movement since June 2023.

Do you have any interest in helping me maintain the code?

No, not in general. I don't mind giving comments here and there, but I don't have the time to take on any kind of long-term obligation.

Speaking of comments... it looks like you committed a change to rename your move_destroy_construct to uninitialized_relocate, but then somewhere along the line changed it further into uninitialized_move, which is wrong. ("move" would say that it move-constructs (leaving the source range intact), like the STL algorithm std::uninitialized_move; but what the function is actually doing is relocating (destroying the source range), like P1144's std::uninitialized_relocate.)

It seems like you need at least the following diff applied again:

diff --git a/include/fast_io_dsal/impl/freestanding.h b/include/fast_io_dsal/impl/freestanding.h
index 778028ce..d6ca55f7 100644
--- a/include/fast_io_dsal/impl/freestanding.h
+++ b/include/fast_io_dsal/impl/freestanding.h
@@ -43,7 +43,7 @@ constexpr Iter2 uninitialized_relocate(Iter1 first, Iter1 last, Iter2 dest) noex
                        if (!__builtin_is_constant_evaluated())
 #endif
                        {
-                               return reinterpret_cast<Iter2>(::fast_io::freestanding::nonoverlapped_bytes_copy(reinterpret_cast<::std::byte const *>(first), reinterpret_cast<::std::byte const *>(last), reinterpret_cast<::std::byte *>(dest)));
+                               return reinterpret_cast<Iter2>(::fast_io::freestanding::bytes_copy(reinterpret_cast<::std::byte const *>(first), reinterpret_cast<::std::byte const *>(last), reinterpret_cast<::std::byte *>(dest)));
                        }
                }
                // we do not allow move constructor to throw EH.
@@ -85,8 +85,8 @@ constexpr Iter2 uninitialized_move(Iter1 first, Iter1 last, Iter2 dest) noexcept
                using iter1valuetype = ::std::iter_value_t<Iter1>;
                using iter2valuetype = ::std::iter_value_t<Iter2>;
                if constexpr (::std::is_pointer_v<Iter1> && ::std::is_pointer_v<Iter2> &&
-                                         (::fast_io::freestanding::is_trivially_relocatable_v<iter1valuetype> &&
-                                          ::fast_io::freestanding::is_trivially_relocatable_v<iter2valuetype> &&
+                                         (::fast_io::freestanding::is_trivially_move_constructible_v<iter1valuetype> &&
+                                          ::fast_io::freestanding::is_trivially_move_constructible_v<iter2valuetype> &&
                                           (::std::same_as<iter1valuetype, iter2valuetype> ||
                                                ((::std::integral<iter1valuetype> || ::std::same_as<iter1valuetype, ::std::byte>) &&
                                                 (::std::integral<iter2valuetype> || ::std::same_as<iter2valuetype, ::std::byte>) &&
diff --git a/include/fast_io_dsal/impl/vector.h b/include/fast_io_dsal/impl/vector.h
index a5dd0841..fcf0f294 100644
--- a/include/fast_io_dsal/impl/vector.h
+++ b/include/fast_io_dsal/impl/vector.h
@@ -1026,19 +1026,20 @@ private:
                {
                        it->~value_type();
                }
-               ::fast_io::freestanding::uninitialized_move(it + 1, lastele, it);
+               ::fast_io::freestanding::uninitialized_relocate(it + 1, lastele, it);
                imp.curr_ptr = lastele;
                return it;
        }

        constexpr pointer erase_iters_common(pointer first, pointer last) noexcept
        {
+               if (first == last) return first;
                auto currptr{imp.curr_ptr};
                if constexpr (!::std::is_trivially_destructible_v<value_type>)
                {
                        ::std::destroy(first, last);
                }
-               imp.curr_ptr = ::fast_io::freestanding::uninitialized_move(last, currptr, first);
+               imp.curr_ptr = ::fast_io::freestanding::uninitialized_relocate(last, currptr, first);
                return first;
        }

Notice the consistent naming and logic here: First we create a "window" where there are no objects, by destroying the objects there; then we relocate the objects above the window down into the window (leaving only destroyed objects at the end of the range, which we then snip off). If we merely moved the objects above the window down into the window, we'd end up with a bunch of objects in a moved-from state which we'd still need to destroy before we could snip them off.

(In fact, erase_common could just call erase_iters_common(it, it+1)!)

trcrsired commented 6 months ago

You can join our discord if you have more suggestions. I am still confused on what uninitialized_relocate is supposed to do

Quuxplusone commented 6 months ago

I am still confused on what uninitialized_relocate is supposed to do

@trcrsired: Ah. See [uninitialized.relocate]/1. uninitialized_relocate simply relocates a range of objects from source to destination. Before the operation you have objects at source but no objects at dest; after the operation you have objects at dest but no objects at source. Folly's Captain Picard story is another good short intro to relocation (although beware that Folly uses the adjective IsRelocatable to mean specifically "is trivially relocatable, using memcpy").

trcrsired commented 6 months ago

I am still confused on what uninitialized_relocate is supposed to do

@trcrsired: Ah. See [uninitialized.relocate]/1. uninitialized_relocate simply relocates a range of objects from source to destination. Before the operation you have objects at source but no objects at dest; after the operation you have objects at dest but no objects at source. Folly's Captain Picard story is another good short intro to relocation (although beware that Folly uses the adjective IsRelocatable to mean specifically "is trivially relocatable, using memcpy").

Why are not these uninitialized functions constexpr? Is it a bug in the standard?

Quuxplusone commented 6 months ago

Why are not these uninitialized functions constexpr?

These functions (uninitialized_move etc.) were introduced in C++17. Prior to C++20, it was impossible to manually construct or destroy objects at constexpr time. And since C++20, I guess nobody's bothered to bring a proposal to make them constexpr. Presumably that's because everyone's satisfied with the status quo — nobody really needs them to work at constexpr time.

trcrsired commented 6 months ago

Why are not these uninitialized functions constexpr?

These functions (uninitialized_move etc.) were introduced in C++17. Prior to C++20, it was impossible to manually construct or destroy objects at constexpr time. And since C++20, I guess nobody's bothered to bring a proposal to make them constexpr. Presumably that's because everyone's satisfied with the status quo — nobody really needs them to work at constexpr time.

Okay, I'll review and reconsider how to implement these changes. I'm also focusing on fixing other issues, such as ensuring all trivially relocatable types share the same code path and making some adjustments to allocator interfaces.

The problem with C++ containers, like std::vector<T>, std::deque<T>, or any other, is that even though the allocation code is nearly identical, using types like std::vector<int>, std::vector<double>, or std::vector<std::vector<std::vector<int>>> duplicates the code path for allocation. This duplication arises because the allocator is allocator<T> instead of just allocator, which could not allocate bytes and allow the use of the same allocator for different types.

Furthermore, there are other issues, such as calling operator new instead of using malloc and throwing std::bad_alloc instead of terminating, which I consider a problematic approach.

In summary, the C++ standard containers' interface faces significant challenges, and I don't believe they can be meaningfully fixed.

The IO interface is even more problematic. C++20's format reintroduces the format string, which I believe is a historical mistake. std::format cannot prevent format string vulnerabilities.

trcrsired commented 4 months ago

Hi. i was busily working on my paper so i was not working on this. Now is it fixed?

Quuxplusone commented 4 months ago

Yep, I just looked at the current next (after #721, i.e. 98121f646b122706652924efac24fc661935fc78), and erase looks fine to me now. I didn't look at anything else this time. You should consider adding some regression test cases, if you haven't already; e.g. the one I sketched in https://github.com/cppfastio/fast_io/issues/655#issuecomment-1996241311 .

trcrsired commented 4 months ago

Yep, I just looked at the current next (after #721, i.e. 98121f6), and erase looks fine to me now. I didn't look at anything else this time. You should consider adding some regression test cases, if you haven't already; e.g. the one I sketched in #655 (comment) .

Also i added allocate_at_least and it correctly allocates 24 or 16 bytes on dlmalloc based allocator with malloc_usable_size. It should be more allocator friendly.

I will improve a lot on allocators in the future. I think there is a lot more i can do. Like making allocators mutex aware, grow and shrink, and providing a thread-local but fragmentation ignored memory pool, this should greatly improve allocation performance.

trcrsired commented 4 months ago

https://github.com/cppfastio/fast_io/blob/next/include/fast_io_core_impl/allocation/c_malloc.h#L118

Quuxplusone commented 3 months ago

I just took another close look at the code in the master branch, in order to update my blog post "Who uses P2786 and P1144 trivial relocation?" with a fast_io entry. As a result, I can confidently say that this code in master is wrong:

        constexpr void assign(size_type n, value_type const& value) noexcept
                requires(::fast_io::freestanding::is_trivially_relocatable_v<value_type>)
        {
                if (n > static_cast<std::size_t>(imp.end_ptr - imp.begin_ptr))
                        grow_to_size_impl(n);
                ::fast_io::freestanding::fill_n(imp.begin_ptr, n, value);
                imp.curr_ptr = imp.begin_ptr + n;
        }

This is fill_n (assign)'ing over n elements, and then dropping the tail elements on the floor without destroying them. That's fine if is_trivially_destructible_v<value_type>that is what you should gate on! It is absolutely a memory leak for types that aren't trivially destructible (including e.g. unique_ptr, which is trivially relocatable but not trivially destructible).

trcrsired commented 3 months ago

I just took another close look at the code in the master branch, in order to update my blog post "Who uses P2786 and P1144 trivial relocation?" with a fast_io entry. As a result, I can confidently say that this code in master is wrong:

        constexpr void assign(size_type n, value_type const& value) noexcept
                requires(::fast_io::freestanding::is_trivially_relocatable_v<value_type>)
        {
                if (n > static_cast<std::size_t>(imp.end_ptr - imp.begin_ptr))
                        grow_to_size_impl(n);
                ::fast_io::freestanding::fill_n(imp.begin_ptr, n, value);
                imp.curr_ptr = imp.begin_ptr + n;
        }

This is fill_n (assign)'ing over n elements, and then dropping the tail elements on the floor without destroying them. That's fine if is_trivially_destructible_v<value_type>that is what you should gate on! It is absolutely a memory leak for types that aren't trivially destructible (including e.g. unique_ptr, which is trivially relocatable but not trivially destructible).

Master branch hasn't been maintained for a very long time. That vector was mainly internally used for implementing filesystem recursive path. Not going to take too much of them. The code isn't correct yeah. i know. That code wasn't written by me either. We've basically given up maintaining the master branch for a very long time.

There is a lot of issues I got wrong in the master and that is why I started the next branch.

trcrsired commented 3 months ago

I have moved the implementation from next to main for just this vector. It still needed to be finished for the container support, and it is still an ongoing process. Please do not judge us for this unfinished implementation.

trcrsired commented 3 months ago

Whatever. I would follow what the standard says about semantics. I do not have a strong opinion.

trcrsired commented 3 months ago

In the next branch and the updated master, I just use allocate_at_least instead of reallocate. There is no reallocate any more.