Quuxplusone / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
https://p1144.godbolt.org/z/jf67zx5hq
Other
0 stars 2 forks source link

`std::ranges::minmax` miscompiles given Microsoft-style `tuple<int&>` #33

Open Quuxplusone opened 2 months ago

Quuxplusone commented 2 months ago

@frederick-vs-ja recently reported (see https://github.com/llvm/llvm-project/pull/89652#discussion_r1575540420) that Microsoft STL's pair<int&, int&> is sneakily implemented so that it is_trivially_copyable, despite its not behaving value-semantically. This might be fine for Microsoft, whose STL doesn't gate any optimizations on trivial copyability (or at least I hope they don't) — but it would absolutely not be fine for libstdc++ or libc++ in their current highly-optimizing states.

Unfortunately, it is totally possible for the user-programmer to employ the same tricks as Microsoft, and come up with their own type which behaves just like pair<int&, int&> or tuple<int&> (or worse) and yet is_trivially_copyable. This breaks any optimizations gated on is_trivially_copyable. Library vendors will have to gate these optimizations on the finer-grained is_trivially_constructible/is_trivially_assignable etc., or else make a policy decision that "we don't care about stupid types" (but then those same vendors must ensure that none of their types are "stupid"). (Where "stupid" here means "types that advertise trivial copyability according to the core language but whose copy operations are not, Platonically, trivial.)

Right now there's a PR out from @philnik777 (https://github.com/llvm/llvm-project/pull/89652) which would introduce a Microsoft-style pair into libc++. As I said above, this would absolutely not be fine for libc++ today, because libc++ likes to gate value-semantic optimizations on is_trivially_copyable.

I filed the libstdc++ issue as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114817 , using libstdc++'s std::copy as my example. Here's a libc++ version, using std::ranges::minmax as my example.

In both cases, the reported bug is "easy to fix" by just strengthening the gate on the specific optimization to include something like is_trivially_assignable<T, T&>. But in both cases, I'm only reporting the lowest-hanging fruit. Library vendors who want to care about "stupid types" will need to audit all of their optimizations to make sure nothing is left gated on is_trivially_copyable alone.

https://godbolt.org/z/rqjYn96ch

// EvilTuple is tuple<int&>, but using MSVC's trick
// to make it trivially copyable.
template<class T>
struct Evil {
    T *p_;
    explicit Evil() = default;
    explicit Evil(T& i) : p_(&i) {}
    Evil(const Evil&) = default;
    Evil& operator=(const volatile Evil&) = delete;
    template<class = void> Evil& operator=(const Evil& rhs) noexcept {
        puts("Evil!");
        *p_ = *rhs.p_;
        return *this;
    }
    ~Evil() = default;
    friend auto operator<=>(const Evil& a, const Evil& b) {
        return *a.p_ <=> *b.p_;
    }
    friend auto operator==(const Evil& a, const Evil& b) {
        return *a.p_ == *b.p_;
    }
};

using T = int;
using R = Evil<T>;
static_assert(std::is_trivially_copyable_v<R>);
static_assert(std::is_trivially_destructible_v<R>);
static_assert(std::is_trivially_copy_constructible_v<R>);
static_assert(std::is_trivially_move_constructible_v<R>);
static_assert(not std::is_trivially_copy_assignable_v<R>);
static_assert(not std::is_trivially_move_assignable_v<R>);

int main() {
    T values[] = {3,1,4,1,5};
    R tuples[] = {
        R(values[0]),
        R(values[1]),
        R(values[2]),
        R(values[3]),
        R(values[4]),
    };
    auto [min, max] = std::ranges::minmax(tuples);
    printf("%d %d\n", *min.p_, *max.p_);
}
philnik777 commented 2 months ago

My patch doesn't make pairs with references in them trivially copyable (or at least shouldn't). I'm also struggling to see what the problem in the ranges::mismatch example is. Could you explain that a bit more?

Quuxplusone commented 2 months ago

My patch doesn't make pairs with references in them trivially copyable (or at least shouldn't).

Oh, good. I see now I misread the order of events in https://github.com/llvm/llvm-project/pull/89652 ; somehow I got the idea that https://github.com/llvm/llvm-project/pull/89652 was a new PR in response to @frederick-vs-ja's comment, rather than vice versa.

I'm also struggling to see what the problem in the [ranges::minmax] example is. Could you explain that a bit more?

Well, you see that libc++ gives a different answer (5,5) than libstdc++ (1,5), and also makes ranges::minmax mutate the values array, right? So the question is whether that divergence is conforming or not. I claim (basically as self-evident and intuitive) that the divergence isn't conforming. I suppose you argue that it is conforming? Then we'd have to find support for that in the Standard.

There is a named requirement Cpp17CopyAssignable that mandates value-semantic assignment; but [alg.min.max] doesn't say anything about T's having to be Cpp17CopyAssignable.

[alg.min.max] does say that T has to model indirectly_copyable_storable, which mandates value-semantic copy-construction, and also mandates that T be assignable_from; but assignable_from does not appear to mandate that the assignment be value-semantic, merely that it be syntactically valid.

That is, I'm claiming that Evil<int> models all the constraints of ranges::minmax's parameter type (i.e. Evil<int> models indirectly_copyable_storable), and so the test program is supposed to be well-defined, and so libc++ should give the same answer as libstdc++. Flipping the "is_trivially_copyable-ness" of Evil<int> from true to false or false to true shouldn't change the answer that libc++ gives... But, right now, it does change the answer. So that's a bug.

Where in that line of reasoning do we start diverging?

philnik777 commented 2 months ago

(Note: I think you mixed up libc++ and libstdc++, libc++ returns (1, 5)) We never try to assign Evil<int> (https://godbolt.org/z/fEr3891c4). Since it's not integral, but a forward_range it will use the minmax_element branch. Just like the standard may not state that your types has to have sensible assignments (I haven't checked), it doesn't state that the algorithm ever has to assign the objects.

Quuxplusone commented 2 months ago

Argh, drat. I did mix up the libc++ and libstdc++ sides there. And I see now the gate on ranges::minmax's assignment optimization: it checks is_integral_v<_ValueT> as well as is_trivially_copyable_v<_ValueT>. So ranges::minmax is OK.

So in that case, the next-and-only higher-hanging fruit I see is when libc++ is compiled on GCC, so that __is_trivially_relocatable(T) isn't available, in which case we'll use memcpy instead of move-construction when reallocating a vector<Evil2>. Behavior with GCC isn't as easy to demonstrate on Godbolt, but here's a quick-and-dirty approximation:

https://godbolt.org/z/6hP7raYfo

struct Evil {
    int i_;
    int *p_ = &i_;
    explicit Evil(int i) : i_(i) {}
    Evil(const volatile Evil&) = delete;
    template<class = void> Evil(const Evil& rhs) noexcept : i_(rhs.i_) {
        puts("Appropriately evil!");
    }
    ~Evil() = default;
};

static_assert(std::is_trivially_copyable_v<Evil>);
#if COMPILED_ON_GCC
 static_assert(std::__libcpp_is_trivially_relocatable<Evil>::value);
#endif

int main() {
    std::vector<Evil> v;
    v.reserve(2);
    v.emplace_back(1);
    v.emplace_back(2);
    v.emplace_back(3);
    printf("%d %d %d\n", *v[0].p_, *v[1].p_, *v[2].p_);
      // dereferences dangling pointers when compiled with GCC
}
philnik777 commented 2 months ago

Urgh, yeah. That's almost certainly a bug.

frederick-vs-ja commented 2 months ago

microsoft/STL#4663 seems related. I'm not sure whether the (unfortunate) use of is_trivially_copyable_v is fine.

Edit: the use of is_trivially_copyable_v there should be fine. Both branches should be semantically equivalent modulo unspecified aspects permitted by the standard, and perhaps we shouldn't complicate the determination just for some contrived types.