ericniebler / stl2

LaTeX and Markdown source for the Ranges TS/STL2 and associated proposals
88 stars 8 forks source link

Relaxing the Common<T, U> requirements for the relational concepts #34

Closed CaseyCarter closed 5 years ago

CaseyCarter commented 9 years ago

Some LEWG reviewers mentioned this, and it emerged again in the discussion of issue #17, so I thought I should write up something more detailed. I went a bit overboard, this may need to be an LEWG paper instead of an "issue":

Abstract

During LEWG review of N4382, several people mentioned that they are uncomfortable with the fact that some of the cross-type concepts - specifically EqualityComparable<T, U> and TotallyOrdered<T, U> - require the user to exhibit a common type for the two subject types whose semantics mirror those of the cross-type relationships. This objection is not unreasonable as the requirements are overconstrained, which I will demonstrate. Overconstraining is not necessarily a misfeature; redundancy in a system allows for error-checking. For example, given two types T and U that model such a cross-type concept we could define a wrapper types T' and U' with error-checking operations that assert the equivalence of results from both the underlying operation on T and U and the operation on the common type. However, not every user will want or need this error-checking capability, so many see the fact that the concepts are overconstrained as a violation of the "Don't pay for what you don't use" principle of C++. I present here a relaxation of those concepts that does not require a common type.

Overconstraint

I claim that the two concepts in question are overconstrained because the requirement that a common type exist is equivalent to the requirement that the two subject types have cross-type relational operators that respect the semantics of the type-specific relational operators. I will use EqualityComparable to make this argument concrete and hope that it is clear enough to the reader that the argument can be generalized to cover TotallyOrdered as well. To prove equivalence we must demonstrate that (a) given a common type we can construct cross-type relational operators with consistent semantics, and (b) given cross-type relational operators with consistent semantics we can construct a common type.

Let T and U be two models of EqualityComparable, and C be a common type of T & U that models EqualityComparable. We can define cross-type relational operators for T and U as:

auto operator == (const T& t, const U& u) {
  return C{t} == C{u};
}

auto operator == (const U& u, const T& t) {
  return C{u} == C{t};
}

auto operator != (const T& t, const U& u) {
  return C{t} != C{u};
}

auto operator != (const U& u, const T& t) {
  return C{u} != C{t};
}

Clearly these cross-type relational operators are by definition consistent with the semantics of the relational operators on C.

For the converse argument, let T and U again be two models of EqualityComparable with semantically consistent cross-type relational operators. (By "semantically consistent", I mean that the cross-type operators respect the equivalences established by the type-specific operators. I.e, it must be the case that (a) the cross-type operators are symmetric in T and U, and (b) t == u for values t of type T and u of type U if and only if t2 == u for all values t2 == t and t == u2 for all values u2 == u.) We can then define a type C that is a discriminated variant of T and U, with relational operators (ignoring the implied requirement for move and/or copy construction of T and U necessary to convert T and U to C):

bool operator == (const C &a, const C& b) {
  if (<a contains a T>) {
    if (<b contains a T>)
      return <a as a T> == <b as a T>;
    else
      return <a as a T> == <b as a U>;
  } else {
    if (<b contains a T>) {
      return <a as a U> == <b as a T>;
    else
      return <a as a U> == <b as a U>;
  }
}

bool operator != (const C& a, const C& b) {
  return !(a == b);
}

Clearly these relational operators for C respect the semantics of the operators on T and U, and the cross-type operators as well. (Those familiar with N4382 will recognize this pattern as being embodied by the common_iterator class template, which provides exactly such a constructive common type for an Iterator and a corresponding Sentinel type. Those familiar with the Library Fundamentals TS may notice that std::optional<T> is a special case of this pattern that constructs a common type for T and std::nullopt with a cross-type operator== that always returns false.)

Proposed Design

I propose the relaxed concepts WeaklyEqualityComparable<T, U> and WeaklyTotallyOrdered<T, U> that do not require Common<T, U>, but instead directly impose the necessary semantic requirements on the cross-type relational operators:

template <class T>
concept bool WeaklyEqualityComparable() {
  return EqualityComparable<T>();
}

template <class T, class U>
concept bool WeaklyEqualityComparable() {
  return WeaklyEqualityComparable<T>() &&
    (Same<T, U> ||
      (WeaklyEqualityComparable<U>() &&
       requires(T t, U u) {
         { t == u } -> Boolean;
         { u == t } -> Boolean;
         { t != u } -> Boolean;
         { u != t } -> Boolean;
       }));
}

Given objects t1 and t2 of type T and u1 and u2 of type U, types T and U model WeaklyEqualityComparable if and only if:

  • (bool(t1 == u1) == bool(u1 == t1)) != false [== is symmetric with respect to T and U]
  • (bool(u1 != t1) == bool(t1 != u1)) != false [!= is symmetric with respect to T and U]
  • (bool(t1 != u1) == !bool(t1 == u1)) != false [!= is the complement of ==]
  • ((bool(t1 == u1) && bool(t1 == t2)) == bool(t2 == u1)) != false [== of T and U respects T's ==]
  • ((bool(t1 == u1) && bool(u1 == u2)) == bool(t1 == u2)) != false [== of T and U respects U's ==]
template <class T>
concept bool WeaklyTotallyOrdered() {
  return TotallyOrdered<T>();
}

template <class T, class U>
concept bool WeaklyTotallyOrdered() {
  return WeaklyTotallyOrdered<T>() &&
    (Same<T, U> ||
      (WeaklyTotallyOrdered<U>() &&
       WeaklyEqualityCompable<T, U>() &&
       requires(T t, U u) {
         { t < u } -> Boolean;
         { t <= u } -> Boolean;
         { t > u } -> Boolean;
         { t >= u } -> Boolean;
         { u < t } -> Boolean;
         { u <= t } -> Boolean;
         { u > t } -> Boolean;
         { u >= t } -> Boolean;
       }));
}

Given objects t1 and t2 of type T and u1 and u2 of type U, types T and U model WeaklyTotallyOrdered if and only if:

  • (bool(t1 < u1) == bool(u1 > t1)) != false [Symmetry in T and U]
  • (bool(t1 > u1) == bool(u1 < t1)) != false
  • (bool(t1 >= u1) == bool(u1 <= t1)) != false
  • (bool(t1 <= u1) == bool(u1 >= t1)) != false
  • (bool(t1 >= u1) == !bool(t1 < u1)) != false [>= is the complement of <]
  • (bool(t1 <= u1) == (bool(t1 < u1) || bool(t1 == u1))) != false [<= is the union of < and ==]
  • ((bool(t1 < u1) && bool(u1 < u2)) == bool(t1 < u2)) != false [< respects U's <]
  • ((bool(t1 < u1) && bool(t2 < t1)) == bool(t2 < u1)) != false [< respects T's <]
  • Exactly one of bool(t1 < u1), bool(t1 == u1), or bool(t1 > u1) is true. [Totality]

I separately propose that the library provide a default_common_type class template:

template <class T, class U>
  requires WeaklyEqualityComparable<T, U>
class default_common_type {
public:
  default_common_type(T);
  default_common_type(U);

  friend bool operator == (const default_common_type& a,
                           const default_common_type& b);
  friend bool operator != (const default_common_type& a,
                           const default_common_type& b);
};

default_common_type<T, U> shall model:

  • WeaklyEqualityComparable
  • WeaklyTotallyOrdered if WeaklyTotallyOrdered<T, U>
  • Iterator if Sentinel<U, T>, with
    • Same<IteratorCategory<T>, IteratorCategory<default_common_type<T, U>>>
    • Same<ValueType<T>, ValueType<default_common_type<T, U>>
    • Same<DifferenceType<T>, DifferenceType<default_common_type<T, U>>

[Implementors would presumably use default_common_type and restrict the iterator category to implement the backwards compatible common_iterator.] I DO NOT PROPOSE that we define:

template <class T, class U>
  requires WeaklyEqualityComparable<T, U>
struct common_type<T, U> { type = default_common_type<T, U>; };

Since the symmetry requirement of Common would be impossible to satisfy; there is no canonical ordering for types. (This problem does not present for Sentinel<S, I> due to the asymmetry of Sentinel.) This issue does not prevent library users from using default_common_type to specialize common_type if desired after choosing an arbitrary ordering for the parameter types, e.g.,

struct A {};
struct B {};

// ...definitions of consistent relational operators for A and B...

namespace std {
template <>
struct common_type<A, B> { using type = default_common_type<A, B>; };

template <>
struct common_type<B, A> { using type = default_common_type<A, B>; };
}

or prevent clients that desire a common type for types T and U from instantiating default_common_type<T, U> if common_type<T, U> has no member type. [The library should possibly provide an alias template that evaluates to T if Same<T, U>, common_type_t<T, U> if it exists, and otherwise default_common_type<T, U>.]

Design options

(1) Ignore this document entirely: leave the Common requirements in place and force users to exhibit common types to satisfy the concepts. (2) Incorporate the relaxed concepts described herein. Relax the requirements on everything in the standard library (to a cursory inspection nothing in N4382 actually needs a common type.) (3) Same as (2), but prefix the names of the stronger concepts with Strong and remove the prefix Weak from the concepts defined herein. The weaker concepts will be used much more, and should therefore have the shorter names. (4) Same as (3), but remove the stronger concepts altogether from the library. If nothing in the library requires the stronger concepts, it should not define them. Users that want the stronger concept can easily define:

template<class T, class U>
concept bool StrongEqualityComparable() {
  return EqualityComparable<T, U> && Common<T, U> &&
    EqualityComparable<CommonType<T, U>>();
}
asutton commented 8 years ago

I tought that STL2 was intended to be compatible with existing code modulo bugs, and migration should be failry trivial.

I'm not sure that was ever the intent. This redesign allows us to design the syntactic requirements and their accompanying semantics, and tighten those interfaces in the process. The expectation has long been that this will break existing code. And in a TS, that's reasonable.

In other words, I would like to see an convincing explanation why the code examples I have presented are broken (bugged, unsound) and need to be rewritten, that I and every other programmer could present to their colleagues to explain why we need to rewrite it during migration of our code to STL2.

The argument needs to work the other way around. Ranges is developed with a precise semantic model for concepts. If want syntactic requirements to be relaxed, then you need to justify, or better yet prove, that your existing code is not outside of the semantics in this design.

Andrew

tomaszkam commented 8 years ago

The argument needs to work the other way around. Ranges is developed with a precise semantic model for concepts. If want syntactic requirements to be relaxed, then you need to justify, or better yet prove, that your existing code is not outside of the semantics in this design.

I think that the allowed semantics model for hetergenous relation, is that I can work on hetergeneous types T and U, if I have projection f1: T -> C and f2: U -> C and releation r strict weak order on C.

Currently such desing is supported by passing comparators representing relation r and separately two unarry functions representing projections f1 and f2. However in same situations I can implement exactly the same semantics by creating comprator working on T and U that combines implementation of projection and relation to achieve better performance.

In both cases I represent operation with same semantics, but using different syntax. So my question still is valid: why existing implementation model (cross type-comparator combining projection and relation) should no longer by supported?

asutton commented 8 years ago

I think that the allowed semantics model for hetergenous relation, is that I can work on hetergeneous types T and U, if I have projection f1: T -> C and f2: T -> C and releation 'r' strict weak order on C.

The semantics for all of the cross-type concepts are defined in terms of the common type (by way of value-preserving conversion to C).

These semantics were formulated by Alex Stepanov during the Palo Alto meeting that produced N3351. We did because wanted to be absolutely precise about the meaning of every expression used in STL algorithms, in large part to facilitate rigorous proofs of termination, correctness, performance, safety, etc.

"You know what I mean" doesn't really cut it in that environment.

Currently such desing is supported by passing comparators representing relation r and separately two unarry functions representing projections f1 and f2. However in same situations I can implement exactly the same semantics by creating comprator working on T and U that combiens implementation of projection and relation to achieve better performance.

Simply claiming that two definitions have "exactly the same semantics" is not really a good proof. What I want is a proof of why one composition of projections and comparators actually have the same observable effects as another. If you really want those requirements relaxed, then we need a good story for why they can be.

FWIW, I mentioned that I would be in favor of removing the common type requirements in a message yesterday.

Andrew

tomaszkam commented 8 years ago

Also, there is a pair of types in the existing standard: error_code and error_condition that currently can be compared (provide heterogeneous operator==) but does not fulfill EqualityComparable concept.

This is not only caused by presence lack of common_type specialization existing in the program. The relation between the error_code and error_condition is an Is-A relation, and error_code e comparison with error_condition ec returns true, if one given e fails under ec. So basically an CommonType for this two classes is an error_condition, but projection from error_code to error_condition is not an bijection.

This could be compared to checking if dog is an mammal via use of equality between object representing species and clade.

ericniebler commented 8 years ago

The example @tomaszkam presents above with Date, Leg, and Itinerary is compelling. Making this example work in the current model would require:

template <InputRange A, InputRange B>
  requires CommonReference<reference_t<iterator_t<A>>, reference_t<iterator_t<B>>>()
struct common_type<A, B> {
  using type = any_view<common_reference_t<reference_t<iterator_t<A>>, reference_t<iterator_t<B>>>>;
};
template <InputRange A>
struct common_type<A, A> { using type = A; };

... and

With all of these pieces, the solution would be:

ranges::equal_range( itineraries, dates, less<>{},
    [](auto& i) { return i.legs() | view::transform(&Leg::departure_date); } );

All of the things I mention above would be nice to have, and they'll all probably be proposed eventually, but they are not in the Ranges TS as it currently stands, and even if they were I would like a better migration story than that.

tomaszkam commented 8 years ago

All of the things I mention above would be nice to have, and they'll all probably be proposed eventually, but they are not in the Ranges TS as it currently stands, and even if they were I would like a better migration story than that.

I suspect that the cost of the type-erasure (indirect call and probably allocation) will still be an show-stopper for migrating from current heterogeneous comparator solution to STL2 version, even if all of above components would be present.

ericniebler commented 8 years ago

I suspect that the cost of the type-erasure (indirect call and probably allocation) will still be an show-stopper for migrating from current heterogeneous comparator solution to STL2 version, even if all of above components would be present.

You're confused about how the common type is actually used in the current design. The system merely checks that such a common type exists and that it satisfies certain constraints. It doesn't actually use the common type beyond that. The solution I presented should be optimally efficient.

tomaszkam commented 8 years ago

You're confused about how the common type is actually used in the current design.

Ok, I actually misread your solution and assumed that:

Global overloads of the relational operators that lexicographically compare ranges of ordered things

would actually accepts any_view i.e. common type of the ranges.

tomaszkam commented 8 years ago

The solution I presented should be optimally efficient.

I was hardly believing that generating an view in-place could be optimized out by the complier to the same code, as direct comparator and decided to perform some measurements.

So I have created a small benchmark code, that defines enough classes to make your solution compilable date_bench.txt and measured the result.

For g++ 6.2.0 with -O3 your solution is 2 times slover.

Benchmark                              Time           CPU Iterations
--------------------------------------------------------------------
ItineraryFixture/STL1/1024/1          27 ns         27 ns   25552068
ItineraryFixture/STL1/4k/1            34 ns         34 ns   20405213
ItineraryFixture/STL1/32k/1           47 ns         47 ns   14457145
ItineraryFixture/STL1/256k/1          64 ns         64 ns   10724019
ItineraryFixture/STL1/1024k/1         77 ns         77 ns    8817634
ItineraryFixture/STL1/1024/2          25 ns         25 ns   28038427
ItineraryFixture/STL1/4k/2            31 ns         31 ns   22619194
ItineraryFixture/STL1/32k/2           44 ns         44 ns   16135515
ItineraryFixture/STL1/256k/2          57 ns         57 ns   11974023
ItineraryFixture/STL1/1024k/2         75 ns         75 ns    9224108
ItineraryFixture/STL2/1024/1          49 ns         49 ns   14157039
ItineraryFixture/STL2/4k/1            57 ns         57 ns   12363314
ItineraryFixture/STL2/32k/1           78 ns         78 ns    8301573
ItineraryFixture/STL2/256k/1         104 ns        104 ns    6664611
ItineraryFixture/STL2/1024k/1        126 ns        126 ns    5544926
ItineraryFixture/STL2/1024/2          54 ns         54 ns   12180711
ItineraryFixture/STL2/4k/2            72 ns         72 ns    9747498
ItineraryFixture/STL2/32k/2          100 ns        100 ns    6808511
ItineraryFixture/STL2/256k/2         137 ns        137 ns    5227348
ItineraryFixture/STL2/1024k/2        176 ns        176 ns    4019115

For clang 3.8.1 -O3 the proposed solution is nearly 4x slower:

Benchmark                              Time           CPU Iterations
--------------------------------------------------------------------
ItineraryFixture/STL1/1024/1          28 ns         28 ns   24910194
ItineraryFixture/STL1/4k/1            34 ns         34 ns   20318654
ItineraryFixture/STL1/32k/1           52 ns         52 ns   10000000
ItineraryFixture/STL1/256k/1          68 ns         68 ns   10940773
ItineraryFixture/STL1/1024k/1         84 ns         84 ns    8446369
ItineraryFixture/STL1/1024/2          25 ns         25 ns   27510024
ItineraryFixture/STL1/4k/2            31 ns         31 ns   22658891
ItineraryFixture/STL1/32k/2           43 ns         43 ns   16022235
ItineraryFixture/STL1/256k/2          56 ns         56 ns   12378238
ItineraryFixture/STL1/1024k/2         74 ns         74 ns    9474417
ItineraryFixture/STL2/1024/1         129 ns        129 ns    5471593
ItineraryFixture/STL2/4k/1           173 ns        173 ns    4167787
ItineraryFixture/STL2/32k/1          219 ns        219 ns    2791664
ItineraryFixture/STL2/256k/1         314 ns        314 ns    2179594
ItineraryFixture/STL2/1024k/1        398 ns        398 ns    1913080
ItineraryFixture/STL2/1024/2         117 ns        117 ns    5654135
ItineraryFixture/STL2/4k/2           147 ns        147 ns    4751717
ItineraryFixture/STL2/32k/2          212 ns        212 ns    3397243
ItineraryFixture/STL2/256k/2         286 ns        286 ns    2490511
ItineraryFixture/STL2/1024k/2        332 ns        332 ns    2119912

Above overhead still seems to be unacceptable, especially when the original code is not wrong, just does not fulfills some additional, somehow arbitrary syntactic constrain, that disallows merging projection and comparison into single functor, while preserving semantic.

tomaszkam commented 6 years ago

As the Ranges TS are proposed to be merged to C++2a, would it be good to revisit this issue? Currently, this will make the code like Itin/Leg/Date example unrepresentable without major overhead, as we do not get the views parts.

ericniebler commented 6 years ago

I agree we should give this another hard think for C++20. I would like to find a way to support this use case, ideally without weakening the algorithm requirements to the point of making them meaningless.

tomaszkam commented 6 years ago

Quoting your answer for not changing the result of count_if to return also reached end iterator:

You're right, it's arbitrary, and it makes me uncomfortable. But making people do more work to get the result they asked for also makes me uncomfortable. We don't have the luxury of designing STL2 in a bubble ... there is a huge installed base, and ease of migration is a concern.

As the migration is a concern, I would like to see that the code: equal_range(itins.begin(), itins.end(), dates, ItinDateLess()) Would work with STL2, without the need to rewrite every existing comparator, that incorporates projection (or in other worlds key_function), to be rewritten to set of separate projection and comparator.

Note, that this was the only approach to represent such comparison in STL1, and it does not become less sound, just because we get a new way to express the same thing.

ericniebler commented 6 years ago

Ease of migration is a concern, not the concern. Soundness is of paramount importance. STL1's algorithms are not sound. Hence the redesign. I'll only go so far to support this, but it will probably require effort to port code like this to STL2.

tomaszkam commented 6 years ago

Just one question. Lets make following observations: 1) lexicographical_compare is an functor, and can be used as Relation on ranges. 2) std::vector<int> and std::array<int, 5> are ranges, but they do not have common type.

That means that the following:

std::vector<std::vector<int>> vvi;
std::vector<std::array<int, 5>> vai;

ranges::lexicographical_compare(vvi, vai, ranges::lexicographical_compare);

Does not and should not compile?

tomaszkam commented 6 years ago

I am preparing a paper, that proposes to remove CommonReference requirement for StrictWeakOrdering for San Diego. The link for the most recent draft can be found here: http://htmlpreview.github.io/?https://github.com/tomaszkam/proposals/blob/master/Fixing%20Relations.html

ericniebler commented 6 years ago

Just one question. Lets make following observations:

lexicographical_compare is an functor, and can be used as Relation on ranges. std::vector<int> and std::array<int, 5> are ranges, but they do not have common type. That means that the following:

std::vector<std::vector<int>> vvi; std::vector<std::array<int, 5>> vai;

ranges::lexicographical_compare(vvi, vai, ranges::lexicographical_compare);

Does not and should not compile?

std::vector<int> and std::array<int, 5> do have a common type, but the compiler can't guess it. We need to help with a common_type<Range, Range> specialization. In addition, we should have relational operators that operate on ranges, deferring to ranges::equal and ranges::lexicographical_compare for their implementation. One those are added, ranges::lexicographical_compare(vvi, vai, ranges::lexicographical_compare); indeed will compile.

Also, I spent some time today investigating your claim that your itinerary example is slower in STL2. In fact, it is not. Please check my work at CaseyCarter/cmcstl2#181, but I think you'll see that even with the CommonReference syntactic requirement, your example is just as fast in STL2 as it is in STL1. And the code is shorter and easier to read, since there is no reason to write the asymmetric predicates.

tomaszkam commented 6 years ago

Also, this is only one of the argument provided for removing CommonReference for relations, other include: 1) It is superfluous - there is no need that equivalent* object need to have common type. 2) It limits generality - we cannot implement mathematical less due the requirement 3) Tailored for algorithms - does not work with ordered containers and heterogeneous lookups

Note that I am proposing only to remove this requirement for weak orderings, as the objects then are equivalent - this mathematical notation does not require that they have same type. In contrast strong ordering (< and ==) requires that a == b, means that the object are equal ("same") - if the object represent same entity, there should be no problem with pointing out common_type for them.

ericniebler commented 6 years ago

there is no need that equivalent* object need to have common type

A strict weak ordering is irreflexive, asymmetric, and transitive. In addition, it defines a strict total order over its equivalence classes. Let's just take the first: what does "irreflexive" mean when there is no common domain into which values of different types can be projected?

we cannot implement mathematical less due the requirement

Here you are referring to the MathematicalLess predicate from your paper, which runs afoul of C's unfortunate integer promotion rules. It is not any problem with the mathematical definition of strict weak orderings or less as it relates to CommonReference. Regardless, I agree this is an interesting problem that needs to be looked at.

does not work with ordered containers and heterogeneous lookups

I would need to see proof of that claim. I currently have no reason to believe that, say, using ranges::less<> as the ordering for a std::map<std::string, int>, and then doing a search with std::string_view would cause any problems. The compiler in my head (not infallible!) says it should work just fine.

tomaszkam commented 6 years ago

Let's just take the first: what does "irreflexive" mean when there is no common domain into which values of different types can be projected?

Irreflexible means that for any element x for the domain D, the r(x, x) is false. If your domain is the sum of object of type Employee, and object of type std::string, and we have comparator f, then for each ebeing Employee from domain, f(e,e) is false, and also for each object s of type std::string from domain f(s, s) is false. This requires f to be callable with (std::string, std::string) and (Employee, Employee).

Note that this relation must be held for each element of domain separately. This separate elements (s) and (e) may still be equivalent (!f(s, e) && !f(e, s)), but they remain totally different objects. This is an important property of weak ordering, where (in contrast to strong ordering) different elements of the domain may be equivalent, while there are not the same object. This is why I am proposing relaxing only for Relation.

tomaszkam commented 6 years ago

Here you are referring to the MathematicalLess predicate from your paper, which runs afoul of C's unfortunate integer promotion rules. It is not any problem with the mathematical definition of strict weak orderings or less as it relates to CommonReference. Regardless, I agree this is an interesting problem that needs to be looked at

This is only an illustration, the same problems occur if you want to write a relation for two different types, that already have a specialization of CommonReference with defined semantics. If you refer to section 3.4 of my paper, I am showing that you can have only one specialization of CommonReference for Employee and std::string_view, and as consequence, you can only have one relation, that matches sematnics of given specialization. I mean that CommonReference for Employee and std::string_view can either chose name or surname but not both.

The fact is that CommonReference is a singleton for a given set of types, while you may want to have different relations accepting same types. But due semantic requirement, that invocation on types, has same semantics as invocation on CommonReference, you can only have one.

tomaszkam commented 6 years ago

I would need to see proof of that claim. I currently have no reason to believe that, say, using ranges::less<> as the ordering for a std::map<std::string, int>, and then doing a search with std::string_view would cause any problems. The compiler in my head (not infallible!) says it should work just fine.

Again, section 3.3 shows an example. Instead of having std::map<std::string, Employee> (map for name to object), I want to use std::set<Employee, NameLess> es that will keep Employees with unique name (reduces copies of name). Now if I add is_transparent to my NameLess, I am able to query set for given name, just by calling es.find("Kowalski"). Transparent comparators, are not limited to std::less<> and std::string and std::string_view.

tomaszkam commented 6 years ago

std::vector and std::array<int, 5> do have a common type, but the compiler can't guess it. We need to help with a common_type<Range, Range> specialization. In addition, we should have relational operators that operate on ranges, deferring to ranges::equal and ranges::lexicographical_compare for their implementation. One those are added, ranges::lexicographical_compare(vvi, vai, ranges::lexicographical_compare); indeed will compile.

Where this change is proposed? After the Standard Library Concepts was merged into IS, we are not longer discussing possible designs, but the things that we can reliably use with C++20. Everything that is not currently in IS, may not be available, so it is not an answer.

ericniebler commented 6 years ago

Irreflexible means that for any element x for the domain D, the r(x, x) is false. If your domain is the sum of object of type Employee, and object of type std::string, [...]

(I think you mean union instead of sum above. We're not talking about sum types here.) This is where you lose me. A union of apples and oranges doesn't correspond to any notion of "domain" as I understand it. In EoP, only homogeneous functions have a domain, and a relation is a homogeneous predicate. N3351 offers two weaker formulations, one where the two arguments share a common type manifest in the program (the domain is common_type_t<X, Y>), or else the common type is purely notional and not (necessarily) manifest in the program (the domain is that notional type). "Employee" and "string" don't share a common type (manifest or notional), so any predicate over them is not a relation.

Neither N3351 nor EoP is a sacred text, but I lean on both heavily to put STL2 on sound theoretical footing. If we didn't care about equational reasoning, there would be no sense in implementing an STL2 at all.

FYI, even N3351 says that comparing Employees and strings is an abomination that should not be tolerated, even when discussing the most relaxed form of Relation, and I think if @sean-parent knew that that's what you were arguing for, he would be less supportive of your paper.

This is only an illustration, the same problems occur if you want to write a relation for two different types, that already have a specialization of CommonReference with defined semantics. If you refer to section 3.4 of my paper, I am showing that you can have only one specialization of CommonReference for Employee and std::string_view, and as consequence, you can only have one relation, that matches sematnics of given specialization. I mean that CommonReference for Employee and std::string_view can either chose name or surname but not both.

Exactly. This is the whole point. This is Alex Stepanov telling you you are trying to do something nonsensical. I see this as the design of STL2 working as it should.

STL2 is not an outlier in this regard. A single type in Haskell cannot model a type class in two different ways. You must create wrapper classes and opt in to which ever meaning you want. That is how you would do it in C++ (e.g., with a wrapper that adapts an Employee to be the value of either its name or surname).

Again, section 3.3 shows an example. Instead of having std::map<std::string, Employee> (map for name to object), I want to use std::set<Employee, NameLess> es that will keep Employees with unique name (reduces copies of name). Now if I add is_transparent to my NameLess, I am able to query set for given name, just by calling es.find("Kowalski").

I understand the example in 3.3. It fails because it tries to build a relationship between apples and oranges. This statement from your paper is false: "constraining the ordered containers with current StrictWeakOrdering concept, will de-facto remove functionality of heterogeneous lookup from ordered associative containers." The CommonReference constraint only prevents relations of things that are not different expressions of the same underlying value. An Employee is not a string, or vice versa.

You can make the example in 3.3 compile by defining an Employee wrapper such that its "value" is the employee's name. Then you can look up with a string and access the wrapped Employee.

Where this change is proposed? After the Standard Library Concepts was merged into IS, we are not longer discussing possible designs, but the things that we can reliably use with C++20.

We are discussing possible changes to what has already been proposed for C++20. Neither any_view nor common_type<Range, Range> has been proposed yet.

tomaszkam commented 6 years ago

(I think you mean union instead of sum above. We're not talking about sum types here.) This is where you lose me. A union of apples and oranges doesn't correspond to any notion of "domain" as I understand it

I mathematics domain is just set of elements (objects), that all, {1, 2, 3} is a valid domain. If you have two sets X and Y, the X u Y is valid domain. There is no problem of having domain constructed out of apples and oranges, and building relation over it. If I prove that my relation fullfills axioms or ordering, that I can use it as ordering over my domain - that is the pover of mathematics, when you prove that you meet certain conditions, you can use all underyling theorems. That is underlying mathematical model.

To illustrate, you can have a relation that compares polynomials and numbers - for polynomial its return its order, for number it is identity. In case of this relation all polynomials with order 3 and number 3 are equivalent. This is mathematicall sound relation, and there is nothing nonsenthical with it.

The point is you are saying, that it is "your" understanding of what domain is limited to, while my point is that underlying mathematical theory allows to make relations over apples and oranges (in your worlds). When you are making generic library - the promise is that the algorithm will work with any types matches underlying concepts, and the promise is broken.

tomaszkam commented 6 years ago

If we didn't care about equational reasoning, there would be no sense in implementing an STL2 at all.

I care about the equivalence reasoning, but it does not require a homogeneous domain. There is no real "types" in mathematics.

ericniebler commented 6 years ago

There is no real "types" in mathematics.

I don't agree with you. "Real", "integral", and "complex" are types in math. "Ring" and "Group" are type classes. Algorithms exist in math and their domain is defined with types or type classes. The mathematics that underpins a strongly typed language like C++ very much involves types and type classes (concepts), as well as an abstract notion of values like "zero" and "one" and concrete embodiments of those values as objects with types.

To illustrate, you can have a relation that compares polynomials and numbers - for polynomial its return its order, for number it is identity.

This is a predicate, but not a relation. If your algorithm takes a relation (less :: Int->Int->bool) and a projection (order :: Polynomial -> Int), then you can perform the comparison in a way that makes sense. This is why the STL2 algorithms accept projections.

ericniebler commented 6 years ago

Please read EoP. These terms, and the whole of the mathematical foundation for the STL, is defined there. A relation is a binary predicate where both arguments have the same type. A weak ordering is a transitive relation: op(a,b) && op(b,c) ==> op(a,c). If "a" is an "employee", "b" is a "string", and "c" is a "refrigerator", then op(a,c) may in fact be false or it may be outside the domain of "op". This breaks the transitivity of "op". However, if "a", "b", and "c" are both different ways to represent the same underlying set of values ("std::string", "std::string_view", "MyString") then we get transitivity and "op" can be a relation. Make sense?

tomaszkam commented 6 years ago

A relation is a binary predicate where both arguments have the same type. A weak ordering is a transitive relation: op(a,b) && op(b,c) ==> op(a,c). If "a" is an "employee", "b" is a "string", and "c" is a "refrigerator", then op(a,c) may in fact be false or it may be outside the domain of "op". This breaks the transitivity of "op".

If you algorithm is calling op over a, b, c then {a, b, c} x {a, b, c} needs to be subset of domain of "op" (it is union type), if they aren't then you call is undefined for that given reason. And if they are in, and transitivity is not held (as in your example), then again call is undefined. But if all axioms all held, the algorithm will work correctly.

ericniebler commented 6 years ago

I agree. But the current definition of the Relation concept, and the one in your paper, give no way to test that a callable is a relation over three distinct and unrelated types. Relation<Op, A, B> && Relation<Op, B, C> && Relation<Op, A, C> suffers from the problem I described above.

In the current design, we first test that all the types share a common reference (CommonReference<A, B, C>), then we test the 3 Relation requirements above, and we have high confidence (not perfect) that op is actually a relation over the 3 types. (It's not perfect because we haven't tested that the common reference between each pair of types is the same as the common reference of all three. We could add that requirement, too, but generally we guard against Murphy not Machiavelli.)

tomaszkam commented 6 years ago

But the current definition of the Relation concept, and the one in your paper, give no way to test that a callable is a relation over three distinct and unrelated types. Relation<Op, A, B> && Relation<Op, B, C> && Relation<Op, A, C> suffers from the problem I described above.

I need to have only Relation<Employee, std::string> (there is no third type, and having relation over 3 types would be separate discussion, but the STL2 does not include any algorithms accepting 3 set of values, so the point is mood) and I am checking if my operator satisfies Predicate<Employee, std::string>, Predicate<std::string , Employee>, Predicate<Employee, Employee>, Predicate<std::string , std::string> - that means that all possible invocations are in the domain.

Now, I should be able to prove, that having two sequence R1 r1 and R2 r2 and comparator F f is a weak ordering over the sum of r1 and r2. To check this we have the following:

template<typename R1, typename R2, typename F>
bool irreflexible(R1 const& r1, R2 const&  r2, F f)
{
    auto check = [&f](auto const&  range) {
        for (auto const& e : range)
            if (f(e, e))
               return false;
    };
    return check(r1) && check(r2);
    // Does O(size(r1) + size(r2)) checks, i.e. O(n)
}

template<typename R1, typename R2, typename F>
bool assymetric(R1 const&  r1, R2 const&  r2, F f)
{
    auto check = [&f](auto const& r1, auto const& r2) {
        for (auto const& e1 : r1)
          for (auto const& e2 : r2)
            if (f(e1, e2) && f(e2, e1))
               return false;
    };

  return check(r1, r1) && check(r2, r2) && check(r1, r2) && check(r2, r1);
  // Does O(n^2) introducing common type will double the amount
}

template<typename R1, typename R2, typename F>
bool transtive(R1 const&  r1, R2 const&  r2, F f)
{
    auto check = [&f](auto const& r1, auto const& r2, auto const& r3) {
        for (auto const& e1 : r1)
          for (auto const& e2 : r2)
            for (auto const& e3 : r3)
                if (f(e1, e2) && f(e2, e3) && !f(e1, e3))
                     return false;
    };

  return check(r1, r1, r1) && check(r1, r1, r2) && check(r1, r2, r1)
         && check(r1, r2, r2) && check(r2, r1, r1) && check(r2, r1, r2)
         && check(r2, r2, r1) && check(r2, r2, r2);
  // Does O(n^3) introducing common type will double the amount
}

template<typename R1, typename R2, typename F>
bool weak_ordering_over(R1 const&  r1, R2 const&  r2, F f)
{
    return irreflexible(r1, r2, f)
          && assymetric(r1, r2, f)
          && transitive(r1, r2, f)
          && transtive(r1, r2, [&f](auto const& e1, auto const& e2) { return !f(e1, e2) && !f(e2, e1);
}

Note the above code works regardless if value types of r1 and r2 are the same or different, and our Relation requirement guarantees that above code will compile.

Now, when I have call to my equal_range(s1, v, f) I can include the following preconditions:

[[expects audit is_sorted(s1, f)]]
[[expects audit weak_ordering_over(s1, view::single(v), f)]]

And my algorithms are guaranteed to work correctly.

tomaszkam commented 6 years ago

In the current design, we first test that all the types share a common reference (CommonReference<A, B, C>), then we test the 3 Relation requirements above, and we have high confidence (not perfect) that op is actually a relation over the 3 types. (It's not perfect because we haven't tested that the common reference between each pair of types is the same as the common reference of all three. We could add that requirement, too, but generally we guard against Murphy not Machiavelli.)

Relation is just selected subset of cartesian of the value (i.e. A x B x C) there is really nothing more behind this concept in mathematic (you can represent it as na predicate, returning true if given tuple of values was in selected subset).

The functions f: A -> B is just an relation A x B that builds mapping between input and output (this is real definition of the function), so relations are naturally heterogeneous.

akrzemi1 commented 6 years ago

There is no real "types" in mathematics.

I don't agree with you. "Real", "integral", and "complex" are types in math. "Ring" and "Group" are type classes. Algorithms exist in math and their domain is defined with types or type classes. The mathematics that underpins a strongly typed language like C++ very much involves types and type classes (concepts), as well as an abstract notion of values like "zero" and "one" and concrete embodiments of those values as objects with types.

To illustrate, you can have a relation that compares polynomials and numbers - for polynomial its return its order, for number it is identity.

This is a predicate, but not a relation. If your algorithm takes a relation (less :: Int->Int->bool) and a projection (order :: Polynomial -> Int), then you can perform the comparison in a way that makes sense. This is why the STL2 algorithms accept projections.

This discussion starts to be too much about the ideals and moves away too much from practical problems.

The practical problem is that I have a sorted (using predicate P) container of Ts and I want to call lower_bound() with a value of type U and a function object Q (call it a predicate or not) that is "compatible" with P.

Eric is saying, "fine, but make sure there is a common_type<T, U>, and that T and U are convertible to this type".

I am saying: "I do not want to, because defining this is too complicated (common_type<T, U> may be more difficult to write than both T and U)".

Eric is saying: "that you will not use lower_bound(), because we cannot reason about the semantics."

(Sorry if I oversimplified your point.)

But let me clarify: I am refusing to write common_type<T, U> not because T and U are unrelated, but because I have time/financial/organizational/deadline-related/personal constraints. Conceptually, T and U do represent the same domain of abstract values in the context of function Q, but I do not want to be forced to type it. I guarantee that I have defined Q so that there will be at most one position in the range where: Q(t[n], u) && Q(u, t[n+1]) that range is considered sorted with respect to Q, and I expect lower_bound() just to run. The relevant axioms can be stated without reference to common_type<T, U> . These will not be the identical axioms as those from EoP or Palo Alto Report, but they will be sufficient to guarantee that the algorithm will terminate, will not go astray with iterators, and will give the correct result with respect to Q.

The constraint that Q(u, t) Q(t, u) Q(u, u) Q(t, t) are well formed with corresponding adapted axioms, but without the requirement on common_type<T, U> are sufficient to reason about the correctness of the algorithm, sufficient to insert clever run-time checks inside the algorithm in the DEBUG mode.

I am saying, you can have sound concepts, but relaxed compared to EoP. T and U still represent subsets of the same abstract domain, but I should not be forced to encode this abstract domain in the C++ type system. This is a trade-off between what you can express in mathematics on one hand, and on ease of use on the other.

ericniebler commented 5 years ago

I believe with the adoption of P1248 this issue is mitigated. We still have to deal with #610, but I'll close out this one. Somebody shout if they disagree.