davidstone / operators

Automatically generate canonical forms of operators, taking advantage of the latest C++ features
Boost Software License 1.0
11 stars 0 forks source link

Some small suggestion & comments on P1046R2 and further standardization... #2

Open Mick235711 opened 2 years ago

Mick235711 commented 2 years ago

Just finished reading on P1046R2 and I felt that this will be a very good addition to C++, freeing us from ever writing duplicate boilerplate codes for operators and remembering the guidelines for rewriting them. The goal to simplify C++ operator overloading seems a very clear goal that is worth developing further, which both the operator<=> proposal and DT and other features addressed (partly). Some small and humble comments are presented below after reading the paper... just my own observation and feel free to dispose or apply.

  1. The paper is suggesting rewriting nearly all of current operators overloading guidelines, and superseding them by rewriting calls. While it is certainly desirable to eventually adopt all guidelines as standard, I personally think that WG21's current trend is to reduce the scope of a single paper, and split one large paper (for example, P2214 into separate views proposals, P2286's disposition of additional format specifiers, and others) into several independent smaller papers, and advance them independently. Though this is definitely a result of online, small meetings/telecons, given the current pandemic situation I do not think this will change much even after F2F meetings resume (2022-07 New York? skeptical personally to resume that fast). So probably this will need to be split into several proposals each dealing with operators in a group. My personal suggestion is to consider
    1. A proposal to specifically propose operator-> and operator->* rewrite into dereferencing + member-access or member-pointer-access. The reason for this is both that these operators naturally group together to iterator/smart pointer access, and this requires greater library change to remove the requirement on explicit operator-> calls mentioned in the paper, and lastly, these two operators are very useful, even more, useful than other operators in the paper (elaborate later). Honestly, if WG21 is okay with it I wish to just split it into this paper and P1046R3 that include everything else (and accepted as a whole). However, given current policy the secondary personal split suggestion is given below:
    2. A proposal to propose all operator@= rewrite into operator@ + = (or vice versa, elaborate later) (@ = + - * / % ^ & | << >>. These are very useful too in reducing boilerplates and looks like it does not require any library changes (except delete the obsolete operators). The issue on this is the rewriting direction which is elaborated below.
    3. A proposal to rewrite ++ -- and unary + -. Rewrite from postfix ++ -- into prefix form is definitely useful and likely to not gain many criticisms, especially after R2's change on non-copyable types. Rewrite from unary -a into 0 - a is also useful though not employed that much, and rewriting unary + seems to need more care (again, elaborated below).
    4. A proposal for symmetry or everything else that may be worth rewriting. This will be discussed in the last section below.
    5. Additionally, in the operator<=> case the language change and library change (remove relational operators) are advanced in separate documents. I don't know if this is the trend (and I don't suggest this, I'd prefer library changes to be adopted in the same proposal), but probably the library changes would need to be specified separately. Fortunately, the library specification for operators other than -> should be trivial.
  2. First of all, I definitely support automatic generation over library solution / = default following the step of relational operators, both make it easier to write a class and reduce boilerplate.
  3. As for the detailed operators themselves, I first noticed the operator-> case. The reason for that is that this operator is the most unusual one, looks like a binary operator but only takes one argument, and has the curious link-arrow property (call operator-> recursively until hit a pointer). This had prevented many iterators (istreambuf_iterator, views, etc.) from gaining operator-> simply because it cannot support rvalue-ness of a->b result, and basically the lack of a rewrite mechanism is exactly why we are deprecating move_iterator::operator-> as part of P1252. In my opinion, this reason on its own will make it worthwhile for trying to get rewriting operator-> into the standard, as fast as possible. As for the closely-linked operator->*, the reason is more likely not weird operator semantics, but the lack of knowledge for that operator for many people. Rewriting operator->* will make standard smart pointer types gain the operator, which is obviously a very good thing and fixes a defect. The caveats for this separate proposal (as described in 1.1) I found is summarized below:
    1. The first one is obviously the library changes mentioned in the proposal. We failed to make the addressof(**this) change in C++20, so it must be breaking to have those changed in later. However, the breakage we would expect is supposed to be extremely small, and I think it is worth it to make these breaking changes. We also need to change to_address, pointer_traits, and iterator_traits, for which I support deprecating pointer (feels weird from the start), which can be proposed separately or proposed while specifying a fallback. As for to_address, I support also option 4 to limit to contiguous_iterator.
    2. The second is the issue for ostream(buf)_iterator and inserter iterators, as their operator* returns *this and the generated -> is either unusable or weird. I would propose all these to = delete their operator-> since they are not supposed to be usable like this.
    3. A subtle issue if we pursue this for C++26 would be the move_iterator's deprecated operator->. LEWGI had voiced to approve the removal of this in C++23, but since P2139R2 sanked due to not having enough time to discuss it in the C++23 timeframe (at least, before feature freeze), it looks increasingly possible that C++23 will not remove this deprecated -> (unless we accept the paper after feature freeze). This means that if we pursue C++26 rewriting operator->, we need to make move_iterator::operator-> as =delete since otherwise, this will be a silent changing of behavior from C++23 to C++26. We probably can revive the operator after at least a removal cycle.
  4. Now the next important ones are the rewriting from operator@= to operator@. This covers an important guideline to rewrite compound operators, and I agree that + is more fundamental since our mental model is that += is + with =. However, every guidelines I can found about this support rewriting + in terms of += even after move semantics, and based on my discussion with others on Discord the += to + rewrite seems only efficient if you write all 4 overloads on l/r-valueness on operator+. It seems that in the common case that we only write 1 A operator+(const A&, const A&) and 1 A::operator+=(const A&) & (also an interesting question is whether we add ref-qualifier here), the + to += rewrite is still more efficient (especially if you write A operator+(A, const A&) and rely on the parameter to obtain a copy since this will optimize a + b + c + d case since you will do a move instead of copy on the later 3 +). Still, I'm not very sure on this so if there is concrete evidence that @= to @ rewrite is overall better, I can change my mind. Also, for the library impact on this, I agree on filesystem::path should have +=, and I'm neutral on whether we should add basic_string + initialize_list (not sure if this is natural or not). Also, streams should probably delete their <<= >>= (the question is how, simply do a basic_ostream::operator<<=(auto&&) = delete will not prevent basic_ostream::operator<<(A&&) to beat this, since non-templates are preferred over templates and rewrite participates in one-round overload resolution as per current rules), definitely rare candidates will be supporting this but better prevent them all instead of relying on no-one write an cout << a that returns ostream&&. We may consider preferring non-rewrite overload unconditionally or tweak the resolution rule (to make rewrite more important than template-ness) to prevent this.
  5. Now for rewriting postfix ++ -- to prefix since this is likely to gain consensus more than the prefix rewrite. For this, I think that it is reasonable that everyone should think the post/prefix version behaves the same except for return value, and I agree to follow the Ranges practice of having non-copyable type returning void for the postfix version. Still, I think the ostream(buf)_iterator returning reference is interesting (they only do this for keeping with legacy output iterator requirements) but okay, but those common_iterator wrapped issues probably did need further discussion (probably if we do rewrite, we agree that prefix is more fundamental and we should always call prefix one?).
  6. From here I think are some controversial rewrites. First one is rewrite from prefix ++ -- into += 1 -= 1. This is natural for random access and contiguous iterators only, which explains why so few iterators in STL benefits from this. Fortunately for lower iterators due to the language rules proposed, the rewrite will not happen, so that is okay. While complex and atomic<floating_point> is definitely a good example of this (and they should adopt it), the string and path case probably rings a bell, and probably we want to discuss if we disable char overload (or more deeply, not accept any operator+= other than integral types, another example I can come up with is some type supporting += true but definitely not ++, such as bool itself removing --). I generally support this rewrite since it does more good and no harm, but this will be controversial and probably needs limiting.
  7. We then went on to unary + -. I support the discussion that it will be wrong to rewrite a - b into a + -b, and rewriting -a into 0 - a indeed looks promising. However this works very bad practically with any iterators, pointers, and probably unsigned integer negation is a mistake. Therefore I don't think this should be automatically, instead be opt-in and require writing something like A::operator-() const = default. I think with opt-in this seems reasonable and we don't really should downgrade further into a library feature. As for unary +, this is more complicated and I think we have three options to choose from:
    1. Following the steps for unary - and rewrite +a to 0 + a. This will need to be opt-in for the same reason (iterators should probably not support this...).
    2. Observe that this is usually a no-op and simply rewrite to return a copy. Either this will still need to be opt-in, or we need to have some criteria to not auto-generate this for everything (probably the existence of (binary) + += (unary) - are all possible).
    3. Rewrite to -(-a). This seems bad performance-wise and should not be preferred IMO. My preference is to be opt-in and rewrites to 2, though this basically does not save much (= default; vs {return *this;}) and probably it will not be worthwhile to pursue this.
  8. Then for other operators not discussed. For a similar reason, we should not rewrite - into + or / into *, and for && || , and unary & we should discourage the overloading so we should not rewrite even if we have reasonable rewrite target (a && b into a & b probably). For allocation, we may consider rewriting new[] into new, but I'm not sure if there is any value and/or possible issues with that. co_await and operator() does not have appropriate rewrite target (the only one I can think of is rewrite a(...) into (*a)(...), similar to what pointer-to-function call works; however this does not look promising).
  9. operator! is also interesting in the sense that most type does not provide it, but it is still available through the first converting to bool then negating it. I think this is already a (good) rewrite, and we should not propose further (such as rewrite into ~ binary negation).
  10. The last overloadable operator we do not consider in 8 is subscript, operator[]. I agree with the proposal that rewrites a[b] into *(a + b) or *(begin(a) + b) will not work well (even if opt-in, we need to choose between two, so I think this is not worthwhile). However alternatively, P2128 had suggested another possible approach, to rewrite vec[a, b, c, d] into vec[a][b][c][d] to enable multi-dimensional operator[] as the ultimate general multidimensional array access syntax. The reverse has problems as described in P2128, but this direction seems okay and worth it due to uniform access in generic programming. However, if we pursue this the overload resolution must be two-round (unconditionally prefer >1 parameter operator[] if present, otherwise failback) since we cannot resolute between multiple 1-arg operator[] and one N-arg operator[]. If this can be specified correctly then I think this is worthwhile to be pursued as the mentioned follow-up proposal to P2128.
  11. Now the last topic on this is symmetric rewriting (rewrite a + b into b + a for different types). Since we did this for == and <=>, I think this looks promising for +, *, &, | and ^ since these are generally assumed to be symmetrical. There does not seem like a lot of standard types supporting T + U, but this is a promising direction and probably should be provided, given that we can write an overload less for these operators in a not-same-type case. However, an interesting case for * overload is that in some areas (linear algebra vector/matrix multiplication) this operator is generally assumed to not be symmetrical, but I don't think this is reasonably heavy enough to drop symmetric operator*.

Anyway, these are just my own thoughts and are just for reference, and good job on the proposal. I wish to see it adopted soon in whatever form.

davidstone commented 4 months ago

Thanks for the comments. I am proceeding by splitting the paper up into smaller papers, one for each category of proposal. I'll update the documentation for this with the new paper numbers.