ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.08k stars 440 forks source link

Make `iter_zip_with_view` cursor/sentinel inspectable to find unused remainders #1589

Open ccorn opened 3 years ago

ccorn commented 3 years ago

Warning: Illustrative C++17 code fragments with use of Catch2 macros ahead.

Thanks for zip and zip_with, particularly because zip can do references:

TEST_CASE("Range v3 zip") {
    using V = std::vector<int>;
    using ranges::equal;
    using ranges::views::zip;
    using ranges::size;
    using ranges::views::take;
    V v{0,1,2,3,4,5,6,7,8,9};

    SECTION("zip can do references") {
        V w(size(v));
        for (auto&& [b, a]: zip(w, v)) b = a;
        CHECK(equal(w, v));
    }

Also nice, though not documented yet(?), is that zip truncates the resulting range to the shortest size of the component ranges:

    SECTION("zip with shorter first argument") {
        V w(size(v) - 1);
        for (auto&& [b, a]: zip(w, v)) b = a;
        CHECK(!equal(w, v));
        CHECK(equal(w, take(v, size(w))));
    }
    SECTION("zip with longer first argument") {
        V w(size(v) + 1);
        for (auto&& [b, a]: zip(w, v)) b = a;
        CHECK(!equal(w, v));
        CHECK(equal(take(w, size(v)), v));
    }

This truncation is achieved by or-ing the iterator equality results of the component ranges.

Now, when the zip cursor compares equal to a sentinel, I would like to be able to find out which component ranges, if any, have unused elements left over. That would be possible without duplicated effort if the zip cursor/sentinel types allowed read-only inspection of their internal tuples, like this patch does when applied to the "Thanks ISO" release:

--- a/include/range/v3/view/zip_with.hpp
+++ b/include/range/v3/view/zip_with.hpp
@@ -193,6 +193,12 @@ namespace ranges
             sentinel(sentinel<Other> that)
               : ends_(std::move(that.ends_))
             {}
+
+            // For inspection: const reference to the sentinel tuple
+            auto as_tuple() const noexcept -> decltype((ends_))
+            {
+                return ends_;
+            }
         };

         template<bool Const>
@@ -310,6 +316,12 @@ namespace ranges
             {
                 return move_(meta::make_index_sequence<sizeof...(Rngs)>{});
             }
+
+            // For inspection: const reference to the iterator tuple
+            auto as_tuple() const noexcept -> decltype((its_))
+            {
+                return its_;
+            }
         };

         template<bool Const>

(Yes, decltype((member)) reflects the const-ness of *this.) With that, the zip examples above can be rewritten with detailed end state inspection:

#if EXTENDED_ZIP_API    // requires patched <range/v3/view/zip_with.hpp>
    SECTION("Extended zip API, shorter first argument") {
        using ranges::get_cursor;
        V w(size(v) - 1);
        // Decompose the range-based `for` to keep the last iterator
        auto&& zipped = zip(w, v);
        auto zi = zipped.begin();
        auto ze = zipped.end();
        for (; zi != ze; ++zi) {
            auto&& [b, a] = *zi;
            b = a;
        }
        auto&& zit = get_cursor(zi).as_tuple();
        auto&& zet = get_cursor(ze).as_tuple();
        CHECK(zit != zet);      // size mismatch detected
        auto&& [bi, ai] = zit;
        auto&& [be, ae] = zet;
        CHECK(bi == be);        // all elements of w used up
        CHECK(ai != ae);        // some unused elements in v
        CHECK(!equal(w, v));
        CHECK(equal(w, take(v, size(w))));
    }
    SECTION("Extended zip API, longer first argument") {
        using ranges::get_cursor;
        V w(size(v) + 1);
        // Decompose the range-based `for` to keep the last iterator
        auto&& zipped = zip(w, v);
        auto zi = zipped.begin();
        auto ze = zipped.end();
        for (; zi != ze; ++zi) {
            auto&& [b, a] = *zi;
            b = a;
        }
        auto&& zit = get_cursor(zi).as_tuple();
        auto&& zet = get_cursor(ze).as_tuple();
        CHECK(zit != zet);      // size mismatch detected
        auto&& [bi, ai] = zit;
        auto&& [be, ae] = zet;
        CHECK(bi != be);        // some unused elements in w
        CHECK(ai == ae);        // all elements of v used up
        CHECK(!equal(w, v));
        CHECK(equal(take(w, size(v)), v));
    }
#endif
}

Notes:

ccorn commented 3 years ago

A better name might be pos_tuple. This is inspired by the internal name range_access::pos that get_cursor delegates the work to, and by the fact that the zip cursor (but not the sentinel) also carries a function object which is not exposed with this method.

marehr commented 3 years ago

I understand the general idea, but how does this work within the views' ecosystem?

This would only be helpful when not piping any new views on top of zip, since most views will wrap the iterator type, which is not aware of that functionality, this seems not really applicable in general.

I'm genuinely interested in how one could offer functionality like that. I have seen that the standard offers a base() accessor to access the wrapped iterator (constexpr iterator_t<Base> base(), https://eel.is/c++draft/range.transform.iterator). With this one could recursively go back to that wrapped iterator again and access its extra functionality.

ccorn commented 3 years ago

I understand the general idea, but how does this work within the views' ecosystem?

It extends usabilty of the views ecosystem by keeping relevant state information accessible.

This would only be helpful when not piping any new views on top of zip, since most views will wrap the iterator type, which is not aware of that functionality, this seems not really applicable in general.

Generally, there must be a way to read iterator/sentinel state in terms of its component iterator/sentinel(s). This is important when having reached the end does not imply that all component iterators have reached their end. It is also relevant when breaking out of a range-based for loop (though usage of zip could often prevent the need for that).

I'm genuinely interested in how one could offer functionality like that. I have seen that the standard offers a base() accessor to access the wrapped iterator (constexpr iterator_t<Base> base(), https://eel.is/c++draft/range.transform.iterator). With this one could recursively go back to that wrapped iterator again and access its extra functionality.

The base() approach seems to be one step in that direction of state transparency, but it seems to presume that there is only one component range. The proposed pos_tuple approach for composite ranges like zip seems to be generalizable. It could be applied to other composite ranges like concat as well.

marehr commented 3 years ago

Interesting, so the proposal is to access the internal composition of the iterators if they have a natural composition type.

For example: zip_view produces (zip_iterator, zip_sentinel) whereas zip_iterator/zip_sentinel is a std::tuple of the underlying iterators/sentinels.

or concat_view produces (concat_iterator, concat_sentinel) whereas concat_iterator/concat_sentinel is a std::variant of the underlying iterators/sentinels.

The problem remains, that you can create arbitrary complicated tuple and variant structures, i.e. std::tuple<it1, std::variant<std::tuple<it2, it3>, it4>>, this is more or less solved by base() you can recursively access that structure (but still range iterators are not forced to support that).

And you need special care for input_iterator, since they can be move-only.

ccorn commented 3 years ago

The problem remains, that you can create arbitrary complicated tuple and variant structures, i.e. std::tuple<it1, std::variant<std::tuple<it2, it3>, it4>>, this is more or less solved by base() you can recursively access that structure (but still range iterators are not forced to support that).

That may be one of the reasons why the ranges code here distinguishes between iterators and cursors. Cursors deal with the particularities, whereas iterators strip those off, leaving only the common features.

Arbitrary complicated tuple and variant structures of iterators are not a problem because whoever builds a nested view also knows what to expect about its composition, and he does not dig into components where he does not know what the ingredients should be. This means that there is no need for a uniform interface when it comes to structural peculiarities of a composite cursor.

In an earlier post I suggested that concat cursors could offer a state inspection in terms of a tuple as well. That might be doable, but (read-only) access to the underlying variant would probably be more appropriate. Again, there is no need for a uniform interface when it comes to inspecting cursors, but there should be interface symmetry with their sentinels because of the end_cursor_t twist.

(From my pragmatic view, concat is not in pressing need of inspection anyway because exhausting a concat range exhausts all its component ranges. But it helps to consider such examples as well.)

So what I want is (1) access to cursors instead of iterators: Granted with get_cursor; (2) have the cursor offer read-only inspection of its mutable state, in its own particular fashion, and the corresponding sentinel offering the same method. Note that in the zip example, the method returns a (const) reference, so even move-only components are not a problem.

The downside is that, once specified, this reduces freedom of implementation if the inspection is to be offered efficiently, preferably with const references to avoid move/copy issues. But that's mostly a question of specification: Instead of concretely specifying a return type of pos_tuple, specify "something that is suitable for structured binding to auto&& references" because that's the intended usage pattern.

ccorn commented 3 years ago

To conclude: base() would enable inspection of pipeline predecessors. But it is not optimized for that; it needlessly requires copyable iterators instead of deliberately returning a reference to const Base for inspection. This might be a serious deficiency: Often the purpose of inspecting the ending zip iterator would be to detect scenarios that would result in a broken pipe (i.e. unconsumed content) when left unhandled. Those scenarios would involve input iterators, and those might have issues with attempts of copying.

One might argue that in such cases, one could simply create a new input iterator from scratch. But that's presuming that you know where the original input iterator came from, which is not the case in generic code. And, frankly, why on earth should you even have to deal with that, when a const reference for comparison is all you need?

Finally, .base() presumes that there is only one source (from the preceding pipeline stage). But zip and concat have multiple sources and thus do not fit in the linear graph of the pipe model. Therefore, zip and concat need something special anyway.

By now, I think that zip cursor/sentinel inspection functions returning "something that is suitable for structured binding to auto&& or const auto& references" should be the design goal. Thus zip cursor/sentinel implementation is allowed to construct and return a tuple of const references if it does not contain a suitable tuple already. The above as_tuple (or pos_tuple, or base_tuple, or source_tuple_ref_or_source_ref_tuple_it_should_not_matter_to_you) already meets these design considerations.

I'd like to to think about concat as well, but there are some subtleties, mostly related to the fact that the end sentinel contains just the end of the final range, which is entirely appropriate for its mission but means that inspecting it will not reveal as much information as may be wanted.