ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.11k stars 440 forks source link

Copying in for_each / yield / lazy_yield_if #455

Open aldanor opened 8 years ago

aldanor commented 8 years ago

This is more of a question rather than an issue (so please feel free to close if it's irrelevant) -- I've seen a few recent open issues regarding yield/for_each and decided to do a bit of testing to check the copy/move ctor call counts in the final optimized binary.

#include <iostream>
#include <memory>
#include <vector>

#include "range-v3/include/range/v3/all.hpp"

struct stats_t {
    int dc; int cc; int mc; int ca; int ma;
    void reset() { *this = stats_t(); }
    friend std::ostream& operator<<(std::ostream& os, const stats_t& v) {
        return os << "default-ctor: " << v.dc << ", "
           << "copy-ctor: " << v.cc << ", " << "move-ctor: " << v.mc << ", "
           << "copy-assign: " << v.ca << ", " << "move-assign: " << v.ma;
    }
} stats {};

struct object {
    object() { stats.dc++; }
    object(const object& obj) { stats.cc++; }
    object(object&& obj) { stats.mc++; }
    object& operator=(const object& obj) { stats.ca++; return *this; }
    object& operator=(object&& obj) { stats.ma++; return *this; }
};

int main() {
    using namespace ranges;
    std::vector<object> vec(10);

    {
        auto r = view::for_each(vec, [](const object& obj) {
            return yield(obj);
        });
        stats.reset(); RANGES_FOR(const object& obj, r); std::cout << stats << std::endl;
    }
    {
        auto r = view::for_each(view::iota(0, 10), [](int) {
            return yield(object());
        });
        stats.reset(); RANGES_FOR(const object& obj, r); std::cout << stats << std::endl;
    }
    {
        auto r = view::for_each(view::iota(0, 10), [](int) {
            return lazy_yield_if(true, [] { return object(); });
        });
        stats.reset(); RANGES_FOR(const object& obj, r); std::cout << stats << std::endl;
    }
}

Compiled with gcc 6, c++14, -O3, this yields:

default-ctor: 1, copy-ctor: 30, move-ctor: 61, copy-assign: 0, move-assign: 20
default-ctor: 11, copy-ctor: 20, move-ctor: 61, copy-assign: 0, move-assign: 20
default-ctor: 20, copy-ctor: 10, move-ctor: 10, copy-assign: 0, move-assign: 20

The move ctors I can live with (although 6 moves per element is quite a a lot), but 1-3 copy ctors per iteration is completely prohibitive for anything but the simplest pod types like integers. E.g., in the second example we yield an rvalue, it gets moved 6 times but apparently also copied twice per iteration?

Is there any way with the current implementation to force the copy-ctor count to zero, at least in trivial examples like this (other than ugly hacks like yielding pointers), or is this a design limitation?

Thanks 😉

ericniebler commented 8 years ago

Interesting, thanks. I don't have time to look right now, but I wonder if you made object move-only where the compiler errors out. That will tell you where the copies are being made.

aldanor commented 8 years ago

That wouldn't compile because yield_fn requires SemiRegular:

range-v3/include/range/v3/view/for_each.hpp:110:17: error: static_assert failed 
    "The object passed to yield must be a model of the
     SemiRegular concept; that is, it needs to be default 
     constructible, copy and move constructible, and destructible."
                CONCEPT_ASSERT_MSG(SemiRegular<Val>(),
ericniebler commented 8 years ago

OK, right, because view::single holds an object by value, and views must themselves be SemiRegular. That doesn't explain where the copies are coming from during iteration. I'll look into it.

CaseyCarter commented 8 years ago

That doesn't explain where the copies are coming from during iteration.

single caches in its iterators.

CaseyCarter commented 8 years ago

I have a half-finished change that addresses this and the SemiRegular issue, FWIW.

ericniebler commented 8 years ago

IIRC, I did that because it vastly improved the generated code for range compositions. We really need a suite of benchmarks.

CaseyCarter commented 8 years ago

IIRC, I did that because it vastly improved the generated code for range compositions.

Yes, that is my belief as well. Avoiding pointers has incredible effects on code generation for tight loops - it makes alias analysis easy when there are no pointers.

The repeat view in cmcstl2 uses a heuristic: if the objects are trivially copyable and "small" (<= 4 pointers, chosen completely arbitrarily) I copy them everywhere. This is one of those ideas I intend to investigate in detail "some day."