ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.13k stars 439 forks source link

compiler loops on range recursion #579

Closed etam closed 7 years ago

etam commented 7 years ago

One day I came on the https://www.haskell.org/ and found this little code (last line is mine):

primes = filterPrime [2..] 
  where filterPrime (p:xs) = 
          p : filterPrime [x | x <- xs, x `mod` p /= 0]

main = print $ take 10 primes

So I tried to reimplement this using range-v3. This is what I came up with:

#include <iostream>
#include <range/v3/all.hpp>

auto prepend(auto v, auto rng)
{
  return ranges::view::concat(ranges::view::single(v), rng);
}

auto filterPrime(auto rng)
{
  const auto p = ranges::front(rng);
  return prepend(
    p,
    filterPrime(rng | ranges::view::tail | ranges::view::remove_if(
      [=](auto x){ return x % p == 0; }
    ))
  );
}

auto primes()
{
  return filterPrime(ranges::view::ints(2));
}

int main()
{
  ranges::copy(primes() | ranges::view::take(10),
               ranges::ostream_iterator<>(std::cout, " "));
  std::cout << '\n';
}

The problem is that both gcc 6.3.1 and clang 3.9.1 loops infinitely on this code.

I think it's because they're trying to find out the return type of filterPrime function.

Could this be fixed by adding some explicit types instead of auto? Or is there a type-erasing view, that will prevent the return type explosion?

gnzlbg commented 7 years ago

Even if that were to compile, it would not work, because filterPrime(auto rng) is calling itself an infinite amount of times. Also, each lambda has a different type, so IIUC every call is going to monomorphize a different filterPrime(auto rng) function. I cannot but wonder why you are not hitting the recursion limit of the compiler though.

EDIT: after you fix that, you might want to try to rewrite your approach to use view::for_each as an exercise.

Victor-Savu commented 7 years ago

@etam In Haskell, the return type of the filterPrime is [Int], which is the result of the cons operator (:). This operator has the signature a -> [a] -> [a]. What is important to notice is that the second parameter (the so-called tail of the list) has the same type as the return value. This is certainly not the case for ranges::view::concat, which (in your example with 2 arguments) has the signature a -> b -> Concat a b where Concat a b is injective and b != Concat a b for any types a and b. The signature of filterPrime is rng -> Concat (Head rng) (Tail rng) where although Head rng may be identical to Head (Concat (Head rng) (Tail rng)) (and may be reduced to int), Tail is again injective and rng != Tail rng for any rng. This means that the return type of the monomorphisation of the primes function diverges Concat int (Concat int (Concat int (Concat int (...)))).

Note: In your recursive call, you pass rng instead of "tail" of rng. Note that ranges::front(rng) does not "pop" the head element from the range, so it is still there in the recursive call. Make sure to fix that as well, otherwise you will run into a run-time infinite loop.

etam commented 7 years ago

Thanks for your comments.

The problem with compilation I resolved using ranges::any_view, which provides type-erasure.

But, as @gnzlbg said, there is still a problem with filterPrime calling itself eagerly, not lazily.

I still have no idea how view::for_each can help here, but I'm not done yet :)

gnzlbg commented 7 years ago

@etam Maybe this helps? http://ericniebler.com/2014/04/27/range-comprehensions/

Try to understand how the pythagorean triples example work, play a bit with view::for_each (and yield_if !) on simpler things, and give it a try again. This looks like a neat way of getting started with the library btw, don't give up!

gnzlbg commented 7 years ago

@etam Disregard my previous comment, I came with a cleaner solution that does not use view::for_each (one can write one with for_each though, but its a bit pointless).

there is still a problem with filterPrime calling itself eagerly, not lazily.

Make the call lazy then ;)

Victor-Savu commented 7 years ago

Ah, sure! I missed the eager evaluation :) ! But even with a lazy call, you still need to skip the first element in the range.

gnzlbg commented 7 years ago

@etam is using view::tail for that (which I like more in this case than view::drop(1)). I haven't been able to come up with a clean solution that removes the any_range<int>s though, so if somebody figures out a clean way to do that, please speak up :D

[spoiler] my current solution (without view::for_each)

```c++ #include #include using namespace ranges; using av = any_view; av filterPrime(av rng) { int p = front(rng); return view::concat( view::single(p), view::single(rng | view::tail | view::filter([p](int x) { return x % p != 0; })) | view::transform([](auto&& fxs) { return filterPrime(fxs); }) | view::join); } int main() { std::cout << view::all(filterPrime(view::ints(2)) | view::take(10)) << std::endl; } ```

etam commented 7 years ago

Thanks @gnzlbg, after few more attempts I could't resist and read your solution.

I was actually pretty close. I tried to implement a view_facade that given a lambda returning a range, would execute this lambda on first access and proxy to returned range. Unfortunately there's no documentation for view_facade and I couldn't make it compile. You achieved the same result by packing a range into view::single and using view::transform for lazy evaluation, which is quite clever.

I don't think there's a possibility to avoid any_range. There must be a type-erasure somewhere in this recursion.

The issue with compilers eating all my RAM when trying to compile the non-type-erased recursion is still open, but I think it's not range-v3 fault, so I'm closing this issue.

gnzlbg commented 7 years ago

The readme contains some talks by Eric in which he describes how to implement your own view in case that you are interested. The API of view_facade has changed a bit, but with the explanations of the talks, and looking at some simpler views (e.g. like view::ints/iota) you should be able to get started writing your own.

You achieved the same result by packing a range into view::single and using view::transform for lazy evaluation, which is quite clever.

I think that part is actually an ugly hack. There must be a less hacky way of doing this than having to write view::single(rng) | view::transform(lazy) | view join... I was just too lazy (ha ha) to come up with something better @CaseyCarter @ericniebler ?

etam commented 7 years ago

Also the overall performance could be improved. For haskell program it takes about 0.03s to generate 1000 prime numbers. This implementation does it in 45 minutes!

gnzlbg commented 7 years ago

I suppose you are compiling in release mode (-Ofast -NDEBUG -march=native). Which compiler are you using? And could you maybe compile with -s and post the assembly here?

etam commented 7 years ago

The code: https://gist.github.com/etam/78f8765ad18fe25e29b402bce16ffc44 The assembly: https://gist.github.com/etam/336b2526369b70ea75105ca686ea757f compiler: g++ 6.3.1

gnzlbg commented 7 years ago

Thanks, can you also post the exact commands you used to compile?

If I had to guess, any_view is not getting devirtualized, and the compiler cannot optimize across it, so the loops are not getting optimized, and for the little amount of computation per loop iteration going on, that could explain partially the bad performance that we are seeing here.

I would like to look into it a bit more though, but I wouldn't expect the performance of an implementation that uses any_view to be good by any bar. In performance sensitive code one should avoid type erasure like the plague.

ericniebler commented 7 years ago

I haven't looked in detail, but the discussion here reminds me of my effort to write a recursively-defined view (example/recursive_range.hpp). That code suffered from exponential time behavior. Unlike Haskell, lazy list computation is not memoized, so that as the list is traversed, you get an any_view of a tail of a tail of a tail of a ... and so just incrementing an iterator becomes O(N) for the Nth element in the list. This is exponential.

It sounds like you may be running into the same problem. In Haskell, the solution is to represent lists as a head an a "thunk". Essentially, the thunk is a function that returns the tail (a head and a thunk), which then get appended to the list, building it as you go.

It would be nice if range-v3 had native support for such a reified range.