AccelerateHS / accelerate

Embedded language for high-performance array computations
https://www.acceleratehs.org
Other
893 stars 117 forks source link

divMod fails with "Prelude.Eq.== applied to EDSL types" #171

Closed amigalemming closed 9 years ago

amigalemming commented 10 years ago
*Main> AI.run (A.map (\xy -> A.lift $ divMod (A.fst xy) (A.snd xy)) (A.unit $ A.constant (10::Int, 3::Int)))
*** Exception: Prelude.Eq.== applied to EDSL types
*Main> AI.run (A.map (A.uncurry div) (A.unit $ A.constant (10::Int, 3::Int)))
Array (Z) [3]
*Main> AI.run (A.map (A.uncurry mod) (A.unit $ A.constant (10::Int, 3::Int)))
Array (Z) [1]
tmcdonell commented 10 years ago

We don't have an EDSL version of divMod at the moment, so it is picking up the default implementation, defined here. As you can see it uses (==), rather than (==*) (which we require because we can not overload boolean operations).

At the moment you'll just have to do the two separately.

/ping #10

amigalemming commented 10 years ago

The default implementation of divMod is based on quotRem, which is certainly not a good idea for accelerate. How about:

$ fgrep divMod Data/Array/Accelerate/Language.hs
  divMod  x y = (P.div x y, P.mod x y)
$ fgrep quotRem Data/Array/Accelerate/Language.hs
  quotRem x y = (P.quot x y, P.rem x y)

?

tmcdonell commented 10 years ago

Yes, that would certainly work. We should really add quotRem though, since there is a standard implementation for remquo in math.h, which I assume is faster than doing the two separately.

amigalemming commented 9 years ago

Unfortunately, the implementation is still missing in 0.15. I think that a slow implementation is better than none.

tmcdonell commented 9 years ago

Ah, this one flew under the radar and I forgot about it for 0.15.

I have added an implementation based on quot and rem. Note that the types change, so you'll need to use the version from D.A.Accelerate, not just pull it in from the type class ):

amigalemming commented 9 years ago

On Mon, 29 Sep 2014, Trevor L. McDonell wrote:

Ah, this one flew under the radar and I forgot about it for 0.15.

I have added an implementation based on quot and rem. Note that the types change, so you'll need to use the version from D.A.Accelerate, not just pull it in from the type class ):

But divMod and quotRem with result type (Exp t, Exp t) instead of (Exp (t,t)) would be possible. Could you please add according implementations to the Integral instance? Otherwise it is an unexpected trap. Currently you just get the "Prelude.Eq.==" error message and you have no idea, that 'Prelude.divMod' is the cause.

tmcdonell commented 9 years ago

Right, that's a good idea. I will add an informative error message.

A result type of (Exp t, Exp t) is possible, but this would require the quot and rem parts to always be computed separately, so there would be no gain once we move to a more efficient implementation.

tmcdonell commented 9 years ago

It now says:

*** Exception: Prelude.divMod applied to EDSL types: use Data.Array.Accelerate.divMod instead
amigalemming commented 9 years ago

On Mon, 29 Sep 2014, Trevor L. McDonell wrote:

Right, that's a good idea. I will add an informative error message.

A result type of (Exp t, Exp t) is possible, but this would require the quot and rem parts to always be computed separately, so there would be no gain once we move to a more efficient implementation.

That would still be the second-best solution, since it would prohibit to write generic algorithms that work both on plain Haskell numbers and on Accelerate expressions. I would still hope that the CUDA compiler eliminates redundant computations.

tmcdonell commented 9 years ago

I agree that this is not ideal, but that ship has already sailed because there are many functions that we can not overload such as (==), or operations such as case which we can not intercept.  I am not confident that the CUDA compiler will optimise such things. What you are emitting is this:

(_, r) = quotRem n d
(q, _) = quotRem n d

I will attempt the full yak shave and see what turns out. sigh

amigalemming commented 9 years ago

On Mon, 29 Sep 2014, Trevor L. McDonell wrote:

I agree that this is not ideal, but that ship has already sailed because there are many functions that we can not overload such as (==), or operations such as case which we can not intercept.

I am not confident that the CUDA compiler will optimise such things. What you are emitting is this:

(, r) = quotRem n d (q, ) = quotRem n d

I will attempt the full yak shave and see what turns out. sigh

I have to add, that I am using divMod not only for performance reasons but also because it warrants consistency. If I use separate 'div' and 'mod' then e.g. it is easy to miss some places for updates in a refactoring.

amigalemming commented 9 years ago

On Mon, 29 Sep 2014, Trevor L. McDonell wrote:

I agree that this is not ideal, but that ship has already sailed because there are many functions that we can not overload such as (==), or operations such as case which we can not intercept.

I am not confident that the CUDA compiler will optimise such things. What you are emitting is this:

(, r) = quotRem n d (q, ) = quotRem n d

What about

(q,r) = untup2 $ quotRem n d

? Would this be splitted as you have shown? I assumed that Accelerate observes the shared 'quotRem n d'.

tmcdonell commented 9 years ago

Mostly I want to ensure that quotRem computes only a single division. I think the un/tup2 solution should work too, which I also tried out since my last message.

Anyway, I'm just going through now trying to add the quotRem instruction directly.

tmcdonell commented 9 years ago

quotRem is sensible, but the divMod implementation is currently dumb and just calls out to div and mod separately. It would be good to find better versions of all those functions; we should write a micro benchmark to test this.