Closed amigalemming closed 9 years ago
We don't have an EDSL version of divMod
at the moment, so it is picking up the default implementation, defined here. As you can see it uses (==)
, rather than (==*)
(which we require because we can not overload boolean operations).
At the moment you'll just have to do the two separately.
/ping #10
The default implementation of divMod is based on quotRem, which is certainly not a good idea for accelerate. How about:
$ fgrep divMod Data/Array/Accelerate/Language.hs
divMod x y = (P.div x y, P.mod x y)
$ fgrep quotRem Data/Array/Accelerate/Language.hs
quotRem x y = (P.quot x y, P.rem x y)
?
Yes, that would certainly work. We should really add quotRem
though, since there is a standard implementation for remquo
in math.h, which I assume is faster than doing the two separately.
Unfortunately, the implementation is still missing in 0.15. I think that a slow implementation is better than none.
Ah, this one flew under the radar and I forgot about it for 0.15.
I have added an implementation based on quot
and rem
. Note that the types change, so you'll need to use the version from D.A.Accelerate, not just pull it in from the type class ):
On Mon, 29 Sep 2014, Trevor L. McDonell wrote:
Ah, this one flew under the radar and I forgot about it for 0.15.
I have added an implementation based on quot and rem. Note that the types change, so you'll need to use the version from D.A.Accelerate, not just pull it in from the type class ):
But divMod and quotRem with result type (Exp t, Exp t) instead of (Exp (t,t)) would be possible. Could you please add according implementations to the Integral instance? Otherwise it is an unexpected trap. Currently you just get the "Prelude.Eq.==" error message and you have no idea, that 'Prelude.divMod' is the cause.
Right, that's a good idea. I will add an informative error message.
A result type of (Exp t, Exp t)
is possible, but this would require the quot
and rem
parts to always be computed separately, so there would be no gain once we move to a more efficient implementation.
It now says:
*** Exception: Prelude.divMod applied to EDSL types: use Data.Array.Accelerate.divMod instead
On Mon, 29 Sep 2014, Trevor L. McDonell wrote:
Right, that's a good idea. I will add an informative error message.
A result type of (Exp t, Exp t) is possible, but this would require the quot and rem parts to always be computed separately, so there would be no gain once we move to a more efficient implementation.
That would still be the second-best solution, since it would prohibit to write generic algorithms that work both on plain Haskell numbers and on Accelerate expressions. I would still hope that the CUDA compiler eliminates redundant computations.
I agree that this is not ideal, but that ship has already sailed because there are many functions that we can not overload such as (==)
, or operations such as case
which we can not intercept.
I am not confident that the CUDA compiler will optimise such things. What you are emitting is this:
(_, r) = quotRem n d
(q, _) = quotRem n d
I will attempt the full yak shave and see what turns out. sigh
On Mon, 29 Sep 2014, Trevor L. McDonell wrote:
I agree that this is not ideal, but that ship has already sailed because there are many functions that we can not overload such as (==), or operations such as case which we can not intercept.
I am not confident that the CUDA compiler will optimise such things. What you are emitting is this:
(, r) = quotRem n d (q, ) = quotRem n d
I will attempt the full yak shave and see what turns out. sigh
I have to add, that I am using divMod not only for performance reasons but also because it warrants consistency. If I use separate 'div' and 'mod' then e.g. it is easy to miss some places for updates in a refactoring.
On Mon, 29 Sep 2014, Trevor L. McDonell wrote:
I agree that this is not ideal, but that ship has already sailed because there are many functions that we can not overload such as (==), or operations such as case which we can not intercept.
I am not confident that the CUDA compiler will optimise such things. What you are emitting is this:
(, r) = quotRem n d (q, ) = quotRem n d
What about
(q,r) = untup2 $ quotRem n d
? Would this be splitted as you have shown? I assumed that Accelerate observes the shared 'quotRem n d'.
Mostly I want to ensure that quotRem
computes only a single division. I think the un/tup2
solution should work too, which I also tried out since my last message.
Anyway, I'm just going through now trying to add the quotRem instruction directly.
quotRem
is sensible, but the divMod
implementation is currently dumb and just calls out to div
and mod
separately. It would be good to find better versions of all those functions; we should write a micro benchmark to test this.