permute on tuple arrays is failing

andygill commented 10 years ago

This program gives different results for the interpreter and CUDA. The combination function is associative and commutative.

I suspect the atomic lock(s) on the tuple update are not being handled properly, and the tuple components are getting out of sync.

module Main where

-- Swap this import for Data.Array.Accelerate.Interpreter to get the reference result.
import Data.Array.Accelerate.CUDA as I

import Data.Array.Accelerate hiding ((++), product, take, all, (!), fst, snd, zipWith, not, zip, or, map)
import qualified Data.Array.Accelerate as A

main = do
        let sz = 3000 :: Int
        let interm_arrA = use $ A.fromList (Z :. sz) [ Prelude.fromIntegral $ 8 - (a `mod` 17) | a <- [1..sz]] :: Acc (Array DIM1 Double)
        let msA = use $ A.fromList (Z :. sz) [ Prelude.fromIntegral $ (a `div` 8) | a <- [1..sz]] :: Acc (Array DIM1 Int)
        let inf = 10000 :: Exp Double
        let infsA = A.generate (index1 (384 :: Exp Int)) (\ _ -> lift (inf,inf))
        let inpA = A.map (\ v -> lift (abs v :: Exp Double,inf :: Exp Double)) interm_arrA

        print $ run (A.permute
                       -- Combine two (smallest, second smallest) pairs into one.
                       (\ a12 b12 ->
                          let (a1,a2) = unlift a12
                              (b1,b2) = unlift b12
                          in (a1 <=* b1)
                               ? ( lift (a1, min a2 b1)
                                 , lift (b1, min b2 a1)
                                 ))
                       infsA
                       (\ ix -> index1 (msA A.! ix))
                       inpA :: Acc (Array DIM1 (Double,Double)))

tmcdonell commented 10 years ago

CUDA has only compare-and-swap style atomic instructions, so we can't create a lock that lets a thread do multiple reads/writes atomically. Because arrays of tuples use a struct-of-arrays representation in memory, each tuple component has to be updated individually, which is why you are seeing them fall out of sync.

Not sure if there is a way around that, but open to suggestions.
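
To illustrate the struct-of-arrays point, here is a toy model in plain Haskell. It is not Accelerate's code generator: ordinary lists stand in for device arrays, and the names `SoA`, `readPair`, `writeFst` and `writeSnd` are made up for this sketch. It only shows that writing one logical pair takes two separate stores, so anything that reads between them sees a mixed pair.

```haskell
-- Toy model only: plain lists stand in for the two device arrays
-- that back an array of pairs in a struct-of-arrays layout.
type SoA = ([Double], [Double])

fromPairs :: [(Double, Double)] -> SoA
fromPairs = unzip

-- Read the logical pair stored at index i.
readPair :: Int -> SoA -> (Double, Double)
readPair i (xs, ys) = (xs !! i, ys !! i)

-- Replace the element at index i of a list.
setAt :: Int -> a -> [a] -> [a]
setAt i v xs = take i xs ++ [v] ++ drop (i + 1) xs

-- Each component lives in its own array, so each has its own store.
writeFst, writeSnd :: Int -> Double -> SoA -> SoA
writeFst i v (xs, ys) = (setAt i v xs, ys)
writeSnd i v (xs, ys) = (xs, setAt i v ys)
```

Starting from `fromPairs [(10000, 10000)]`, storing the pair `(3, 5)` means calling `writeFst 0 3` and then `writeSnd 0 5`. In between, `readPair 0` returns `(3, 10000)`, a pair that no single update produced.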

andygill commented 10 years ago

Thanks for the response. Perhaps we should restrict the types of arguments to permute to at least stop this problem hitting others. I'll rewrite my code to use a fold, which should not have the same issue.

andygill commented 10 years ago

Just FYI, fold had the same issue. So I quantized each double to 16 bits and packed the pair into a single 32-bit Word, and it worked (because there is then only a single lock).
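
The actual quantization code is not shown in this thread, so the following is only a sketch of the packing idea in plain Haskell (host-side, not Accelerate `Exp` code). The names `quantize`, `dequantize`, `pack`, `unpack` and `combine`, and the assumption that values lie in a known range `[0, hi]`, are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16, Word32)

-- Map a Double in [0, hi] onto the 16-bit range (assumed scheme).
-- Values outside the range are clamped.
quantize :: Double -> Double -> Word16
quantize hi x = round (clamp x / hi * 65535)
  where clamp = max 0 . min hi

-- Approximate inverse of quantize.
dequantize :: Double -> Word16 -> Double
dequantize hi q = fromIntegral q / 65535 * hi

-- Put the first component in the high 16 bits, the second in the low 16.
pack :: Word16 -> Word16 -> Word32
pack a b = (fromIntegral a `shiftL` 16) .|. fromIntegral b

unpack :: Word32 -> (Word16, Word16)
unpack w = (fromIntegral (w `shiftR` 16), fromIntegral (w .&. 0xFFFF))

-- Same combination function as in the reproducer, on packed pairs:
-- keep the smallest value first and the second smallest second.
combine :: Word32 -> Word32 -> Word32
combine x y
  | a1 <= b1  = pack a1 (min a2 b1)
  | otherwise = pack b1 (min b2 a1)
  where
    (a1, a2) = unpack x
    (b1, b2) = unpack y
```

With this layout the whole pair is one 32-bit value, so one atomic update covers both components. The cost is precision: with `hi = 10000`, each quantization step is roughly 0.15.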

tmcdonell commented 10 years ago

That sounds a bit odd for fold, do you still have your test program?

Permute on 64-bit types will work as well, at least on hardware with compute capability 1.2 and above.

AccelerateHS / accelerate

permute on tuple arrays is failing #137