IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

How to determine if ParallelAccelerator is working #85

Open sarkar1 opened 8 years ago

sarkar1 commented 8 years ago

I have the following code for a Parzen density estimator using a Gaussian kernel. It gives me similar but slightly different elapsed times with and without @acc. How do I determine whether ParallelAccelerator is actually helping? By running the algorithm several times and averaging the elapsed times?
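Something like this sketch is what I mean by averaging (illustrative only; it does a warm-up run first so JIT compilation isn't timed):

```julia
main(n, m)                 # warm-up: triggers compilation, not timed
times = Float64[]
for _ in 1:5               # time several runs...
    tic()
    main(n, m)
    push!(times, toq())
end
println("mean elapsed: ", mean(times))  # ...and average them
```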

Is trace() supported by ParallelAccelerator?

```julia
using ParallelAccelerator
using Distributions

function main(n, m)
    P1 = 0.7                  # Probability of class 1
    n1 = Int(round(P1 * n))   # Number of class1 samples in the training data

    dist = Normal(4, 10)      # mean 4, standard deviation 10
    Sigma1 = eye(m, m)        # Covariance matrix for class1
    Sigma2 = 2 * Sigma1       # Covariance matrix for class2

    class1 = rand(dist, m, Int(n1))     # Synthetic data for class1
    class2 = rand(dist, m, Int(n - n1)) # Synthetic data for class2

    test_n = Int(round(n / 2))          # Number of test samples
    testdata = rand(dist, test_n, m)'
    @parallel for i = 1:test_n
        testdata_i = reshape(testdata[:, i], m, 1)
        pd1 = parzen_de_gaussian(testdata_i, Sigma1, class1)[1] # Density estimate for class1
        pd2 = parzen_de_gaussian(testdata_i, Sigma2, class2)[1] # Density estimate for class2
    end
end

@acc function parzen_de_gaussian(data, Sigma, traindata)
    pd = exp(-0.5 * trace((data .- traindata)' * inv(Sigma) * (data .- traindata)) * 2) *
         (1 / (2 * pi * ((det(Sigma))^n)^2))
    return pd
end

n = 100000
m = 100
tic()
main(n, m)
print("Elapsed time..", toq())
println("...n=$n....m=$m")
```
DrTodd13 commented 8 years ago

Did some symbols get dropped when you posted? "2 pi" seems to be missing an operator between the two terms, and "traindata)) 2" also seems to be missing one.

sarkar1 commented 8 years ago

Yes, you are right. This happened last time as well. I am editing the code.

sarkar1 commented 8 years ago

So when I edit my comment, I am able to see the * operator, so it seems to be an editor problem. I have updated the code with spaces; it should be fine now.

DrTodd13 commented 8 years ago

One more issue is that you are using "n" as a global in parzen_de_gaussian. This prevents the whole function from being precisely type-inferred and so keeps ParallelAccelerator from working. You should modify your code to pass "n" as a parameter. There are still more issues on top of this that I am investigating. Stay tuned.
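A minimal sketch of that change (only the extra `n` parameter differs from the function above):

```julia
# Pass `n` explicitly so the body has no globals and can be fully type-inferred.
@acc function parzen_de_gaussian(data, Sigma, traindata, n)
    pd = exp(-0.5 * trace((data .- traindata)' * inv(Sigma) * (data .- traindata)) * 2) *
         (1 / (2 * pi * ((det(Sigma))^n)^2))
    return pd
end

# Call sites then pass n along accordingly, e.g.:
# pd1 = parzen_de_gaussian(testdata_i, Sigma1, class1, n)
```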

DrTodd13 commented 8 years ago

det(), trace(), and inv() are Julia functions that all contain variables that type-infer as Any, so ParallelAccelerator can't apply its automatic transitive optimization to them. We are working toward an integration with Julia threading in Julia 0.5, at which point most of these issues should go away. In the meantime, we may be able to intercept/overload common operations like these, as we do in some other cases. We'll look into it.

@ninegua What say you?

ninegua commented 8 years ago

Functions related to linear algebra may or may not be parallelizable. At the moment, ParallelAccelerator has very limited support for them: it only makes a best effort to translate these functions from their definitions in the base library to C/C++, which often fails due to limitations in code generation. So, as @DrTodd13 said, this will largely become a non-issue once we have an alternative backend to C/C++.

We could also take a more incremental approach and give better support down the road, since many of these functions are indeed parallelizable. Such support would be similar to how we translate the common array operators and functions: we give them parallel definitions and export those in a submodule of ParallelAccelerator.API, so that when imported into a user program they are used, composed, and optimized by ParallelAccelerator.
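As a hypothetical illustration (the module name and layout are invented, not the actual ParallelAccelerator.API structure), such a parallel definition could look like:

```julia
# Hypothetical sketch: trace() written as a plain reduction over the
# diagonal, a form the ParallelAccelerator pipeline can recognize and
# parallelize; exporting it from a submodule lets `using` shadow Base.trace.
module ParallelLinAlg

export trace

function trace{T<:Number}(A::Matrix{T})
    t = zero(T)
    for i = 1:min(size(A, 1), size(A, 2))
        t += A[i, i]
    end
    return t
end

end
```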

lkuper commented 8 years ago

BTW, in GitHub issues, if you start Julia code blocks with `` ```julia `` and end them with `` ``` ``, you won't have the formatting issues you encountered, which are caused by the Markdown parser getting confused by Julia syntax (and you'll get Julia syntax highlighting). I just edited this issue to add the correct formatting.
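For example (the outer four-backtick fence below is only there so the inner three-backtick fence displays literally):

````markdown
```julia
A = eye(3, 3)
trace(A)
```
````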

sarkar1 commented 8 years ago

Thank you all :)

ninegua commented 8 years ago

Commit cbee474356d4ce660da9a0465aed0360130163e7 starts the effort to add parallel implementations for base library functions.

However, functions like det and inv all involve LU factorization, and Julia's implementation of lufact is rather complicated, for good reasons. I don't have a good sense of how to write a simple parallel version of it. Alternatively, we could tune the sequential version so that it gets past CGen. Or, if there is a BLAS/LAPACK equivalent, we could skip Julia's implementation and use that directly. Any suggestions? @ehsantn @DrTodd13

ehsantn commented 8 years ago

I think we can do the same thing we did with GEMM: use MKL or LAPACK if available, and otherwise fall back to naive sequential C code.

dgetrf from LAPACK/MKL can be used.
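From the Julia side, a rough sketch of that fallback (assuming the Julia 0.4-era `Base.LinAlg.LAPACK` bindings, where `getrf!` wraps dgetrf):

```julia
import Base.LinAlg.LAPACK

A = rand(4, 4)
# getrf! computes the LU factorization in place via dgetrf, returning the
# packed LU factors, the pivot vector, and an info code (0 means success).
LU, ipiv, info = LAPACK.getrf!(copy(A))

# det(A) then falls out of the factorization: the product of U's diagonal,
# with a sign flip for each row interchange recorded in ipiv.
d = prod(diag(LU))
swaps = count(i -> ipiv[i] != i, 1:length(ipiv))
detA = isodd(swaps) ? -d : d
```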