IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

pi example - no convergence to π #65

Closed mgr327 closed 8 years ago

mgr327 commented 8 years ago

Hello,

The following graph shows the standard deviation of the result vs. the number of Monte Carlo steps. There appears to be a problem with convergence to the proper value of π in the parallelized code (green graph); the serial code (blue graph) converges as expected. The red line is 1/√n, provided as a guide for the eye.

[graph: mc-pi-convergence]

The calculations were done using the code from the pi example:

@acc function calcPi(n::Int64)
    x = rand(n) .* 2.0 .- 1.0
    y = rand(n) .* 2.0 .- 1.0
    return 4.0*sum(x.^2 .+ y.^2 .< 1.0)/n
end

function calcPi0(n::Int64)
    x = rand(n) .* 2.0 .- 1.0
    y = rand(n) .* 2.0 .- 1.0
    return 4.0*sum(x.^2 .+ y.^2 .< 1.0)/n
end
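For readers without a Julia setup, the same hit-or-miss estimator can be sketched in Python with NumPy (an illustration only; the name `estimate_pi` and the seed are my own choices, not from the report):

```python
import numpy as np

def estimate_pi(n, rng):
    # Draw n points uniformly in the square [-1, 1] x [-1, 1] and count
    # how many land inside the unit disk; that fraction approximates pi/4.
    x = rng.uniform(-1.0, 1.0, n)
    y = rng.uniform(-1.0, 1.0, n)
    return 4.0 * np.count_nonzero(x**2 + y**2 < 1.0) / n

rng = np.random.default_rng(1)
print(estimate_pi(1_000_000, rng))  # close to 3.14159; statistical error ~ 1/sqrt(n)
```

The 1/√n scaling of the estimator's statistical error is exactly the red guide line in the graph above.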

The versions of Julia and the packages were as follows:

julia> versioninfo()
Julia Version 0.4.3
Commit a2f713d* (2016-01-12 21:37 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> Pkg.status("ParallelAccelerator")

julia> Pkg.status("CompilerTools")

The actual code used to produce the graph is as follows:

using ParallelAccelerator

if Pkg.installed("PyPlot") != nothing
    using PyPlot
end

@acc function calcPi(n::Int64)
    x = rand(n) .* 2.0 .- 1.0
    y = rand(n) .* 2.0 .- 1.0
    return 4.0*sum(x.^2 .+ y.^2 .< 1.0)/n
end

function calcPi0(n::Int64)
    x = rand(n) .* 2.0 .- 1.0
    y = rand(n) .* 2.0 .- 1.0
    return 4.0*sum(x.^2 .+ y.^2 .< 1.0)/n
end

function convergence_test(;nd::Int64=10, nrep::Int64=16, nmin::Int64=1536, 
                                            seed::Int64=1, sc::Float64=2.)

    errors = Vector{Float64}(nd)
    errors_p = Vector{Float64}(nd)
    nps = Vector{Int64}(nd)

    srand(seed)

    np = nmin
    for i in 1:nd
        err = 0.
        err_p = 0.
        for j in 1:nrep
            pic = calcPi0(np)
            err += (pi - pic)^2
            pic_p = calcPi(np)
            err_p += (pi - pic_p)^2
        end
        errors[i] = sqrt(err/nrep)
        errors_p[i] = sqrt(err_p/nrep)
        nps[i] = np
        np = round(Int, sc*np)
    end

    return nps, errors, errors_p
end

function plot_errors(nps::Vector{Int64}, 
        errors::Vector{Float64}, errors_p::Vector{Float64}; 
        file::ASCIIString="mc-pi-convergence.png")
    nd = length(nps)
    guide = Vector{Float64}(nd)
    for i in 1:nd
        guide[i] = 0.7/sqrt(nps[i])
    end

    loglog(nps, errors, basex=2, label="serial code", marker="o")
    loglog(nps, errors_p, basex=2, label="ParallelAccelerator", marker="o")
    loglog(nps, guide, basex=2, label="1/sqrt(x)")
    legend(loc="upper right", fancybox="true")
    grid("on")
    xlabel(L"$n$")
    ylabel(L"$\sqrt{\langle(\pi - p_n)^2\rangle}$")
    if !isinteractive()
        savefig(file)
    end
end

if !isinteractive()
    nps, errs, errs_p = convergence_test(sc=sqrt(2.), nd=35)
    if Pkg.installed("PyPlot") != nothing
        plot_errors(nps, errs, errs_p)
    end
end
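The methodology of `convergence_test` above, RMS deviation from π over `nrep` repetitions at each sample size `n`, can be sketched independently in Python (function names and constants here are mine, chosen for illustration):

```python
import math
import numpy as np

def estimate_pi(n, rng):
    # Hit-or-miss Monte Carlo estimate of pi.
    x = rng.uniform(-1.0, 1.0, n)
    y = rng.uniform(-1.0, 1.0, n)
    return 4.0 * np.count_nonzero(x**2 + y**2 < 1.0) / n

def rms_error(n, nrep, rng):
    # Root-mean-square deviation of the estimator from pi over nrep runs.
    return math.sqrt(sum((math.pi - estimate_pi(n, rng)) ** 2
                         for _ in range(nrep)) / nrep)

rng = np.random.default_rng(1)
for n in (1_000, 10_000, 100_000):
    print(n, rms_error(n, 32, rng))  # error shrinks roughly as 1/sqrt(n)
```

With a correctly working RNG, each tenfold increase in `n` should shrink the RMS error by roughly √10, which is what the serial (blue) curve in the graph shows.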
ehsantn commented 8 years ago

Thank you for reporting the issue. I think the problem is with the parallel initialization of the random number generator. The generated C++ code runs fine without OpenMP:

[graph: mc-pi-convergence]

I will work on fixing the issue. Side note: I wonder how far HPAT.jl can go.
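The failure mode described here, every OpenMP thread starting its RNG from the same state, can be demonstrated in any language: identically initialized generators emit identical streams, so the parallel "samples" are perfectly correlated and averaging them gains nothing. A minimal Python sketch (NumPy generators stand in for the per-thread C++ engines; this is not the actual generated code):

```python
import numpy as np

def disk_hits(rng, n):
    # Count points landing inside the unit disk (the core of the pi estimate).
    x = rng.uniform(-1.0, 1.0, n)
    y = rng.uniform(-1.0, 1.0, n)
    return np.count_nonzero(x**2 + y**2 < 1.0)

n = 10_000
# Four "threads", each (wrongly) initializing its generator from the same seed:
hits = [disk_hits(np.random.default_rng(0), n) for _ in range(4)]
print(hits)  # all four counts are identical -> no variance reduction from parallelism
```

This matches the symptom in the graph: the parallel estimate's error stops improving with `n` once the correlated streams dominate.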

ehsantn commented 8 years ago

The issue is fixed with 8e0485a0db2d8a36237b35ecbe63a28aadea8ec6. Now different threads use different RNG engines with random seeds, which gives approximately independent streams. Here is the graph with ParallelAccelerator (with OpenMP):
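The pattern in the fix, a separate, differently seeded RNG engine per thread, can be sketched with NumPy's `SeedSequence.spawn`, which derives statistically independent child streams from one parent seed (an analogue of what the commit does on the C++/OpenMP side, not the commit's code):

```python
import numpy as np

n_threads, n = 4, 10_000
# One parent seed spawned into independent child streams, one per thread:
rngs = [np.random.default_rng(s)
        for s in np.random.SeedSequence(12345).spawn(n_threads)]

def disk_hits(rng, n):
    # Count points landing inside the unit disk.
    x = rng.uniform(-1.0, 1.0, n)
    y = rng.uniform(-1.0, 1.0, n)
    return np.count_nonzero(x**2 + y**2 < 1.0)

hits = [disk_hits(rng, n) for rng in rngs]    # distinct per-thread samples
pi_est = 4.0 * sum(hits) / (n_threads * n)    # combined estimate
print(hits, pi_est)
```

Because the per-thread streams are now (approximately) independent, the combined estimate recovers the expected 1/√n convergence.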

[graph: mc-pi-convergence]