JulesKouatchou / basic_language_comparison

avoid unrelated comparisons #19

Open thautwarm opened 2 years ago

thautwarm commented 2 years ago

It is really interesting to see Numba come out faster than Julia. However, while reproducing the results, I found that some benchmarks in this repo do not support the conclusions drawn from them.

TL;DR: After fixing the benchmark, Julia is more than 10x faster than Numba on the evaluate_functions case.

Consider the evaluate_functions case, where the Numba version uses a parallel for loop (prange). It appears to run faster than the other languages, but this is misleading.

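Programs that use prange give different results from those that use range, and it is the range version that matches the behaviour of the corresponding Julia program. Each iteration reads the vector1 produced by the previous iteration, so the iterations are not independent and the loop is not a valid candidate for prange. In the session below, the prange version simply returns the untouched linspace input: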

In [17]: @njit(parallel=True)
    ...: def evaluate_functions(n):
    ...:     """
    ...:         Evaluate the trigonometric functions for n values evenly
    ...:         spaced over the interval [-1500.00, 1500.00]
    ...:     """
    ...:     vector1 = np.linspace(-1500.00, 1500.0, n)
    ...:     iterations = 10000
    ...:     for i in range(iterations):
    ...:         vector2 = np.sin(vector1)
    ...:         vector1 = np.arcsin(vector2)
    ...:         vector2 = np.cos(vector1)
    ...:         vector1 = np.arccos(vector2)
    ...:         vector2 = np.tan(vector1)
    ...:         vector1 = np.arctan(vector2)
    ...:     return vector1
    ...:

In [18]: evaluate_functions(10)
Out[18]:
array([1.46030424, 1.13579218, 0.81128013, 0.48676808, 0.16225603,
       0.16225603, 0.48676808, 0.81128013, 1.13579218, 1.46030424])

In [19]: @njit(parallel=True)
    ...: def evaluate_functions(n):
    ...:     """
    ...:         Evaluate the trigonometric functions for n values evenly
    ...:         spaced over the interval [-1500.00, 1500.00]
    ...:     """
    ...:     vector1 = np.linspace(-1500.00, 1500.0, n)
    ...:     iterations = 10000
    ...:     for i in prange(iterations):
    ...:         vector2 = np.sin(vector1)
    ...:         vector1 = np.arcsin(vector2)
    ...:         vector2 = np.cos(vector1)
    ...:         vector1 = np.arccos(vector2)
    ...:         vector2 = np.tan(vector1)
    ...:         vector1 = np.arctan(vector2)
    ...:     return vector1
    ...:

In [20]: evaluate_functions(10)
Out[20]:
array([-1500.        , -1166.66666667,  -833.33333333,  -500.        ,
        -166.66666667,   166.66666667,   500.        ,   833.33333333,
        1166.66666667,  1500.        ])
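
For a check that does not depend on Numba at all, here is a minimal pure-Python sketch of the same sequential chain. The helper names `linspace` and `evaluate_functions_reference` are hypothetical and not part of the repo, and the default of 100 iterations (instead of the benchmark's 10000) is an assumption made only to keep the check quick. It can be used to confirm which of the two outputs above is the intended one.

```python
import math


def linspace(start, stop, n):
    """Return n evenly spaced floats from start to stop inclusive (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def evaluate_functions_reference(n, iterations=100):
    """Sequential chain: every pass consumes the result of the previous pass."""
    vector1 = linspace(-1500.0, 1500.0, n)
    for _ in range(iterations):
        vector2 = [math.sin(v) for v in vector1]
        vector1 = [math.asin(v) for v in vector2]
        vector2 = [math.cos(v) for v in vector1]
        vector1 = [math.acos(v) for v in vector2]
        vector2 = [math.tan(v) for v in vector1]
        vector1 = [math.atan(v) for v in vector2]
    return vector1


if __name__ == "__main__":
    # Compare against Out[18] (range) and Out[20] (prange) above.
    print(evaluate_functions_reference(10))
    print(linspace(-1500.0, 1500.0, 10))
```

The first printed list should agree with Out[18] to the displayed precision, while the second is the raw input that Out[20] reproduces.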

The Julia behaviour:

function evaluatefunctions(N)
    # x = linspace(-1500.0, 1500.0, N)
    x = collect(range(-1500.0, stop=1500.0, length=N))
    M = 10000
    for i in 1:M
        y = sin.(x)
        x = asin.(y)
        y = cos.(x)
        x = acos.(y)
        y = tan.(x)
        x = atan.(y)
    end
    return x
end

julia> evaluatefunctions(N)
10-element Vector{Float64}:
 1.4603042376686257
 1.135792184853451
 0.8112801320381631
 0.48676807922287524
 0.16225602640761583
 0.16225602640761583
 0.48676807922287524
 0.8112801320381631
 1.135792184853451
 1.4603042376686257

Replacing the incorrect prange with range gives a fair comparison:

In [16]: %timeit evaluate_functions(10)
55.4 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

julia> N = 10
10

julia> @btime evaluatefunctions(N)
  4.224 ms (60001 allocations: 8.24 MiB)
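
As a quick sanity check on the ">10x" figure, the ratio of the two timings above can be computed directly. This is only arithmetic on the numbers reported in this comment, and the variable names are made up for illustration. Note that %timeit reports a mean while @btime reports a minimum, so the ratio is approximate.

```python
numba_ms = 55.4    # %timeit mean for the range-based Numba version
julia_ms = 4.224   # @btime result for evaluatefunctions(N) with N = 10

speedup = numba_ms / julia_ms
print(f"Julia is about {speedup:.1f}x faster")  # prints "Julia is about 13.1x faster"
```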