elixir-lang / elixir

Elixir is a dynamic, functional language for building scalable and maintainable applications
https://elixir-lang.org/
Apache License 2.0
24.55k stars 3.38k forks source link

Comprehension into empty unbound bitstring has performance bug #5489

Closed umamaistempo closed 7 years ago

umamaistempo commented 7 years ago

Environment

Elixir 1.3.4 | Erlang/OTP 19

Current behavior

When you pass the param into: "" to a comprehension, the length of the strings that are joined affects the time it takes to execute the block

Expected behavior

No significant difference

Example code

import ExUnit.Assertions

ex1 = List.duplicate("a", 1_000)
ex2 = "a" |> String.duplicate(1_000) |> List.duplicate(1_000)
# Alternatively, try:
# ex2 = "a" |> String.duplicate(1_000_000) |> List.duplicate(10)
str = ""

{t1a, v1a} = :timer.tc fn -> for i <- ex1, into: str, do: i end
{t2a, v2a} = :timer.tc fn -> for i <- ex1, into: "xyz", do: i end 
{t3a, v3a} = :timer.tc fn -> for i <- ex1, into: "", do: i end

{t1b, v1b} = :timer.tc fn -> for i <- ex2, into: str, do: i end
{t2b, v2b} = :timer.tc fn -> for i <- ex2, into: "xyz", do: i end
{t3b, v3b} = :timer.tc fn -> for i <- ex2, into: "", do: i end

assert "xyz" <> v1a == v2a
assert v1a == v3a

assert "xyz" <> v1b == v2b
assert v1b == v3b

IO.puts """
Comprehension into bound string             => #{t1a} ~~~> #{t1b}
Comprehension into unbound non-empty string => #{t2a} ~~~> #{t2b}
Comprehension into unbound empty string     => #{t3a} ~~~> #{t3b}
"""

Results on my machine:

Comprehension into bound string             => 3161 ~~~> 3628
Comprehension into unbound non-empty string => 3632 ~~~> 4057
Comprehension into unbound empty string     => 3368 ~~~> 479934

As you can see, the only difference between the three different comprehensions is the into part. When using a non-empty bitstring or a bound string, the result, as expected, doesn't suffer of a considerable performance hit; in the case of an unbound empty bitstring, there is a crescent cost based on the length of the input.

Note that, as shown above, this strange behaviour only happens when you meet the following criteria: 1) The comprehension has the into: "" param 2) The comprehension traverses a very large list or a list of long strings

josevalim commented 7 years ago

How are you running your benchmarks? Have you put the code snippet above in a file? Or are you running it in IEx?

umamaistempo commented 7 years ago

I was running directly on IEX. Used only timer:tc/1 to "benchmark"

whatyouhide commented 7 years ago

@mememori definitely try to benchmark this from a compiled module and not from IEx, as benchmarks from IEx are usually not reliable.

umamaistempo commented 7 years ago

Ok. I'll do it later and update this issue :grin:

josevalim commented 7 years ago

Yes, the code on IEx is evaluated so it may suffer very high fluctuations. In particular, I could not reproduce such a drastic difference here (there is a difference but a certain difference is expected). Thank you!

umamaistempo commented 7 years ago

Just commenting here to confirm this was a false flag, when the code is compiled from a file, the behaviour is not triggered in my setup.

Thank you for your help, josevalim and whatyouhide :grin: