dwyl / learn-elixir

:droplet: Learn the Elixir programming language to build functional, fast, scalable and maintainable web applications!

Concurrency related question #165

Open IhorTsoi opened 2 years ago

IhorTsoi commented 2 years ago

Hi 😊

Thank you very much for a great introduction to Elixir!

I got a little bit stuck at the "Concurrency" section. I would be more than happy if you could explain a few details to me.

The Factorial.spawn function is implemented like this:

def spawn(n) do
  1..n
  |> Enum.chunk_every(4)
  |> Enum.map(fn(list) ->
    spawn(Factorial, :_spawn_function, [list])
    |> send(self())

    receive do
      n -> n
    end
  end)
  |> calc_product()
end

Each chunk is processed inside the Enum.map callback. For every chunk we:

1) spawn a new process, passing it the function and the current chunk;
2) send the main process's PID to the newly created process (which receives it and starts calculating);
3) block in receive until that process responds with the calculated result for the current chunk.

As far as I understand, the iterations of Enum.map are executed sequentially: on each iteration we spawn a process and then wait until it responds with its result. So it looks like, even though we use processes, the calculations are still done sequentially.

Initially, I thought that this modified implementation would execute faster:

def spawn_n_go(n) do
  1..n
  |> Enum.chunk_every(4)
  |> Enum.map(fn(list) -> # create all the processes and start the calculations
    spawn(Factorial, :_spawn_function, [list])
    |> send(self()) 
  end)
  |> Enum.map(fn (_) -> # wait for each process to finish calculations and send the response
    receive do
      n -> n
    end
  end)
  |> calc_product()
end

The reasoning: it first starts all the calculations, and only then waits for all of them to finish (not one by one).
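To convince myself that the per-chunk receive really does serialise the waiting, here is a toy sketch (names and numbers are my own, not from the tutorial) where each worker is made artificially slow with a 100 ms sleep, so the difference between the two shapes is easy to see:

```elixir
defmodule BlockDemo do
  # Each fake worker waits for the parent's PID, sleeps 100 ms, then echoes
  # its chunk back. The sleep stands in for a genuinely slow calculation.
  defp start_worker(chunk) do
    spawn(fn ->
      receive do
        sender ->
          Process.sleep(100)
          send(sender, chunk)
      end
    end)
  end

  # Mirrors the tutorial's spawn/1: spawn a worker, then block in receive
  # before the next chunk's worker is even started.
  def wait_per_chunk(chunks) do
    Enum.map(chunks, fn chunk ->
      start_worker(chunk) |> send(self())

      receive do
        result -> result
      end
    end)
  end

  # Mirrors spawn_n_go/1: start all workers first, collect replies afterwards.
  def spawn_then_wait(chunks) do
    chunks
    |> Enum.map(fn chunk -> start_worker(chunk) |> send(self()) end)
    |> Enum.map(fn _ ->
      receive do
        result -> result
      end
    end)
  end
end

{t_seq, _} = :timer.tc(BlockDemo, :wait_per_chunk, [[[1], [2], [3], [4]]])
{t_par, _} = :timer.tc(BlockDemo, :spawn_then_wait, [[[1], [2], [3], [4]]])
IO.inspect(t_seq, label: "wait_per_chunk µs")  # roughly 400_000 (4 × 100 ms, one at a time)
IO.inspect(t_par, label: "spawn_then_wait µs") # roughly 100_000 (all four sleeps overlap)
```

So with a task this slow the two shapes behave very differently, which makes the real measurements with the fast factorial chunks all the more puzzling.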

However, to my surprise, both versions took almost the same time on my machine; the initial version was even a little faster. I tried experimenting with the input values, but the results stayed the same.

This is where I got totally confused 😢

My only guess is that the BEAM is somehow able to optimize the execution of Enum.map and start the next iteration while the current one waits in receive... But I would expect that to be documented somewhere, and I haven't found anything like it.

Could you please point me to some information on how this is possible (docs or articles, perhaps), so that I can understand this part better? If I am wrong, please correct me.

Maybe this information would also be useful for other readers.

Thank you!

nelsonic commented 2 years ago

@IhorTsoi how many cores does your computer have? 💭 There's a startup and coordination cost to parallelisation, which is why most programming languages don't bother with it. In the case you've shared, you might not see the benefit because the calculation is very fast. This is more of a software engineering question than an Elixir-specific one.
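To put a rough number on that startup cost, you can measure a bare spawn plus one message round-trip, which is the overhead each chunk pays before any real work happens. A sketch (exact numbers vary a lot by machine and BEAM version):

```elixir
# Micro-benchmark of per-process overhead: spawn a process that does no
# real work, exchange one ping/pong, repeat many times, and average.
iterations = 10_000

{micros, :ok} =
  :timer.tc(fn ->
    parent = self()

    for _ <- 1..iterations do
      pid =
        spawn(fn ->
          receive do
            :ping -> send(parent, :pong)
          end
        end)

      send(pid, :ping)

      receive do
        :pong -> :ok
      end
    end

    :ok
  end)

IO.puts("average spawn + round-trip: ~#{Float.round(micros / iterations, 2)} µs")
```

If multiplying one 4-element chunk takes less time than this per-process overhead, parallelising it cannot pay off.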

Recommend reading: http://www.gotw.ca/publications/concurrency-ddj.htm

IhorTsoi commented 2 years ago

Hi @nelsonic!

Thank you very much for your response 😃

I have 2 physical + 2 virtual cores.

I should have mentioned another fact in my initial post: the sequential implementation from the example ran 3 times slower than the concurrent implementation from the example.

I'll paste the code from the example here for your convenience:

defmodule Factorial do
  def spawn(n) do
    1..n
    |> Enum.chunk_every(4)
    |> Enum.map(fn(list) ->
      spawn(Factorial, :_spawn_function, [list])
      |> send(self())

      receive do
        n -> n
      end
    end)
    |> calc_product()
  end

  def _spawn_function(list) do
    receive do
      sender ->
        product = calc_product(list)
        send(sender, product)
    end
  end

  # used on the single process
  def calc_product(n) when is_integer(n) do
    Enum.reduce(1..n, 1, &(&1 * &2))
  end

  # used with multiple processes
  def calc_product(list) do
    Enum.reduce(list, 1, &(&1 * &2))
  end

  # just a helper function used to time the other functions
  def run(f_name, args) do
    :timer.tc(Factorial, f_name, args)
    |> elem(0) # only display the time, as I didn't want to log numbers with thousands of digits
    |> IO.inspect(label: "----->")
  end
end

So:

From my point of view, even if every chunk could be fully processed before we hit receive (so we never actually wait at any point of execution), we would still spend additional time spawning the processes and sending/receiving the messages.

My question is: how can this concurrent implementation still be faster? (3 times faster (!) on my machine.) It seems like we can just introduce processes and automatically gain a speed-up 😃
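One experiment I can think of to test this (my own sketch, not from the tutorial): run exactly the same chunking with no processes at all. If the single-process chunked version is also noticeably faster than the plain reduce, the speed-up would come from the order of the multiplications (small chunk products first, so fewer multiplications involve the huge accumulator), not from parallelism:

```elixir
defmodule ChunkedProduct do
  # The tutorial's sequential shape: the accumulator becomes a huge bignum
  # early on, so almost every remaining multiplication is a big-number op.
  def plain(n), do: Enum.reduce(1..n, 1, &(&1 * &2))

  # The same chunking as Factorial.spawn/1, but in a single process:
  # cheap small products first, then one pass over the partial results.
  def chunked(n) do
    1..n
    |> Enum.chunk_every(4)
    |> Enum.map(fn chunk -> Enum.reduce(chunk, 1, &(&1 * &2)) end)
    |> Enum.reduce(1, &(&1 * &2))
  end
end

{t_plain, p1} = :timer.tc(ChunkedProduct, :plain, [50_000])
{t_chunked, p2} = :timer.tc(ChunkedProduct, :chunked, [50_000])
true = p1 == p2 # same result either way
IO.inspect({t_plain, t_chunked}, label: "plain vs chunked (µs)")
```

Since both functions run in one process, this comparison would isolate the arithmetic effect from the concurrency effect.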

IhorTsoi commented 2 years ago

P.S. I know that some compilers are smart enough to automatically detect pieces of code that can run in parallel, but I don't think this example is such a case. Each response message should be received only by the receive in the Enum.map iteration that sent the corresponding request message. Even though the one-to-one relationship between iterations and responses doesn't actually matter here (we multiply the results together anyway, so the order is irrelevant), I doubt the compiler would be brave enough to perform such an optimization.
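For what it's worth, here is a small sketch (my own, not from the tutorial) showing that a bare receive simply takes whatever message is first in the mailbox, regardless of which worker sent it, which is why the order really is irrelevant for the product:

```elixir
# Workers reply sooner for smaller i (100 ms apart), so even though we
# spawn the "3" worker first, replies are collected in arrival order.
parent = self()

for i <- [3, 1, 2] do
  spawn(fn ->
    Process.sleep(i * 100)
    send(parent, i)
  end)
end

replies =
  for _ <- 1..3 do
    receive do
      n -> n
    end
  end

IO.inspect(replies, label: "arrival order") # almost certainly [1, 2, 3]
```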