Nebo15 / sage

A dependency-free tool to run distributed transactions in Elixir, inspired by Sagas pattern.
MIT License

How to handle failed retries? #35

Closed by iacobson 6 years ago

iacobson commented 6 years ago

I'm trying to build an example app with Sage.

I have something like this:

  def purchase() do
    Sage.new()
    |> Sage.run(:order, &order_effect/2, &order_compensation/4)
    |> ...
    |> Sage.execute(%{})
  end

  defp order_effect(_effects_so_far, _attrs) do
    Supplier.order() # let's assume this keeps returning {:error, :no_response}
  end

  defp order_compensation(_effects_to_compensate, _effects_so_far, _stage_error, _attrs) do
    {:retry, retry_limit: 2}
  end

So, if the first stage in my saga errors, I want to retry 2 times. If it still errors after those 2 retries, I want to log the error and then :abort.

If I were not using :retry, I could simply do:

  defp order_compensation(_effects_to_compensate, _effects_so_far, _stage_error, _attrs) do
    Logger.error("my error")
    :abort
  end

But I don't know how to handle this if I want to keep the retry option.

I was thinking of introducing some kind of catch_all stage as the first stage, but I'm not sure at all that this is the correct approach. Something like:

  def purchase() do
    Sage.new()
    |> Sage.run(:catch, &no_effect/2, &catch_compensation/4)
    |> Sage.run(:order, &order_effect/2, &order_compensation/4)
    |> ...
    |> Sage.execute(%{})
  end

  defp no_effect(_effects_so_far, _attrs) do
    {:ok, %{}}
  end

  defp catch_compensation(_effects_to_compensate, _effects_so_far, {:order, :no_response}, _attrs) do
    Logger.error("my error")
    :abort
  end

Also, as a side question, what would be the difference between returning :ok and :abort in the compensations? In my example, they behave the same so far.

Thanks so much!

AndrewDryga commented 6 years ago

Hello, first of all - thank you for trying Sage 👍.

Right now the retry count is not passed to the compensation, so it would be rather hard to implement any logic for the case where the limit is exceeded. You could do it with a catch-all stage, but in my project I would just log all errors in transactions (not compensations). That seems better to me because I want to know the error rate for transactions anyway, and I want to see the error for each retry (they may differ). I guess we could extend the inspection events to include the transaction's return value, so it would be easier to log them.
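A minimal sketch of the "log errors in transactions" idea, assuming the 2-arity transaction callback shown in the question. The `LoggedTransaction` module and its `wrap/2` function are hypothetical helpers, not part of Sage, and the anonymous function stands in for `Supplier.order/0`:

```elixir
defmodule LoggedTransaction do
  require Logger

  # Hypothetical helper (not part of Sage): wraps a 2-arity transaction
  # callback so that every non-success result is logged, then returned
  # unchanged so the saga still sees the original value.
  def wrap(stage_name, transaction) when is_function(transaction, 2) do
    fn effects_so_far, attrs ->
      case transaction.(effects_so_far, attrs) do
        {:ok, _effect} = ok ->
          ok

        other ->
          Logger.error("stage #{inspect(stage_name)} failed: #{inspect(other)}")
          other
      end
    end
  end
end

# Stand-in for the Supplier.order/0 call from the question; it always fails.
failing_order = fn _effects_so_far, _attrs -> {:error, :no_response} end

logged_order = LoggedTransaction.wrap(:order, failing_order)
result = logged_order.(%{}, %{})
```

Because the wrapper runs on every attempt, each retry would produce its own log line. In the saga from the question, the wrapped function would presumably be passed in place of `&order_effect/2`, e.g. `LoggedTransaction.wrap(:order, &order_effect/2)`.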

Also, we might extend the compensation function signature, but I'm not 100% sure it would be beneficial.

When a compensation returns :ok, it means that one of the preceding stages can start a forward recovery (retry). If you return :abort, all retries are ignored. So you should use :abort when it's clear that retrying won't help with the current issue.
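To illustrate the choice between the return values, here is a sketch of a compensation that picks one based on the error. It assumes the 4-arity callback and the `{stage_name, reason}` error shape used in the question; the `OrderCompensation` module name and the `:invalid_order` reason are hypothetical:

```elixir
defmodule OrderCompensation do
  # Transient failure (the error from the question): ask for a retry.
  def compensate(_effect_to_compensate, _effects_so_far, {:order, :no_response}, _attrs) do
    {:retry, retry_limit: 2}
  end

  # Hypothetical permanent failure: retrying cannot help, so abort and
  # make any retries requested by preceding stages be ignored.
  def compensate(_effect_to_compensate, _effects_so_far, {:order, :invalid_order}, _attrs) do
    :abort
  end

  # Anything else: compensation is done, and a preceding stage may still
  # start a forward recovery (retry).
  def compensate(_effect_to_compensate, _effects_so_far, _stage_error, _attrs) do
    :ok
  end
end
```

If no stage in the saga ever returns `{:retry, _}`, there is nothing for :abort to cancel, which would explain why :ok and :abort looked the same in the example from the question.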

iacobson commented 6 years ago

Thanks so much for the very quick and detailed response. My bad: in the issue I did not mention why I want to log those errors. I wanted the error log to serve as a call for manual action. E.g. the order retry failed 3 times and I cannot do anything more, so somebody needs to take manual action (call the provider, call the client, etc.).

I mean, it is just an example and I do not need the statistical info for now. I have just mocked everything and know what responses to expect.

So I think that for my specific case, I would go for a "catch failed retries" step that matches on server response errors or similar.
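Such a step could look roughly like the sketch below. It assumes the same 4-arity compensation and `{stage_name, reason}` error shape as in the question; the `CatchFailedRetries` module, the `:timeout` reason, and the log message are hypothetical. A fallback clause is included so that unexpected errors do not raise a `FunctionClauseError`:

```elixir
defmodule CatchFailedRetries do
  require Logger

  # Reasons treated as "server response" errors. :no_response comes from
  # the question; :timeout is a hypothetical addition.
  @server_errors [:no_response, :timeout]

  # Reached only after the failing stage has run out of retries, so this
  # is the place to ask for manual action.
  def catch_compensation(_effect_to_compensate, _effects_so_far, {stage, reason}, _attrs)
      when reason in @server_errors do
    Logger.error("stage #{inspect(stage)} failed with #{inspect(reason)}, manual action required")
    :abort
  end

  # Fallback for any other error: nothing to log here.
  def catch_compensation(_effect_to_compensate, _effects_so_far, _stage_error, _attrs) do
    :ok
  end
end
```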

Thanks again, and keep improving the project. I really like it!