Closed: leonardb closed this issue 3 years ago.
Using spawn_monitor will not solve the issue that the change to spawn_link was intended to fix. I think an appropriate change for this version would be going back to the use of spawn and handling the race condition during the checkout. I will have a look asap.
@benoitc Maybe I'm misunderstanding the entire issue.
Why would it not solve the problem? The receive statement should receive any exit where the spawned process crashes as per the description of the issue in https://github.com/benoitc/hackney/issues/675
In the case of a non-normal exit this should be received by the 2nd clause in the receive and allow a clean return of an error to the caller.
The only real difference is you're not allowing a spurious exit message to propagate out of checkout
Happy to be educated here.
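Concretely, the kind of flow I had in mind is something like the following. This is a sketch under my own names, not hackney's actual checkout code: with spawn_monitor, a crash in the spawned worker comes back to the caller as a 'DOWN' message rather than an exit signal, so the second receive clause can turn it into a clean error return.

```erlang
-module(checkout_monitor_sketch).
-export([run/1]).

%% Hypothetical sketch (NOT hackney's code): run Fun in a monitored
%% worker. A crash surfaces as a 'DOWN' message, never as an exit
%% signal, so the caller can return an error tuple cleanly.
run(Fun) ->
    Parent = self(),
    {Pid, MRef} = erlang:spawn_monitor(
                    fun() -> Parent ! {result, self(), Fun()} end),
    receive
        {result, Pid, Result} ->
            %% Worker finished; drop the monitor and any queued 'DOWN'.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% Worker crashed before replying.
            {error, {checkout_failure, Reason}}
    end.
```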
Went and re-read the initial issue and it seems I totally misunderstood. Specific case was request crashing, not an issue with checkout.
@leonardb correct, the spawn_link change is intended to fix the case where the requester is crashing. The linked process doing the checkout exits when that happens.
@leonardb #681 should fix your issue.
@ronaldwind can you also test this patch and let me know how it behaves on your side. This should also fix the issue you intended to solve using spawn_link
in #674.
1.17.3 has been released.
@benoitc Too bad, this change reintroduces the problem I had. I've upgraded to 1.17.3 and ran the snippet from my issue at https://github.com/benoitc/hackney/issues/675, and the in_use_count once again shows increasing numbers.
It should decrease, though. Or maybe it is only a counter that is not being reduced?
The in_use_count is not lowering as time passes. I've waited for 10+ minutes and the count is still the same. I've also tried filling up the pool to the point where in_use_count equals max. At that point no new requests are made and I simply receive a :checkout_timeout:
iex(1)> :hackney.request(:get, "https://github.com/benoitc/hackney", [], "", [])
{:error, :checkout_timeout}
In short, it does not seem to be that the counter is just reporting wrong numbers. And it seems like it is the exact same behaviour as it was in 1.17.0.
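(Schematically, a checkout_timeout like this comes from a bounded receive in the checkout path. A toy sketch, assuming nothing about hackney's internals beyond the {checkout, Ref, Result} message shape that appears in hackney_pool:checkout/4:)

```erlang
-module(checkout_wait_sketch).
-export([wait/2]).

%% Schematic only (names are mine): wait a bounded time for the
%% pool's checkout reply; give up with checkout_timeout otherwise.
wait(Ref, Timeout) ->
    receive
        {checkout, Ref, Result} -> Result
    after Timeout ->
        {error, checkout_timeout}
    end.
```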
@ronaldwind can you try this patch
diff --git src/hackney_pool.erl src/hackney_pool.erl
index c913998..ba3a739 100644
--- src/hackney_pool.erl
+++ src/hackney_pool.erl
@@ -75,8 +75,8 @@ checkout(Host, Port, Transport, Client) ->
           Requester ! {checkout, Ref, Result};
         false ->
           case Result of
-            {ok, SocketRef, Socket} ->
-              checkin(SocketRef, Socket);
+            {ok, {_Name, ConnRef, Connection, Owner, Transport}, Socket} ->
+              gen_server:call(Owner, {checkin, ConnRef, Connection, Socket, Transport}, infinity);
             _Error ->
               ok
           end
Also, can you send me a trace? You can send it to me privately by email if you want.
@ronaldwind I will run some tests later today, but if you have the possibility to do it, please let me know :)
This patch appears to resolve the issue.
Just for testing I altered the patch so the caller process "deadness" is guaranteed (no timing issues).
diff --git a/src/hackney_pool.erl b/src/hackney_pool.erl
index c913998..e0c7101 100644
--- a/src/hackney_pool.erl
+++ b/src/hackney_pool.erl
@@ -70,13 +70,14 @@ checkout(Host, Port, Transport, Client) ->
         catch
           _:_ -> {error, checkout_failure}
         end,
and tested as follows:
Erlang/OTP 23 [erts-11.1.8] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1] [hipe]

Eshell V11.1.8 (abort with ^G)
1> application:ensure_all_started(hackney).
{ok,[unicode_util_compat,idna,mimerl,certifi,syntax_tools,
     parse_trans,ssl_verify_fun,metrics,hackney]}
2> hackney_pool:start_pool(test_pool, [{checkout_timeout, 100}, {timeout, 1000}, {max_connections, 10}]).
ok
3> [begin
3>   P = spawn(
3>     fun() ->
3>       Res = case hackney:request(get, "https://github.com/benoitc/hackney", [], "", [{pool, test_pool}]) of
3>         {ok, _, _, _} ->
3>           ok;
3>         Err ->
3>           Err
3>       end,
3>       io:format("~p : ~p~n", [X, Res])
3>     end)
3> end || X <- lists:seq(1,20)].
[<0.429.0>,<0.430.0>,<0.431.0>,<0.432.0>,<0.433.0>,
 <0.434.0>,<0.435.0>,<0.436.0>,<0.437.0>,<0.438.0>,<0.439.0>,
 <0.440.0>,<0.441.0>,<0.442.0>,<0.443.0>,<0.444.0>,<0.445.0>,
 <0.446.0>,<0.447.0>,<0.448.0>]
4> timer:sleep(1000).
ok
5> hackney_pool:get_stats(test_pool).
[{name,test_pool},
 {max,10},
 {in_use_count,0},
 {free_count,12},
 {queue_count,0}]
Aside: I'm still not certain this is the correct way to handle the issue, as is there not still a possibility that the Requester dies before receiving the connection, or fails to return it? I'm wondering if maybe the pool should be monitoring the Requester instead, and return the connection if the Requester dies without the connection being returned. I'm also a bit confused, as the code in hackney_pool seems to indicate that the Requester is being monitored in some fashion.
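The monitoring approach I mean would look roughly like this toy version (all names are mine, nothing here is hackney's code, and the toy tracks at most one connection per requester process):

```erlang
-module(pool_sketch).
-export([start/1, checkout/1, checkin/2, stats/1]).

%% Toy pool: monitors each requester at checkout time and reclaims
%% the connection if the requester dies before checking it back in,
%% so the in-use count cannot leak upward.

start(Conns) ->
    spawn(fun() -> loop(Conns, #{}) end).

checkout(Pool) ->
    Pool ! {checkout, self()},
    receive {checkout_reply, Reply} -> Reply end.

checkin(Pool, Conn) ->
    Pool ! {checkin, self(), Conn},
    ok.

stats(Pool) ->
    Pool ! {stats, self()},
    receive {stats_reply, Stats} -> Stats end.

loop(Free, InUse) ->
    receive
        {checkout, R} when Free =/= [] ->
            [Conn | Rest] = Free,
            %% Watch the requester for as long as it holds Conn.
            MRef = erlang:monitor(process, R),
            R ! {checkout_reply, {ok, Conn}},
            loop(Rest, InUse#{R => {MRef, Conn}});
        {checkout, R} ->
            R ! {checkout_reply, {error, checkout_timeout}},
            loop(Free, InUse);
        {checkin, R, Conn} ->
            case maps:take(R, InUse) of
                {{MRef, Conn}, InUse1} ->
                    erlang:demonitor(MRef, [flush]),
                    loop([Conn | Free], InUse1);
                _Other ->
                    loop(Free, InUse)
            end;
        {'DOWN', _MRef, process, R, _Reason} ->
            %% Requester died while holding a connection: reclaim it.
            case maps:take(R, InUse) of
                {{_Mon, Conn}, InUse1} -> loop([Conn | Free], InUse1);
                error -> loop(Free, InUse)
            end;
        {stats, R} ->
            R ! {stats_reply, #{free => length(Free),
                                in_use => maps:size(InUse)}},
            loop(Free, InUse)
    end.
```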
@benoitc I can also confirm the patch works fine! Thanks @leonardb for testing it too.
@benoitc Is it an oversight that hackney_pool:monitor_client/3 does not actually call erlang:monitor/2?
https://github.com/benoitc/hackney/blob/039db6033b9f81d75c9b0b672f9f8b96e9111575/src/hackney_pool.erl#L628-L630
Where is the monitor placed on the Requester?
thanks @ronaldwind @leonardb I will check it myself. Expect a new release later today.
@leonardb so there is a new mechanism to handle the connections coming in the next major version. But right now the connections are indeed monitored via hackney_manager. The current patch is about fixing a race condition that happens when the requester dies before it's managed. All this system is overly complicated and will change soon. One of its advantages, however, is letting you pass control of a connection to another process if needed.
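That hand-off is the same idea as plain OTP socket ownership transfer. A toy illustration using gen_tcp:controlling_process/2 over a loopback connection (this is standard OTP, NOT hackney's API; all names here are mine):

```erlang
-module(handoff_sketch).
-export([run/0]).

%% Demonstrates handing a connection to another process: the worker
%% becomes the socket's owner and reads from it, analogous to passing
%% control of a pooled connection between processes.
run() ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {active, false}, {reuseaddr, true}]),
    {ok, Port} = inet:port(LSock),
    %% Server side: accept one connection and send a greeting.
    spawn(fun() ->
                  {ok, Acc} = gen_tcp:accept(LSock),
                  ok = gen_tcp:send(Acc, <<"hello">>)
          end),
    {ok, Sock} = gen_tcp:connect({127, 0, 0, 1}, Port, [binary, {active, false}]),
    Parent = self(),
    Worker = spawn(fun() ->
                           receive
                               {take, S} ->
                                   {ok, Bin} = gen_tcp:recv(S, 0, 5000),
                                   Parent ! {worker_got, Bin}
                           end
                   end),
    %% Hand the socket over: Worker is now the controlling process.
    ok = gen_tcp:controlling_process(Sock, Worker),
    Worker ! {take, Sock},
    receive {worker_got, Data} -> Data after 5000 -> {error, timeout} end.
```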
@benoitc Is it an oversight that hackney_pool:monitor_client/3 does not actually call erlang:monitor/2? Where is the monitor placed on the Requester?
https://github.com/benoitc/hackney/blob/master/src/hackney_connect.erl#L224
@benoitc You mean hackney_manager:update_state/1 ? That does not place any monitor. I am still confused.
I only found two places where a monitor is placed. Here: https://github.com/benoitc/hackney/blob/039db6033b9f81d75c9b0b672f9f8b96e9111575/src/hackney_stream.erl#L28
and here:
I don't use a monitor; instead I link the processes and trap exits in the manager:
https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L570
Oh, I see. Then in case of a Requester exit the manager sends an explicit 'DOWN' message to the pool process:
https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L387
That is how the client is removed and in_use_count is finally reduced in hackney_pool.
Thank you @benoitc. I'm adding this here so others wondering about the same thing might find it useful.
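In miniature, the link-plus-trap_exit pattern looks like this (toy names, not hackney's code): the manager links to the requester and traps exits, so a crash arrives as an {'EXIT', Pid, Reason} message that the manager can forward as an explicit notification, analogous to hackney_manager telling the pool so in_use_count goes back down.

```erlang
-module(trap_sketch).
-export([run/0]).

%% A "manager" process links to a "requester" and traps exits; the
%% requester's crash becomes an {'EXIT', ...} message, which the
%% manager forwards as a plain notification instead of crashing.
run() ->
    Parent = self(),
    spawn(fun() ->
                  process_flag(trap_exit, true),
                  Requester = spawn_link(fun() -> exit(boom) end),
                  receive
                      {'EXIT', Requester, Reason} ->
                          %% Crash converted into an explicit message.
                          Parent ! {requester_down, Reason}
                  end
          end),
    receive
        {requester_down, Reason} -> {requester_down, Reason}
    after 1000 -> timeout
    end.
```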
1.17.4 has been published @aboroska @ronaldwind
With the change in https://github.com/benoitc/hackney/pull/674 to spawn_link, the caller process will receive the exit message.
If the caller is trapping exits this can be quite unexpected and problematic.
This was quite a surprise when I upgraded, as I was only expecting exit signals from a potential crash in RMQ connections being handled in the same server, in which case the gen_server would re-initialize.
Since the caller has no access to the Pid of the spawned process, it's not possible to cleanly handle this in the caller, and the EXIT signal will always propagate.
A possible solution may be using a monitor instead of a link.