Fixed issue #643 and memory leak in hackney_pool

SergeTupchiy commented 4 years ago

Fixes: https://github.com/benoitc/hackney/issues/643

This is similar to PR https://github.com/benoitc/hackney/pull/656, but using hackney_manager:cancel_request/1 ensures that request state is also erased from process dictionary and deleted from hackney_manager_refs ETS table.

Additionally, this PR includes a memory leak fix (pending client not removed in hackney_pool:dequeue) and some code cleanup.

isaacsanders commented 4 years ago

This is awesome, you have all but one test passing.

SergeTupchiy commented 4 years ago

This is awesome, you have all but one test passing.

Looks like the failed test (hackney_integration_tests:'-test_frees_manager_ets_when_body_is_in_client/0-fun-2-'/2 ) has a non-deterministic behavior, At least two reasons for it are as follows:

it checks a shared state (hackney_manger_refs ETS table), while all the tests are running in parallel.
hackney_manager removes a record (request state) from it asynchronously, see: https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L87. This means that there is only a weak guarantee that AfterCount would equal BeforeCount in the below code:
```
{ok, _unusedBody} = hackney:body(Client),
AfterCount = ets:info(hackney_manager_refs, size),
...
```
I think it's worth fixing this test, but it's not directly related to the issue. Anyway, I re-trigered CI and all tests have passed. :)

isaacsanders commented 4 years ago

@benoitc @IceDragon200 I think we might have a solution here.

IceDragon200 commented 4 years ago

@isaacsanders @SergeTuchiy I guess the last test is to run a request loop and see if it still experiences checkout_timeout, I'll do that shortly and let you know how that goes

EDIT:

iex(25)> results = for _ <- 1..200, do: :hackney.request(:get, "http://localhost:6573"); Enum.uniq(results)
[error: :econnrefused]

Looks like it works as intended, so :+1:

I'll close my PR and reference this one

ijunaid8989 commented 3 years ago

do anyone else faced this issue on this PR.

[%HTTPoison.Error{id: nil, reason: :checkout_failure}]

SergeTupchiy commented 3 years ago

do anyone else faced this issue on this PR.
[%HTTPoison.Error{id: nil, reason: :checkout_failure}]
@ijunaid8989 Could you please share some details to reproduce it?

ijunaid8989 commented 3 years ago

do anyone else faced this issue on this PR.
[%HTTPoison.Error{id: nil, reason: :checkout_failure}]
@ijunaid8989 Could you please share some details to reproduce it?

I have been using this branch in latest HTTPoison , and I am sending 500 per second post requests to a third party API. my code looks like this

  def post(url, file_path, image) do
    HTTPoison.post(
      url,
      {:multipart, [{file_path, image, []}]},
      [],
      hackney: [pool: :seaweedfs_upload_pool, recv_timeout: 15_000]
    )
  end

and I have these pool settings

      :hackney_pool.child_spec(:snapshot_pool, [timeout: 50_000, max_connections: 10_000]),
      :hackney_pool.child_spec(:seaweedfs_upload_pool, [timeout: 50_000, max_connections: 10_000]),
      :hackney_pool.child_spec(:seaweedfs_download_pool, [timeout: 50_000, max_connections: 10_000]),

in my application tree.

SergeTupchiy commented 3 years ago

@ijunaid8989 Are you using https? It's hard to guess what exactly went wrong under your conditions, but I tried to make many concurrent https multipart requests using your pool settings and got checkout_failure, caused by:

exit, {timeout,
                            {gen_server,call,
                                [application_controller,
                                 {set_env,crypto,'$curves$',
                                      ...

My code snippet:

1> application:ensure_all_started(hackney).                                                                                                                                              {ok,[crypto,asn1,public_key,ssl,unicode_util_compat,idna,                                      
     mimerl,certifi,syntax_tools,parse_trans,ssl_verify_fun,
     metrics,hackney]}
2> hackney_pool:start_pool(multipart, [{max_connections, 10_000}, {timeout, 50_000}]).
ok
3> [spawn(fun() -> case hackney:post("https://example.com", [], {multipart, [{file, <<"/tmp/img.jpg">>, <<"image">>, []}]}, [with_body, {pool, multipart}, {recv_timeout, 5000}]) of {error, Reason} -> io:format("error: ~p ~n", [Reason]); _ -> ok end end) || _ <- lists:seq(1,300)].

This not introduced by this PR, as it's also reproducible on master. If you reason is the same, I would recommend calling crypto:supports() before making any https requests. This will 'warm up' crypto by pre-populating its app config.

ijunaid8989 commented 3 years ago

@SergeTupchiy Actually we are not making any HTTPS request,

the origin of the request is HTTPS but the remote URL to which request is going is not HTTPs. do you still suggest the same?

SergeTupchiy commented 3 years ago

@ijunaid8989, This is the place where it occurs: https://github.com/SergeTupchiy/hackney/blob/master/src/hackney_pool.erl#L74

You can add some logging there and share the output. It would help to locate the issue. For example:

try
  do_checkout(Requester, Host, Port, Transport, Client, ConnectTimeout, CheckoutTimeout) 
catch Err:Res:St ->
  io:format("checkout failure: ~p, ~p, ~p~n", [Err, Res, St]),
  {error, checkout_failure}
end

ijunaid8989 commented 3 years ago

@SergeTupchiy we are actually trapped with many errors. checkout timeout, checkout failure, closed.

https://github.com/edgurgel/httpoison/issues/326

SergeTupchiy commented 3 years ago

@ijunaid8989, There is no surprise that you (occasionally?) get all the errors you mentioned under some still not completely clear conditions. This PR is intended to fix a rather clear and defined problem, see https://github.com/benoitc/hackney/issues/643. I'd be happy to at least try to fix the issues you face but it would be too time consuming (if feasible at all) for me, considering the details you shared.
Saying that one experiences checkout_timeout or closed doesn't really helps, since those errors are natural to happen in some cases . I have no idea how to reproduce the issue you linked. It's have been open for two years and I guess I know why it's still not resolved: someone says that one http client works fine with service B but fails with server A, while another works with both. It doesn't look like an exhaustive description.

benoitc commented 3 years ago

sorry for the late reply. I have been quite busy these days.

Thanks for the patch! Looking at it.

isaacsanders commented 3 years ago

@benoitc Running into this issue currently. Any status update on your end?

lasernite commented 3 years ago

@benoitc Also getting bit hard by this, merging would be much appreciated so we don't have to further litter our codebase with catches and hard resets of the pool everytime we make a request with HTTPoison that may timeout. Thank you!

ijunaid8989 commented 3 years ago

@lasernite how do you hard reset your pool?

lasernite commented 3 years ago

@ijunaid8989 we're doing stuff like:

{:error, %HTTPoison.Error{reason: :checkout_timeout}} ->
    :hackney_pool.stop_pool(:default)

ijunaid8989 commented 3 years ago

@ijunaid8989 we're doing stuff like:

{:error, %HTTPoison.Error{reason: :checkout_timeout}} ->
    :hackney_pool.stop_pool(:default)

is it helpful? we also getting timeout/checkout_timeout/checkout_failure/closed?

Would that be helpful in such condition?

lasernite commented 3 years ago

@ijunaid8989 Yeh once we catch the timeouts and reset the pool the error goes away—although I assume this has some negative performance implications as unnecessarily clearing the pool frequently, but nothing noticeable so far.

ijunaid8989 commented 3 years ago

@ijunaid8989 Yeh once we catch the timeouts and reset the pool the error goes away—although I assume this has some negative performance implications as unnecessarily clearing the pool frequently, but nothing noticeable so far.

Okay thanks.

FYI: we have started hackney pools such as

      :hackney_pool.child_spec(:snapshot_pool, [timeout: 50_000, max_connections: 10_000]),
      :hackney_pool.child_spec(:seaweedfs_upload_pool, [timeout: 50_000, max_connections: 10_000]),
      :hackney_pool.child_spec(:seaweedfs_download_pool, [timeout: 50_000, max_connections: 10_000]),

in application.ex file.

lasernite commented 3 years ago

@ijunaid8989 Assuming those child_specs are getting started in a supervision tree in your application, I'm guessing you have to stop the pools by the atoms you customized above (:snapshot_pool, etc.) instead of :default when calling stop_pool. But don't take my word for it—I'm just using a library around hackney (HTTPoison) instead of accessing the methods directly like this, so I'll leave it up to you to see what works!

ijunaid8989 commented 3 years ago

@ijunaid8989 Assuming those child_specs are getting started in a supervision tree in your application, I'm guessing you have to stop the pools by the atoms you customized above (:snapshot_pool, etc.) instead of :default when calling stop_pool. But don't take my word for it—I'm just using a library around hackney (HTTPoison) instead of accessing the methods directly like this, so I'll leave it up to you to see what works!

Thanks, I am also using HTTPoison, but this is how we are starting the pools.

can you tell me how you have started the pools? I mean in the config file? or with each HTTP request?

Maybe your way of using pools with HTTPoison is different (correct).?

lasernite commented 3 years ago

@ijunaid8989 We're not setting up any custom pools. As HTTPoison is a dependency in mix.exs when we start the server the normal application lifecycle methods are hit to automatically start HTTPoison and add to the supervision tree. Pooling reduces latency on subsequent requests from the same host, but I haven't investigated whether our patch is stopping pooling altogether or whether when the pools get stopped the supervision tree is automatically restarting them again. Doesn't matter except for saving a few hundred milliseconds on subsequent requests in the best case, which if we are losing will be fixed when this library is updated and we remove the temporary patch.

SergeTupchiy commented 3 years ago

SergeTupchiy added 2 commits on Oct 13 @SergeTupchiy Fix a memory leak in hackney_pool c9979cf @SergeTupchiy Fix checkout_timeout error in hackney_pool caused by connection errors 2bf38f9

This is just to reformat my commit messages :smile:

ijunaid8989 commented 3 years ago

@SergeTupchiy You got me, I was thinking its a fix for something!

benoitc commented 3 years ago

late update but testing the patch right now. thanks for the refactoring

ijunaid8989 commented 3 years ago

late update but testing the patch right now.

It would be great to see some progress on this. I had to clone HTTPoison as well as Hackney to use my own sources.

benoitc commented 3 years ago

late update but testing the patch right now.

It would be great to see some progress on this. I had to clone HTTPoison as well as Hackney to use my own sources.

this release is beeing tested. It will be normally shipped later today.

benoitc commented 3 years ago

@SergeTupchiy thanks for the patch. Just merged your changes. Will finish the testing later today of the release today. Normally a release should land today as well.

RaphSfeir commented 3 years ago

Is this now part of the 1.16.0 release? I'm also using my own sources and would like to switch back to official release but I keep getting the same issue. Thanks!

benoitc commented 3 years ago

this is part of the coming release yes. I will take care of it this morning . Have been sidetracked until now

On Thu 26 Nov 2020 at 20:33 Raphaël Sfeir notifications@github.com wrote:

Is this now part of the 1.16.0 release? I'm also using my own sources and would like to switch back to official release but I keep getting the same issue. Thanks!

— You are receiving this because you modified the open/close state.

Reply to this email directly, view it on GitHub https://github.com/benoitc/hackney/pull/661#issuecomment-734453838, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAADRIVEOVWZ6NYYEWRMXHTSR2UPHANCNFSM4SPUB3IQ .

-- Sent from my Mobile

benoitc / hackney

Fixed issue #643 and memory leak in hackney_pool #661