benoitc / couchbeam

Apache CouchDB client in Erlang

couchbeam_view:fetch/3 occasionally hangs? #100

Closed mikebeam closed 9 years ago

mikebeam commented 10 years ago

Hi Benoit,

I've run into an issue with the recent versions of couchbeam (first experienced the issue in 0.9.3, and I'm continuing to see it in 1.0.3) where couchbeam_view:fetch/3 will occasionally hang.

Sometimes when it hangs the following crash report is generated:

2014-01-06 12:29:00.772 [error] <0.1769.0> CRASH REPORT Process <0.1769.0> with 0 neighbours crashed with reason: {timeout,[{couchbeam_view_stream,do_init_stream,2,[{file,"src/couchbeam_view_stream.erl"},{line,116}]},{couchbeam_view_stream,init_stream,5,[{file,"src/couchbeam_view_stream.erl"},{line,72}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

Here is some code that exhibits the problem:

t() ->
    {ok, DB} = couchbeam:open_db( couchbeam:server_connection( "http://localhost:5984" ), "testdb" ),
    lists:foreach(
        fun(I) ->
            io:format( "~w~n", [I]),
            couchbeam_view:fetch( DB, {"test", "test"}, [{key, "test_key"}])
        end,
        lists:seq( 0, 1000 )
    ).

Where http://localhost:5984/testdb/_design/test contains:

{
   "_id": "_design/test",
   "language": "javascript",
   "views": {
       "test": {
           "map": "function(doc) { emit( doc._id, 1 ); }"
       }
   }
}

I've tested against CouchDB versions 1.1.0, 1.2.0, and 1.5.0, all with similar results.

Any ideas?

Thanks, Mike Beam

benoitc commented 10 years ago

Hrm, this error happens if couchbeam didn't receive a response within 10 seconds when requesting the view. Are you seeing anything in the CouchDB logs?

mikebeam commented 10 years ago

No, nothing unusual when running at a debug log level. For example, the logging for the request that hangs in a recent test:

[Tue, 07 Jan 2014 03:35:24 GMT] [debug] [<0.212.10>] 'GET' /test_db/_design/test/_view/test?key=6915 {1,1} from "127.0.0.1"
Headers: [{'Host',"localhost:5984"},{'User-Agent',"hackney/0.10.1"}]
[Tue, 07 Jan 2014 03:35:24 GMT] [debug] [<0.212.10>] OAuth Params: [{"key","6915"}]
[Tue, 07 Jan 2014 03:35:24 GMT] [debug] [<0.212.10>] request_group {Pid, Seq} {<0.17796.5>,6}
[Tue, 07 Jan 2014 03:35:24 GMT] [info] [<0.212.10>] 127.0.0.1 - - 'GET' /test_db/_design/test/_view/test?key=6915 200

benoitc commented 10 years ago

Mmm, I will test this morning. One thing that comes to mind is that CouchDB is too slow to release a connection, so one of your requests times out while trying to check out a socket from the pool. Can you try to increase the pool size using the max_connections connection option? The default is 25.

mikebeam commented 10 years ago

max_connections was unset in my CouchDB configuration, so I could not verify the default. Setting it to 2048 did not appear to have an impact on the problem. I will note that fetching the view in the loop was just my way of making the problem repeatable. My application invokes the view fetch in response to certain events that occur at irregular periods, so it may take half a day or several days for this problem to show itself in production.

mikebeam commented 10 years ago

Also, the crash report is not always generated; usually the test case just hangs without a timeout.
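
For anyone who needs a stopgap while a hang like this is investigated, here is a minimal sketch of one way to keep the caller from blocking forever: run the call in a monitored worker process and give up after a deadline. The module name fetch_guard and the function call_with_timeout/2 are hypothetical names made up for this illustration; they are not part of couchbeam or hackney, and this does not address the underlying cause.

```erlang
-module(fetch_guard).
-export([call_with_timeout/2]).

%% Run Fun (a zero-arity fun) in a separate monitored process and wait at
%% most TimeoutMs milliseconds for its result, so that a call which never
%% returns cannot block the calling process.
call_with_timeout(Fun, TimeoutMs) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% The worker answered in time; drop the monitor and any
            %% pending 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker crashed before sending a result.
            {error, {worker_down, Reason}}
    after TimeoutMs ->
        %% No answer within the deadline: stop waiting and kill the worker.
        %% A result sent just before the kill may still land in the
        %% caller's mailbox; this sketch does not clean that up.
        erlang:demonitor(MRef, [flush]),
        exit(Pid, kill),
        {error, timeout}
    end.
```

With the test case above, the call inside the loop could become fetch_guard:call_with_timeout(fun() -> couchbeam_view:fetch(DB, {"test", "test"}, [{key, "test_key"}]) end, 15000). Whether killing the worker also releases the underlying pooled connection is an assumption that would need checking.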

benoitc commented 10 years ago

I meant max_connections in couchbeam. I will check with the test you provided.

mikebeam commented 10 years ago

Ah, ok, I will try that.

mikebeam commented 10 years ago

I added hackney_pool:set_max_connections( default, 1000 ) to the top of the test case with no improvement.

behrad commented 10 years ago

@benoitc I have what looks like a similar problem, so I didn't open another issue. Under heavy traffic my view requests get {error,timeout} and the couchbeam gen_server process complains about an error. These timeouts cause my program to hang after a few minutes. In my case I'm sending 600 or more concurrent connections (HEADs + updates + view calls) to CouchDB every 5 seconds. I first tried Erlang's built-in httpc, which frequently raised eaddrnotavail errors at random, and that led me to Couchbeam. Thanks to Couchbeam's and hackney's better socket handling, my HEADs and PUTs now work perfectly, but my view requests get {error,timeout} (CouchDB's log is always clean in every scenario!). The couchbeam gen_server's error was something like connect_failed, eaddrnotavail, as far as I remember. (I'll have access to the server logs in a few days.)

Then I replaced couchbeam_view:fetch with my old httpc:request calls, and this hybrid now works, although eaddrnotavail errors still happen at a much lower rate. I also tried changing the max_connections pool size to 2000, but with no success. Any ideas?

behrad commented 10 years ago

https://github.com/benoitc/couchbeam/issues/102

benoitc commented 10 years ago

What is your fd limit (ulimit -n)? I couldn't reproduce it here.

behrad commented 10 years ago

Relatively large for my example (100000)!

mikebeam commented 10 years ago

16384 in my case.

benoitc commented 10 years ago

How many docs do you have in the db? Maybe you could put the .couch file somewhere so I can try to reproduce it. Also, which version of couch?

mikebeam commented 10 years ago

Just the design document in this test case. My recent tests have been against CouchDB v1.1.0, but I've observed it against 1.2.0 and 1.5.0.

FYI, it seems to be hanging in hackney_stream:maybe_continue/4, the async=once variant. Printing out the Client parameter shows the following just prior to the hang and hibernation in the after statement:

{client,hackney_tcp_transport,"localhost",5984,netloc,
        [{async,once}],
        #Port<0.19365>,
        {default,{"localhost",5984,hackney_tcp_transport},
                 <0.311.0>,hackney_tcp_transport},
        #Ref<0.0.0.167446>,true,hackney_pool,infinity,false,5,false,nil,
        undefined,
        {hparser,response,4096,10,0,on_header,
                 <<"Server: CouchDB/1.1.0 (Erlang OTP/R15B)\r\nDate: Wed, 22 Jan 2014 21:37:20 GMT\r\nContent-Type: text/plain;charset=utf-8\r\nContent-Length: 38\r\nCache-Control: must-revalidate\r\n\r\n{\"total_rows\":2,\"offset\":0,\"rows\":[]}\n">>,
                 {1,1},
                 undefined,[],undefined,undefined,undefined,undefined,
                 undefined,waiting},
        connected,on_header,nil,normal,false,once,false,
        #Fun<hackney_request.send.2>,waiting,nil,4096,<<>>,[],
        {1,1},
        nil,nil,nil,<<"GET">>,nil}

In three tests the hang occurs when the hparser tuple contains on_header and the header string.

mikebeam commented 10 years ago

Sorry, the database has 2 other documents in addition to the design document.

benoitc commented 10 years ago

I will have a closer look in the morning. Thanks for the info :)

mikebeam commented 10 years ago

Also tried removing [{key, "test_key"}] from couchbeam_view:fetch. Similar result:

{client,hackney_tcp_transport,"localhost",5984,netloc,
        [{async,once}],
        #Port<0.18930>,
        {default,{"localhost",5984,hackney_tcp_transport},
                 <0.307.0>,hackney_tcp_transport},
        #Ref<0.0.0.72132>,true,hackney_pool,infinity,false,5,false,nil,
        undefined,
        {hparser,response,4096,10,0,on_header,
                 <<"Transfer-Encoding: chunked\r\nServer: CouchDB/1.1.0 (Erlang OTP/R15B)\r\nEtag: \"65CVTWB4VJQ7P2QP7GXMGUAF\"\r\nDate: Wed, 22 Jan 2014 21:53:37 GMT\r\nContent-Type: text/plain;charset=utf-8\r\nCache-Control: must-revalidate\r\n\r\n81\r\n{\"total_rows\":1,\"offset\":0,\"rows\":[\r\n{\"id\":\"f2178146df04fbd0bc638b3189001012\",\"key\":\"f2178146df04fbd0bc638b3189001012\",\"value\":1}\r\n4\r\n\r\n]}\r\n1\r\n\n\r\n0\r\n\r\n">>,
                 {1,1},
                 undefined,[],undefined,undefined,undefined,undefined,
                 undefined,waiting},
        connected,on_header,nil,normal,false,once,false,
        #Fun<hackney_request.send.2>,waiting,nil,4096,<<>>,[],
        {1,1},
        nil,nil,nil,<<"GET">>,nil}

mikebeam commented 10 years ago

You're welcome, thanks for your help!

benoitc commented 9 years ago

Fixed in 4da9d58ed46052944174bbd2763ee431a5b739dc.

behrad commented 9 years ago

nice to hear this @benoitc