fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Fluentd in_http behind nginx, error 104 #3096

Open eshutov opened 4 years ago

eshutov commented 4 years ago

Describe the bug I was going to use a configuration with fluentd and the in_http plugin behind nginx (where nginx is just a proxy providing security), but I ran into this issue in the nginx logs while testing:

2020/08/04 13:27:05 [error] 436#436: *214 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.122.1, server: localhost, request: "POST /test HTTP/1.1", upstream: "http://172.18.0.2:9880/test", host: "192.168.122.73"

It appears in around 15% of transmissions (about 15% of the nginx log entries are the error shown above). Despite this, all events still pass through fluentd: they successfully leave it through the output plugin (tested on tens of events). Below is the simplest possible configuration where the issue appears.

To Reproduce Hit fluentd (behind nginx) with curl several times:

curl -X POST -d '{"foo":"bar"}' -H 'Content-Type: application/json' http://$nginx_ip/test
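
To quantify the roughly 15% failure rate, the request can be repeated in a loop and the resets counted afterwards. A minimal sketch, assuming the nginx container is named "nginx" and logs errors to stderr as the official image does:

# send 100 requests through nginx to fluentd
for i in $(seq 1 100); do
  curl -s -o /dev/null -X POST -d '{"foo":"bar"}' \
    -H 'Content-Type: application/json' "http://$nginx_ip/test"
done

# count how many of them produced a "Connection reset by peer" entry
docker logs nginx 2>&1 | grep -c 'Connection reset by peer'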

Expected behavior Error entries should not appear in the nginx logs.

Your Environment

Your Configuration Fluentd official container, fluentd.conf:

<source>
  @type http
</source>

<match **>
  @type null
</match>

Nginx official container: just edit "location /" in default.conf and send nginx a "kill -HUP" to reload:

    location / {
        proxy_pass http://fluentd:9880;
    }
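
For reference, a minimal way to wire the two official containers together so the snippets above line up might look like this (a sketch; the network and container names are assumptions, and the image tag should be pinned to the version under test):

docker network create logtest
docker run -d --name fluentd --network logtest \
  -v "$PWD/fluentd.conf:/fluentd/etc/fluent.conf" \
  fluent/fluentd:edge               # pick the tag matching the fluentd version under test
docker run -d --name nginx --network logtest -p 80:80 \
  -v "$PWD/default.conf:/etc/nginx/conf.d/default.conf" \
  nginx

With both containers on the same network, proxy_pass http://fluentd:9880 resolves to the fluentd container by name.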

Your Error Log

192.168.122.1 - - [04/Aug/2020:13:27:04 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
192.168.122.1 - - [04/Aug/2020:13:27:05 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
2020/08/04 13:27:05 [error] 436#436: *214 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.122.1, server: localhost, request: "POST /test HTTP/1.1", upstream: "http://172.18.0.2:9880/test", host: "192.168.122.73"
192.168.122.1 - - [04/Aug/2020:13:27:06 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
192.168.122.1 - - [04/Aug/2020:13:27:07 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
192.168.122.1 - - [04/Aug/2020:13:27:08 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
192.168.122.1 - - [04/Aug/2020:13:27:08 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
192.168.122.1 - - [04/Aug/2020:13:27:09 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
2020/08/04 13:27:10 [error] 436#436: *226 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.122.1, server: localhost, request: "POST /test HTTP/1.1", upstream: "http://172.18.0.2:9880/test", host: "192.168.122.73"
192.168.122.1 - - [04/Aug/2020:13:27:10 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"
192.168.122.1 - - [04/Aug/2020:13:27:11 +0000] "POST /test HTTP/1.1" 200 0 "-" "curl/7.64.0" "-"

Additional context It might somehow be related to #2963.

kenhys commented 3 years ago

FYI:

It seems to be reproducible even with newer versions:

Ubuntu 18.04 (https://app.vagrantup.com/generic/boxes/ubuntu1804)
nginx 1.19.3 (compiled from source with the default options, because 1.19.3 is not packaged yet: ./configure && make && sudo make install)
ruby 2.7.2 (installed via rvm install 2.7.2)
fluentd 1.11.4 (installed via gem install fluentd)
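
Spelled out as commands, that environment is roughly the following (a sketch of the steps described above; the nginx source URL is the standard download location, assumed here):

# inside the generic/ubuntu1804 Vagrant box
wget https://nginx.org/download/nginx-1.19.3.tar.gz
tar xzf nginx-1.19.3.tar.gz
cd nginx-1.19.3
./configure && make && sudo make install   # default build; 1.19.3 was not packaged yet

rvm install 2.7.2                          # ruby 2.7.2
gem install fluentd -v 1.11.4              # fluentd 1.11.4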

kenhys commented 3 years ago

FYI

This error log may come from https://github.com/nginx/nginx/blob/release-1.19.3/src/os/unix/ngx_readv_chain.c#L235

        err = ngx_socket_errno;

        if (err == NGX_EAGAIN || err == NGX_EINTR) {
            ngx_log_debug0(NGX_LOG_DEBUG_EVENT, c->log, err,
                           "readv() not ready");
            n = NGX_AGAIN;

        } else {
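            /* any errno other than EAGAIN/EINTR ends up here; ECONNRESET (errno 104)
               is what produces the "readv() failed (104: Connection reset by peer)" entry */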
            n = ngx_connection_error(c, err, "readv() failed");
            break;
        }

simplyzee commented 1 year ago

Did anyone ever get to the bottom of this? I'm seeing similar errors on EKS running the NGINX Ingress Controller, exposing fluentd via TCP.

There was a mention of using v1.11.1 (https://github.com/fluent/fluentd/issues/2963#issuecomment-664900958), but I'm running the latest released version.

From Fluentd itself:

2023-04-21 16:37:07 +0000 [warn]: #1 unexpected error error="Could not parse data entirely (0 != 338)"
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/in_http.rb:407:in `<<'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/in_http.rb:407:in `on_read'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/in_http.rb:296:in `block in on_server_connect'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin_helper/server.rb:640:in `on_read_without_connection'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:123:in `on_readable'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:186:in `on_readable'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2023-04-21 16:37:07 +0000 [warn]: #1 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

NGINX Ingress Controller:

  2023/04/21 16:37:07 [error] 8007#8007: *1522969 recv() failed (104: Connection reset by peer) while proxying and reading from upstream, client: 10.88.62.159, server: 0.0.0.0:8080, upstream: "10.88.37.240:8080", bytes from/to client:338/0, bytes from/to upstream:0/338
[10.88.62.159] [21/Apr/2023:16:37:07 +0000] TCP 200 0 338 0.001

simplyzee commented 1 year ago

I've resolved this issue. In case it helps anyone: it was because of SSL with the TCP exposure on the NGINX Ingress Controller. It was failing on HTTPS calls, so it's likely that I had not configured the TCP exposure to support HTTPS 👍🏽

HTTP calls work just fine 😄
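
To illustrate the distinction (a sketch with a hypothetical $ingress_host, assuming fluentd's 9880 is exposed as a plain TCP service on port 8080 of the ingress, as in the logs above):

# plain HTTP reaches in_http directly and is accepted
curl -X POST -d '{"foo":"bar"}' -H 'Content-Type: application/json' "http://$ingress_host:8080/test"

# HTTPS to the same port would fail: nothing terminates TLS on that path, so in_http receives
# raw TLS handshake bytes it cannot parse ("Could not parse data entirely") and drops the connection
curl -k -X POST -d '{"foo":"bar"}' -H 'Content-Type: application/json' "https://$ingress_host:8080/test"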