fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0
12.91k stars 1.34k forks source link

comparison of NilClass with 1473379200 failed #1219

Closed dimaqq closed 8 years ago

dimaqq commented 8 years ago

My stack:

My config:

II get a bunch of errors like this:

fluentd_1  | 2016-09-09 07:37:34 +0000 [warn]: emit transaction failed: error_class=ArgumentError error="comparison of NilClass with 1473379200 failed" tag="apache.error"
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/out_file.rb:62:in `sort_by'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/out_file.rb:62:in `generate_chunk'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/buffer.rb:447:in `block in write_once'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/buffer.rb:447:in `write_once'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/buffer.rb:197:in `block in write'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/buffer.rb:196:in `each'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/buffer.rb:196:in `write'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/output.rb:665:in `block in handle_stream_with_custom_format'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/output.rb:610:in `write_guard'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/output.rb:664:in `handle_stream_with_custom_format'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/output.rb:602:in `execute_chunking'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/output.rb:544:in `emit_buffered'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/out_copy.rb:38:in `block in process'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/out_copy.rb:37:in `each'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/out_copy.rb:37:in `process'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/multi_output.rb:89:in `emit_sync'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/event_router.rb:153:in `emit_events'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/event_router.rb:90:in `emit_stream'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:310:in `receive_lines'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:420:in `wrap_receive_lines'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:579:in `on_notify'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:444:in `on_notify'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:425:in `attach'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:210:in `setup_watcher'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:231:in `block in start_watchers'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:218:in `each'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:218:in `start_watchers'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:204:in `refresh_watchers'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/plugin/in_tail.rb:156:in `start'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:138:in `block in start'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:127:in `block (2 levels) in lifecycle'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:126:in `each'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:126:in `block in lifecycle'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:113:in `each'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:113:in `lifecycle'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/root_agent.rb:137:in `start'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/engine.rb:211:in `start'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/engine.rb:175:in `run'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/supervisor.rb:677:in `run_engine'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/supervisor.rb:436:in `block in run_worker'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/supervisor.rb:606:in `main_process'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/supervisor.rb:431:in `run_worker'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/lib/fluent/command/fluentd.rb:271:in `<top (required)>'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/lib/ruby/gems/2.3.0/gems/fluentd-0.14.6/bin/fluentd:5:in `<top (required)>'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/bin/fluentd:23:in `load'
fluentd_1  |   2016-09-09 07:37:34 +0000 [warn]: /usr/bin/fluentd:23:in `<main>'

Test log record:

fluentd_1  | 2016-09-09 07:51:17 +0000 apache.access: {"test":1473407477293,"request_id":"V9Jp9awSAAYAAAAJftUAAACA","client":"10.0.2.89","host":"bmg","path":"/","query":"","method":"GET","status":"200","userAgent":"curl/7.50.1","duration":2,"request_size":72,"response_size":186}

I'm not sure if this is fluentd internal error or the output plugin.

repeatedly commented 8 years ago

Could you paste your configuation?

dimaqq commented 8 years ago

Sure thing!

<source>
    # FIXME doesn't work yet
    @type tail
    path /data/nginx/error.log
    pos_file /data/nginx-error-log.pos
    tag nginx.error
    format nginx # Do you have a custom format? You can write your own regex.
</source>

<source>
    # FIXME doesn't work yet
    @type tail
    path /data/nginx/access.log
    pos_file /data/nginx-access-log.pos
    tag nginx.access
    format json
</source>

<source>
    # FIXME doesn't work yet
    @type tail
    path /data/apache/error.log
    pos_file /data/apache-error-log.pos
    tag apache.error

    # per http://docs.fluentd.org/articles/common-log-formats
    # +tid +errorcode
    format /^\[[^ ]* (?<time>[^\]]*)\] \[((?<module>[^:\]]*):)?(?<level>[^\]]*)\] \[pid (?<pid>[^:\]]*)(:tid (?<tid>[^:\]]*))\] (\[client (?<client>[^\]]*)\] )?(?<errorcode>[A-Za-z0-9]*)?: (?<message>.*)$/
</source>

<source>
    @type tail
    path /data/apache/access.log
    pos_file /data/apache-access-log.pos
    tag apache.access

    # https://github.com/fluent/fluentd/blob/d8a40cdfc97c3f0197feff8ddfbb1a9df51bfd1b/lib/fluent/plugin/parser_apache2.rb
    # +requestid +ms
    # format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] (?<requestid>[^ ]+) (?<ms>[^ ]+) "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
    format json
    # FIXME millisoconds are tricky, but are available now; alternatively we could leave that parsing to cloud
</source>

<source>
    @type  forward
    @id    docker-host
    port  24224
</source>

<source>
    @type http
    @id   turbo
    port 8080
    bind 0.0.0.0
    body_size_limit 32m
    keepalive_timeout 10s
</source>

<filter **>
    @type stdout
</filter>

<match **>
    @type copy
    <store>
        @type file
        @id   turbolog
        path         /data/data.*.log
        symlink_path /data/data.log
        append       true
        time_slice_format %Y%m%d
        time_slice_wait   10m
        time_format       %Y%m%dT%H%M%S%z
    </store>
    <store>
        @type loggly_buffered
        loggly_url https://logs-01.loggly.com/bulk/xxx-xxx-xxx-xxx-xxx/tag/fluentd
        time_precision_digits 3
        buffer_type file
        buffer_path /data/loggly-buffer
        flush_interval 5s
    </store>
</match>

Data is picked up by apache.access.

Apache is configured to dump logs in JSON format, that is one JSON document per line.

dimaqq commented 8 years ago

Sample access.log content:

 $▶sudo tail /var/lib/docker/volumes/turbo_apache_log/_data/access.log                                                                                   Fri 09 Sep 2016 10:27:14 CEST
{ "time":1473338778.810, "test":1473338778810, "request_id":"V9FdmqwSAAUAAAAJjOwAAABD", "client":"10.0.2.89", "host":"bmg", "path":"/", "query":"", "method":"GET", "status":"200", "userAgent":"ApacheBench/2.3", "duration":1, "request_size":76, "response_size":205 }
{ "time":1473338778.810, "test":1473338778810, "request_id":"V9FdmqwSAAUAAAAdcNcAAADB", "client":"10.0.2.89", "host":"bmg", "path":"/", "query":"", "method":"GET", "status":"200", "userAgent":"ApacheBench/2.3", "duration":1, "request_size":76, "response_size":205 }

Note that time is floating point, using LogFormat "{ \"time\":%{sec}t.%{msec_frac}t, ... in apache config.

repeatedly commented 8 years ago

fluentd_1 | 2016-09-09 07:37:34 +0000 [warn]: emit transaction failed: error_class=ArgumentError error="comparison of NilClass with 1473379200 failed" tag="apache.error"

From your log, error happened with apache.error How to determine apache.access is the problem? > "Data is picked up by apache.access"

dimaqq commented 8 years ago

Something is weird...

I've deleted all containers and volumes and now this specific error is gone.

I suspect that old version of fluentd (0.12.29) left data in the volume and new version (0.14.6) could not understand it.

This data could be loggly_buffered buffer at /data/loggly-buffer, or position files at /data/*-*.pos or fluentd's own data at /data/data.*.log or metadata at /data/data.*.log.meta.

So it looks like an upgrade problem.

Does fluentd support upgrade from 0.12 to 0.14 without wiping data on disk?

repeatedly commented 8 years ago

@tagomoris Do we have a plan to add a feature which converts existing older buffers into new buffers?

tagomoris commented 8 years ago

v0.14 file buffer originally can handle existing buffer chunks generated by older Fluentd (without wiping). But this issue shows that the hack implementing sym_link on file output plugins has a bug not to consider about such resumed buffer chunks. It's bug, and should be fixed.