failed to flush the buffer - Mongo_replset does not reconnect to replica set

ceecko commented 7 years ago

We're using td-agent 2.3.1 and we are getting the following error messages from time to time. After the error message the output plugin out_mongo_replset does not reconnect to the replica set and stays in this stuck state until it's restarted. No new messages are saved in MongoDB.

td-agent[18558]: 2016-09-24 22:02:21 +0000 [warn]: Failed to connect to Replica Set. Try to retry: retry number = 1
td-agent[18558]: 2016-09-24 22:02:21 +0000 [warn]: retry succeeded. plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:09:52 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-09-24 22:09:33 +0000 error_class="NameError" error="uninitialized constant IRB::Abort" plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:09:52 +0000 [warn]: suppressed same stacktrace
td-agent[18558]: 2016-09-24 22:10:20 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-09-24 22:09:35 +0000 error_class="NameError" error="uninitialized constant IRB::Abort" plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:10:20 +0000 [warn]: suppressed same stacktrace
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-09-24 22:09:39 +0000 error_class="Mongo::OperationTimeout" error="Timed out waiting on socket read." plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:336:in `rescue in receive_message_on_socket'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:330:in `receive_message_on_socket'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:191:in `receive_header'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:182:in `receive'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/authentication.rb:446:in `auth_command'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/authentication.rb:409:in `issue_scram'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/authentication.rb:225:in `issue_authentication'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:321:in `block in check_auths'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/set.rb:263:in `each_key'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/set.rb:263:in `each'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:320:in `check_auths'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:285:in `block (2 levels) in checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:267:in `synchronize'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:267:in `block in checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:260:in `loop'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:260:in `checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:413:in `get_socket_from_pool'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:371:in `block in checkout_reader'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:354:in `checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:369:in `checkout_reader'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:623:in `checkout_socket_from_connection'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:550:in `block in send_initial_query'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:55:in `block in instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:20:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:54:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:547:in `send_initial_query'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:532:in `refresh'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:139:in `next'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/db.rb:607:in `command'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:353:in `block in batch_message_send'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:55:in `block in instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:20:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:54:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:352:in `batch_message_send'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:82:in `block in batch_write_incremental'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:57:in `catch'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:57:in `batch_write_incremental'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection.rb:1184:in `batch_write'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection.rb:411:in `insert'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo.rb:154:in `operate'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo_replset.rb:45:in `block in operate'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo_replset.rb:61:in `rescue_connection_failure'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo_replset.rb:44:in `operate'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo.rb:133:in `write'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:345:in `write_chunk'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:324:in `pop'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:329:in `try_flush'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:140:in `run'
systemd[1]: Stopping td-agent...

repeatedly commented 7 years ago

Please paste your configuration here.

I checked mongo driver and it seems to delete closed socket: https://github.com/mongodb/mongo-ruby-driver/blob/31c59b900dc8223ab97e4019d8fafb8f197754f1/lib/mongo/connection/pool.rb#L294 Could you give me the result of sigdump?

http://docs.fluentd.org/articles/trouble-shooting#dump-fluentd-internal-information

I want to know which function call causes this problem.

ceecko commented 7 years ago

I suspect the issue may be connected to MongoDB changing primary since during the election there's no primary and logs cannot be flushed.

I have already restarted td-agent, will the sigdump help or should I wait until the issue appears again?

Config

<source>
  type forward
  port 24224
  bind 127.0.0.1

  @label @docker
</source>

<label @docker>
  <filter docker.*>
    type record_transformer
    remove_keys app_id,container_name
  </filter>
  <match docker.*>
    type mongo_replset
    tag_mapped
    remove_tag_prefix docker.
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database logs
    capped
    capped_size 2097152
  </match>
  <match ssl.*>
    type mongo_replset
    tag_mapped
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database logs
    capped
    capped_size 524288
  </match>
  <match **>
    type mongo_replset
    tag_mapped
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database logs
    capped
    capped_size 2097152
  </match>
</label>

<label @accesslogs>
  <filter **>
    type record_transformer
    renew_record true
    <record>
      tag ${tag_parts[3]}
      httpStatusCode ${code}
      upstreamTime ${upstream_time}
    </record>
  </filter>
  <match **>
    type mongo_replset
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database access_logs
    collection http_status_queue
  </match>
</label>

repeatedly commented 7 years ago

will the sigdump help or should I wait until the issue appears again?

Yeah. If the problem happens again, sigdump result helps problem investigation.

ceecko commented 7 years ago

I did a couple sigdump's, since I wasn't sure if and where it was saved...

sigdump-1183.txt sigdump-1186.txt

ceecko commented 7 years ago

Is there any other info I can provide? We've experienced this issue pretty often recently.

repeatedly commented 7 years ago

Hmm... hard to debug because it seems the problem happens inside mongo driver, not plugin. Could you update fluent-plugin-mongo to v0.8.0? It uses mongo driver v2.x.

ceecko commented 7 years ago

Sure, I can give it a try. Any hints how to update td-agent to use the v0.8.0?

repeatedly commented 7 years ago

See: http://docs.fluentd.org/v0.12/articles/plugin-management

ceecko commented 7 years ago

I have just deployed the new plugin to one of our nodes. So far it seems that fluentd starts and works ok. However I didn't find a way to specify multiple hosts for a replica set. There's only one host config var available. How can I specify multiple hosts as with the old nodes config var?

repeatedly commented 7 years ago

It is the regression of updating mongo driver to v2.x. I discussed this problem on https://github.com/fluent/fluent-plugin-mongo/issues/88. Maybe, @megamk will send a patch.

repeatedly commented 7 years ago

I can't get a response from megamk, so I wrote a patch for this. v0.8.1 has nodes parameter for replica set. Or you can use connection_string with Mongo URI.

megamk commented 7 years ago

I can't get a response from megamk, so I wrote a patch for this.

I'm sorry, my ruby knowledge is not enough to dive into how your system tests work so I was planning to do this in my spare time before making a PR. But I couldn't find time for that yet, sorry.

Great that you've adopted my suggestion though, thanks!

fluent / fluent-plugin-mongo

failed to flush the buffer - Mongo_replset does not reconnect to replica set #82