ceecko opened this issue 8 years ago
Please paste your configuration here.
I checked the mongo driver and it seems to delete the closed socket: https://github.com/mongodb/mongo-ruby-driver/blob/31c59b900dc8223ab97e4019d8fafb8f197754f1/lib/mongo/connection/pool.rb#L294 Could you give me the result of sigdump?
http://docs.fluentd.org/articles/trouble-shooting#dump-fluentd-internal-information
I want to know which function call causes this problem.
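For reference, the dump described in that article is produced by the sigdump gem that fluentd bundles: sending SIGCONT to the running td-agent worker makes it write thread backtraces to /tmp/sigdump-<pid>.log by default. A minimal Ruby sketch of that step, assuming the worker pid is passed on the command line:

# trigger_sigdump.rb (hypothetical helper, not part of td-agent)
# Send SIGCONT to the td-agent worker process; the bundled sigdump gem
# handles this signal and writes its dump to /tmp/sigdump-<pid>.log by default.
pid = Integer(ARGV.fetch(0))   # worker pid, e.g. taken from ps output
Process.kill(:CONT, pid)
puts "check /tmp/sigdump-#{pid}.log"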
I suspect the issue may be connected to MongoDB changing its primary, since during the election there is no primary and logs cannot be flushed.
I have already restarted td-agent; will the sigdump help, or should I wait until the issue appears again?
Config
<source>
  type forward
  port 24224
  bind 127.0.0.1
  @label @docker
</source>

<label @docker>
  <filter docker.*>
    type record_transformer
    remove_keys app_id,container_name
  </filter>
  <match docker.*>
    type mongo_replset
    tag_mapped
    remove_tag_prefix docker.
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database logs
    capped
    capped_size 2097152
  </match>
  <match ssl.*>
    type mongo_replset
    tag_mapped
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database logs
    capped
    capped_size 524288
  </match>
  <match **>
    type mongo_replset
    tag_mapped
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database logs
    capped
    capped_size 2097152
  </match>
</label>

<label @accesslogs>
  <filter **>
    type record_transformer
    renew_record true
    <record>
      tag ${tag_parts[3]}
      httpStatusCode ${code}
      upstreamTime ${upstream_time}
    </record>
  </filter>
  <match **>
    type mongo_replset
    flush_interval 10s
    nodes ip1:27017,ip2:27017
    name rs0
    user user1
    password xxx
    database access_logs
    collection http_status_queue
  </match>
</label>
will the sigdump help or should I wait until the issue appears again?
Yeah. If the problem happens again, the sigdump result will help with the investigation.
I did a couple of sigdumps, since I wasn't sure whether and where the output was saved...
Is there any other info I can provide? We've experienced this issue pretty often recently.
Hmm... hard to debug because it seems the problem happens inside the mongo driver, not the plugin. Could you update fluent-plugin-mongo to v0.8.0? It uses mongo driver v2.x.
Sure, I can give it a try. Any hints on how to update td-agent to use v0.8.0?
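One way this is usually handled (a sketch, not necessarily what was done here): td-agent bundles its own Ruby, so the plugin has to be upgraded inside that installation rather than in the system Ruby; td-agent 2.x ships a td-agent-gem command for this. The same step expressed through the RubyGems API, run with td-agent's embedded Ruby, would look roughly like:

# upgrade_plugin.rb (sketch; run with td-agent's embedded Ruby, then restart td-agent)
require "rubygems"

# Gem.install resolves and installs the gem plus its dependencies,
# including the mongo 2.x driver that fluent-plugin-mongo 0.8.0 depends on.
specs = Gem.install("fluent-plugin-mongo", "0.8.0")
specs.each { |s| puts "installed #{s.name} #{s.version}" }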
I have just deployed the new plugin to one of our nodes. So far it seems that fluentd starts and works ok.
However, I didn't find a way to specify multiple hosts for a replica set; there's only one host config var available. How can I specify multiple hosts as with the old nodes config var?
It is a regression from updating the mongo driver to v2.x. I discussed this problem in https://github.com/fluent/fluent-plugin-mongo/issues/88. Maybe @megamk will send a patch.
I can't get a response from megamk, so I wrote a patch for this. v0.8.1 has a nodes parameter for replica sets. Or you can use connection_string with a Mongo URI.
I can't get a response from megamk, so I wrote a patch for this.
I'm sorry, my Ruby knowledge is not enough to dive into how your system tests work, so I was planning to do this in my spare time before making a PR. But I couldn't find time for that yet, sorry.
Great that you've adopted my suggestion though, thanks!
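For anyone landing here later, a minimal sketch of the two options mentioned above, reusing the placeholder hosts and credentials from the configuration earlier in the thread. Only nodes and connection_string are the parameters named in this thread; the remaining settings are carried over from the old config and their names may have changed in v0.8.1:

<match docker.*>
  type mongo_replset
  database logs
  user user1
  password xxx
  name rs0
  # option 1: list the replica set members explicitly
  nodes ip1:27017,ip2:27017
  # option 2: put everything into a standard Mongo URI instead of the
  # individual host/user/password settings above
  # connection_string mongodb://user1:xxx@ip1:27017,ip2:27017/logs?replicaSet=rs0
</match>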
We're using td-agent 2.3.1 and we are getting the following error messages from time to time. After the error message the out_mongo_replset output plugin does not reconnect to the replica set and stays in this stuck state until it's restarted. No new messages are saved in MongoDB.