LeonD9 opened this issue 2 years ago
Please try the following and report back your results...
- get rid of the 'type_name' entry
- either:
  - get rid of 'include_tag_key true'
  - add a 'tag_key' entry
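For illustration, a minimal sketch of what the opensearch output section could look like after those changes (host, port, and index values below are placeholders, not taken from this issue):

<match **>
  @type opensearch
  host example-opensearch-host
  port 443
  scheme https
  index_name fluentd
  # 'type_name ...' entry removed entirely
  # 'include_tag_key true' removed; or, alternatively, add a 'tag_key' entry such as:
  # tag_key fluentd_tag
</match>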
@ryn9 removed type_name and include_tag_key.
It didn't help; I still get the following error:
2022-03-28 14:28:44 +0000 [warn]: #0 [opensearch] failed to flush the buffer. retry_times=5 next_retry_time=2022-03-28 14:29:00 +0000 chunk="5db481db7caf87bc9550c195959bd79b" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"****\", :port=>443, :scheme=>\"https\"}): undefined method `call' for nil:NilClass"
It seems to be an issue with the latest fluent-plugin-opensearch version; reverting to 1.0.2 fixed it for me.
@cosmo0920 - perhaps a bug in 1.0.3 ?
@cosmo0920 - I hit the same bug when using 1.0.3
Sure, I'll take a look.
Could you paste more context around the error? Ruby is a scripting language, so we need to know where the error occurred and what caused it.
@cosmo0920
Here is a 1.0.3 error
2022-03-30 20:33:16 +0000 [debug]: #0 [out_opensearch] Created new chunk chunk_id="5db75748cfce7588a2607967c0bca145" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-03-30 20:33:20 +0000 [debug]: #0 [out_opensearch] taking back chunk for errors. chunk="5db75748cfce7588a2607967c0bca145"
2022-03-30 20:33:20 +0000 [warn]: #0 [out_opensearch] failed to flush the buffer. retry_times=0 next_retry_time=2022-03-30 20:33:50 +0000 chunk="5db75748cfce7588a2607967c0bca145" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"REDACTED\", :port=>443, :scheme=>\"https\", :user=>\"REDACTED\", :password=>\"obfuscated\"}): undefined method `call' for nil:NilClass"
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1119:in `rescue in send_bulk'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1081:in `send_bulk'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:892:in `block in write'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `each'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `write'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1179:in `try_flush'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1495:in `flush_thread_run'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2022-03-30 20:33:20 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-03-30 20:33:50 +0000 [debug]: #0 [out_opensearch] taking back chunk for errors. chunk="5db75748cfce7588a2607967c0bca145"
2022-03-30 20:33:50 +0000 [warn]: #0 [out_opensearch] failed to flush the buffer. retry_times=1 next_retry_time=2022-03-30 20:34:24 +0000 chunk="5db75748cfce7588a2607967c0bca145" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"REDACTED\", :port=>443, :scheme=>\"https\", :user=>\"REDACTED\", :password=>\"obfuscated\"}): undefined method `call' for nil:NilClass"
2022-03-30 20:33:50 +0000 [warn]: #0 suppressed same stacktrace
Compared with the same config and the same test messages sent through 1.0.2:
2022-03-30 20:39:57 +0000 [debug]: #0 [out_opensearch] Created new chunk chunk_id="5db758c75713c0982038aed5cf73d4cd" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
And then it flushed successfully without any logged messages
@cosmo0920 @ryn9 did you also see this warning message with the 1.0.2 gem:
2022-04-04 07:15:53 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `+' for nil:NilClass" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'" tag="kubernetes.var.log.containers.kube-metrics-adapter-6f75fbbbc4-s5smw_kube-system_kube-metrics-adapter-ee0fb2300dec03ae655ae7ad6e094e98dcebb26b61ac9e0213647a9cb015f763.log" 2022-04-04 07:15:53 +0000 [warn]: #0 suppressed same stacktrace
any idea what can cause this error?
@LeonD9 Could you paste more context around the error? Ruby's backtrace is also important for digging into the cause.
@cosmo0920 - I ran this again with tracing, and it looks like this occurs in the bulk error condition. If I send a payload with no errors, it appears to work okay. When I send a payload with an error, it fails like above.
Here is a slightly modified trace:
2022-04-05 15:49:21 +0000 [trace]: #0 [out_opensearch] writing events into buffer instance=3000 metadata_size=1
2022-04-05 15:49:21 +0000 [debug]: #0 [out_opensearch] Created new chunk chunk_id="5dbea303f4f77964612f171a1b631b04" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:21 +0000 [trace]: #0 [out_opensearch] chunk /fluentd/root_dir/worker0/out_opensearch/buffer/buffer.b5dbea303f4f77964612f171a1b631b04.log size_added: 206 new_size: 206
2022-04-05 15:49:22 +0000 [trace]: #2 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #1 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] enqueueing chunk instance=3000 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:22 +0000 [trace]: #3 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] dequeueing a chunk instance=3000
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] chunk dequeued instance=3000 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] trying flush for a chunk chunk="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] adding write count instance=3440
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] executing sync write chunk="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] bulk request: {"index":{"_index":"demo-field1-field2-2022.13","_type":"_doc"}}
{"foo.bar":"test","message":"test","@timestamp":"2022-03-31T13:52:51.315315315+00:00"}
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] bulk response: {"took"=>7, "errors"=>true, "items"=>[{"index"=>{"_index"=>"demo-field1-field2-2022.13", "_type"=>"_doc", "_id"=>"0Xxq-n8Br-TZ0qY9lARI", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Could not dynamically add mapping for field [foo.bar]. Existing mapping for [foo] must be of type object but found [text]."}}}]}
2022-04-05 15:49:22 +0000 [debug]: #0 [out_opensearch] taking back chunk for errors. chunk="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] taking back a chunk instance=3000 chunk_id="5dbea303f4f77964612f171a1b631b04"
2022-04-05 15:49:22 +0000 [trace]: #0 [out_opensearch] chunk taken back instance=3000 chunk_id="5dbea303f4f77964612f171a1b631b04" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="test", variables=nil, seq=0>
2022-04-05 15:49:22 +0000 [warn]: #0 [out_opensearch] failed to flush the buffer. retry_times=0 next_retry_time=2022-04-05 15:49:27 +0000 chunk="5dbea303f4f77964612f171a1b631b04" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"REDACTED\", :port=>443, :scheme=>\"https\", :user=>\"REDACTED\", :password=>\"obfuscated\"}): undefined method `call' for nil:NilClass"
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1119:in `rescue in send_bulk'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1081:in `send_bulk'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:892:in `block in write'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `each'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `write'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1179:in `try_flush'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:1495:in `flush_thread_run'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2022-04-05 15:49:22 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-04-05 15:49:23 +0000 [trace]: #2 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:23 +0000 [trace]: #1 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:23 +0000 [trace]: #0 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:23 +0000 [trace]: #3 [out_opensearch] enqueueing all chunks in buffer instance=3000
2022-04-05 15:49:24 +0000 [trace]: #2 [out_opensearch] enqueueing all chunks in buffer instance=3000
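For context, the 400 in that bulk response is a straightforward mapping conflict: the test index already maps foo as text, so a later document whose dotted key foo.bar expands to an object under foo is rejected. A made-up illustration of two records that collide this way (the values are not taken from the trace above):

{"foo":"some text value","message":"first record maps foo as text"}
{"foo.bar":"test","message":"second record needs foo to be an object"}

Per the rest of this thread, 1.0.2 surfaces that per-item 400 reason, while 1.0.3 dies with the NilClass error in the same bulk error-handling path.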
@cosmo0920 It's the same as this issue: https://github.com/fluent/fluentd/issues/1973. There are no more logs except this in debug/trace mode:
2022-04-16 19:25:26 +0000 [trace]: #0 [main-logs] writing events into buffer instance=458700 metadata_size=1
2022-04-16 19:25:26 +0000 [debug]: #0 [main-logs] Created new chunk chunk_id="5dcd50bbcfeaf9b7d6fc14f983017cbb" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.log.containers.test.log", variables=nil, seq=0>
2022-04-16 19:25:26 +0000 [trace]: #0 [main-logs] chunk /var/log/fluentd/worker0/main-logs/buffer/buffer.b5dcd50bbcfeaf9b7d6fc14f983017cbb.log size_added: 3499 new_size: 3499
2022-04-16 19:25:26 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `+' for nil:NilClass" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'" tag="kubernetes.var.log.containers.test.log"
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1341:in `submit_flush_once'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:853:in `emit_buffered'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:830:in `emit_events'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/event_router.rb:115:in `emit_stream'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/out_relabel.rb:29:in `process'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:839:in `emit_sync'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/event_router.rb:115:in `emit_stream'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:338:in `on_message'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:226:in `block in handle_connection'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:263:in `block (3 levels) in read_messages'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:262:in `feed_each'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:262:in `block (2 levels) in read_messages'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/in_forward.rb:271:in `block in read_messages'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/server.rb:613:in `on_read_without_connection'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:123:in `on_readable'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/io.rb:186:in `on_readable'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2022-04-16 19:25:26 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-04-16 19:25:26 +0000 [error]: #0 unexpected error on reading data host="10.1.7.177" port=42756 error_class=NoMethodError error="undefined method `get' for nil:NilClass"
@cosmo0920 anything more we can provide in order to get this resolved?
Ordinary records will not cause this issue. Could you provide a sample record that triggers this error? Providing reproducible steps would be even better, but a sample invalid record is needed to resolve this issue.
I've encountered this issue too. Interestingly, I never observed the problem before adding filters to retag records with JSON in the message field and re-parse them. fluentd.conf now looks like:
<label @FLUENT_LOG>
<match fluent.**>
@type null
@id ignore_fluent_logs
</match>
</label>
<source>
@type tail
@id in_tail_container_logs
path "/var/log/containers/*.log"
pos_file "/var/log/fluentd-containers.log.pos"
tag "kubernetes.*"
exclude_path []
read_from_head true
<parse>
@type "cri"
time_format "%Y-%m-%dT%H:%M:%S.%N%z"
</parse>
</source>
<filter kubernetes.**>
@type kubernetes_metadata
@id filter_kube_metadata
kubernetes_url "https://172.30.0.1:443/api"
verify_ssl true
ca_file ""
skip_labels false
skip_container_metadata false
skip_master_url false
skip_namespace_metadata false
watch true
</filter>
<match kubernetes.**>
@type rewrite_tag_filter
<rule>
key message
pattern /^\{.*\}$/
tag application.json
</rule>
<rule>
key kubernetes.container_name
pattern /^known-container$/
tag application.freetext
</rule>
<rule>
key message
pattern /.*$/
tag unidentified
</rule>
</match>
<filter application.json>
@type parser
key_name "message"
reserve_data true
remove_key_name_field true
<parse>
@type json
</parse>
</filter>
<match application.json>
@type opensearch
@id out_os_app_json
include_tag_key true
host "opensearch-cluster-master"
port 9200
path ""
scheme https
ssl_verify false
ssl_version TLSv1_2
user "admin"
password "xxxxxxx"
index_name "fluentd_application"
request_timeout 30s # defaults to 5s
suppress_type_name true
</match>
<match application.**>
@type opensearch
@id out_os_app
include_tag_key true
host "opensearch-cluster-master"
port 9200
path ""
scheme https
ssl_verify false
ssl_version TLSv1_2
user "admin"
password "xxxxxxx"
index_name "fluentd_application"
request_timeout 30s # defaults to 5s
suppress_type_name true
</match>
# # Catchall...
<match **>
@type stdout
@id unidentified_log
</match>
Error output from fluentd:
2022-05-19 09:14:41 +0000 [warn]: #0 [out_os_app_json] failed to flush the buffer. retry_times=0 next_retry_time=2022-05-19 09:14:43 +0000 chunk="5df59ca32e546139145bdc1c15cf2986" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"opensearch-cluster-master\", :port=>9200, :scheme=>\"https\", :user=>\"admin\", :password=>\"obfuscated\", :path=>\"\"}): undefined method `call' for nil:NilClass"
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1119:in `rescue in send_bulk'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:1081:in `send_bulk'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:892:in `block in write'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `each'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluent-plugin-opensearch-1.0.3/lib/fluent/plugin/out_opensearch.rb:891:in `write'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2022-05-19 09:14:41 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
I can't tell what record(s) are causing the problem, but when I switch the out_os_app_json output to stdout, a sample record looks like this.
{
"stream"=>"stdout",
"logtag"=>"F",
"message"=>"{\"remote\":\"127.0.0.1:52400\",\"client\":\"conn216652\",\"doc\":{\"driver\":{\"name\":\"mongo-go-driver\",\"version\":\"v1.7.0\"},\"os\":{\"type\":\"linux\",\"architecture\":\"amd64\"},\"platform\":\"go1.17.5\",\"mongos\":{\"host\":\"shared-mongodb-mongos-5d65d4667c-n2fwv:27017\",\"client\":\"127.0.0.1:52400\",\"version\":\"4.4.10-11\"}}}}",
"time"=>"2022-05-19T09:00:57.120674031+00:00",
"docker"=>{"container_id"=>"1e31fb4a258736fc6eaeb35d5b5ed45c12c21963f64a97e917e87bd79734384b"},
"kubernetes"=>{"container_name"=>"fluentd",
"namespace_name"=>"xxxxxxxx",
"pod_name"=>"fluentd-2n9kd",
"container_image"=>"docker.io/fluent/fluentd-kubernetes-daemonset:v1.14.6-debian-opensearch-1.0",
"container_image_id"=>"docker.io/fluent/fluentd-kubernetes-daemonset@sha256:054d75bef7ca91b6434d2bc86d387b7de919c408dac7e0ee88322cc5926d8798",
"pod_id"=>"3519ab44-1b54-4346-97f6-fdb0254d41cd",
"pod_ip"=>"10.129.1.7",
"host"=>"compute-0",
"labels"=>{"controller-revision-hash"=>"68458f67f6",
"k8s-app"=>"fluentd-logging",
"pod-template-generation"=>"1",
"version"=>"v1"},
"master_url"=>"https://172.30.0.1:443/api",
"namespace_id"=>"ec79e7e2-526a-4213-9f43-fcb8ef8cafc7",
"namespace_labels"=>{"kubernetes_io/metadata_name"=>"xxxxxxxx"}
}
}
NOTE that, whilst I can't tell if this is the offending record, I've selected it because I noticed the JSON in the message hash contains a spurious }. This invalid JSON log appears to have been generated by the fluentd container, presumably triggered by some condition in the original log, but I've yet to find the root cause.
Whether this invalid JSON is related to this issue or not, I don't know.
Running with trace logs I've found illegal_state_exception: Can't get text on a START_OBJECT at 1:1320:
2022-05-19 14:49:29 +0000 [trace]: #0 [out_os_app_json] bulk response: {"took"=>925, "errors"=>true, "items"=>[
...
...
{"index"=>{"_index"=>"dc4h-goal-1-application", "_id"=>"yKfL3IAB4pmfZooFjGdO", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [attr.command] of type [text] in document with id 'yKfL3IAB4pmfZooFjGdO'. Preview of field's value: '{ordered=true, $replData=1, $configServerState={opTime={t=1, ts={$timestamp={t=1652971704, i=3}}}}, $db=config, $clusterTime={clusterTime={$timestamp={t=1652971704, i=4}}, signature={keyId=7099418590524211201, hash={$binary={base64=ww8hldh66Ok8b+I4w+Gt7OUgUnA=, subType=0}}}}, maxTimeMS=30000, update=mongos, bypassDocumentValidation=false, writeConcern={wtimeout=60000, w=majority}, updates=[{q={_id=shared-mongodb-mongos-6bd459598f-9bwvp:27017}, u={$set={mongoVersion=4.4.10-11, waiting=true, ping={$date=2022-05-19T14:48:25.183Z}, _id=shared-mongodb-mongos-6bd459598f-9bwvp:27017, up=9474, advisoryHostFQDNs=[]}}, upsert=true, multi=false}]}'", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:1320"}}}}
...
...
2022-05-19 14:49:29 +0000 [debug]: #0 [out_os_app_json] taking back chunk for errors. chunk="5df5e77819c10486f117a8ce77af0598"
2022-05-19 14:49:29 +0000 [trace]: #0 [out_os_app_json] taking back a chunk instance=50740 chunk_id="5df5e77819c10486f117a8ce77af0598"
2022-05-19 14:49:29 +0000 [trace]: #0 [out_os_app_json] chunk taken back instance=50740 chunk_id="5df5e77819c10486f117a8ce77af0598" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="application.json", variables=nil, seq=0>
2022-05-19 14:49:30 +0000 [warn]: #0 [out_os_app_json] failed to flush the buffer. retry_times=0 next_retry_time=2022-05-19 14:49:31 +0000 chunk="5df5e77819c10486f117a8ce77af0598" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"opensearch-cluster-master.dc4h-goal-1.svc\", :port=>9200, :scheme=>\"https\", :user=>\"admin\", :password=>\"obfuscated\", :path=>\"\"}): undefined method `call' for nil:NilClass"
The related record for the bulk insert appears to be:
{"stream":"stdout","logtag":"F","time":"2022-05-19T14:48:25.332505099+00:00","docker":{"container_id":"2699a109e1c84b45c3636b03460ba99ec70ea88b5066ca17c427ccd0c97e29b8"},"kubernetes":{"container_name":"mongod","namespace_name":"redacted","pod_name":"shared-mongodb-cfg-0","container_image":"docker.io/percona/percona-server-mongodb:4.4.10-11","container_image_id":"docker.io/percona/percona-server-mongodb@sha256:ea73a506fa02604660e3ef7d452d142a89587bb5daca15d3cc1b539a9b1000c5","pod_id":"a05dc0e0-d49b-41e6-a2cf-47e1d5366c1e","pod_ip":"10.129.1.26","host":"compute-0","labels":{"controller-revision-hash":"shared-mongodb-cfg-67b7fc579","app_kubernetes_io/component":"cfg","app_kubernetes_io/instance":"shared-mongodb","app_kubernetes_io/managed-by":"percona-server-mongodb-operator","app_kubernetes_io/name":"percona-server-mongodb","app_kubernetes_io/part-of":"percona-server-mongodb","app_kubernetes_io/replset":"cfg","statefulset_kubernetes_io/pod-name":"shared-mongodb-cfg-0"},"master_url":"https://172.30.0.1:443/api","namespace_id":"d83f45cc-dc13-4846-88f2-e9991388c94d","namespace_labels":{"kubernetes_io/metadata_name":"redacted"}},"t":{"$date":"2022-05-19T14:48:25.332+00:00"},"s":"I","c":"COMMAND","id":51803,"ctx":"conn11951","msg":"Slow query","attr":{"type":"command","ns":"config.$cmd","command":{"update":"mongos","bypassDocumentValidation":false,"ordered":true,"updates":[{"q":{"_id":"shared-mongodb-mongos-6bd459598f-9bwvp:27017"},"u":{"$set":{"_id":"shared-mongodb-mongos-6bd459598f-9bwvp:27017","ping":{"$date":"2022-05-19T14:48:25.183Z"},"up":9474,"waiting":true,"mongoVersion":"4.4.10-11","advisoryHostFQDNs":[]}},"multi":false,"upsert":true}],"writeConcern":{"w":"majority","wtimeout":60000},"maxTimeMS":30000,"$replData":1,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1652971704,"i":4}},"signature":{"hash":{"$binary":{"base64":"ww8hldh66Ok8b+I4w+Gt7OUgUnA=","subType":"0"}},"keyId":7099418590524211201}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1652971704,"i":3}},"t":1}},"$db":"config"},"numYields":0,"reslen":619,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":3},"writeConcern":{"w":"majority","wtimeout":60000,"provenance":"clientSupplied"},"storage":{},"protocol":"op_msg","durationMillis":103},"tag":"application.json"}
The JSON looks valid as far as I can tell, so my previous comment may be a red herring. However, adding replace_invalid_sequence true to the json <filter> did seem to stop the errors occurring on the opensearch output plugin. UPDATE: not true; replace_invalid_sequence true does not work around the problem.
hth
Thanks! Possible candidates for reproducible records are very helpful for us. I'll dig into the root cause of this issue.
@cosmo0920 I can reproduce it. I think it happens when the bulk request gets a 400 error from OpenSearch, for example because of a datatype mismatch (https://github.com/fluent/fluent-plugin-opensearch/issues/53). I started getting this error after I upgraded to the latest version. Before, the error was very clear when log_os_400_reason true was set; now it fails to flush the chunk and gives undefined method `call' for nil:NilClass.
I have downgraded to 1.0.2 with fluent-gem install 'fluent-plugin-opensearch' -v 1.0.2 and it works as expected.
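If you install the plugin through a Gemfile rather than ad hoc, a sketch of pinning the known-good version until this is fixed (the fluentd constraint below is only an example of the pinning pattern, not a requirement from this thread):

# Gemfile: pin the plugin so an image rebuild cannot pull 1.0.3/1.0.4
source "https://rubygems.org"
gem "fluentd", "~> 1.14"
gem "fluent-plugin-opensearch", "1.0.2"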
Same issue with 1.0.4; it also fails to flush the items from the buffer, which leads to a buffer overrun and then a total system halt (i.e. no logs at all).
Downgrading to 1.0.2 solved it, but frankly that's not really a solution.
I'm not sure why block becomes nil. So I created a patch that uses an if clause instead of the block call.
How about this patch?
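To illustrate the general shape of that kind of fix, here is a generic Ruby sketch of guarding a possibly-nil block with an if clause; it is not the actual out_opensearch.rb change:

# Generic illustration of the nil-block failure mode and the guard, not plugin code.
def handle_bulk_error(error_reason, &block)
  # Calling block.call unconditionally raises
  # NoMethodError (undefined method `call' for nil:NilClass) when no block was passed.
  if block
    block.call(error_reason)
  else
    warn "bulk request failed: #{error_reason}"  # fallback when no handler block is given
  end
end

handle_bulk_error("mapper_parsing_exception")                 # falls back to warn, no crash
handle_bulk_error("mapper_parsing_exception") { |r| puts r }  # handler block used when given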
Steps to replicate
Opensearch out buffer config:
I receive the following error when trying to send logs to OpenSearch; what can be the reason for this error?
Expected Behavior or What you need to ask
Access to OpenSearch is open and logs arrive from time to time, but I keep receiving this error most of the time. What can be the reason for this error?
Using Fluentd and OpenSearch plugin versions
fluentd --version or td-agent --version: v1.14.3-debian-elasticsearch7-1.0
fluent-gem list, td-agent-gem list, or your Gemfile.lock: