tatsu-yam closed this issue 3 years ago
Yes. This problem is similar to the process-wide conflict. It happens when S3 processing is slower than the next chunk flush. To avoid it, we could show a warning message like: "Buffer configuration uses multiple flush threads. We recommend using `%{chunk_id}` or `%{uuid_flush}` in the object path to avoid object conflicts." How about this?
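For illustration (a minimal sketch, not taken from this thread; the match pattern, bucket name, and buffer values are placeholders), a key format that avoids the collision could look like:

```
<match s3.**>
  @type s3
  s3_bucket my-bucket                # illustrative
  path logs/
  # %{uuid_flush} gives every flush a unique object name, so
  # concurrent flush threads cannot collide the way %{index} can
  s3_object_key_format %{path}%{time_slice}_%{uuid_flush}.%{file_extension}
  <buffer time>
    timekey 3600
    flush_thread_count 4             # safe with a uuid-based key format
  </buffer>
</match>
```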
I encountered this bug and consulted with tatsu-yam. I think it would be better to change the default to `%{uuid_flush}` or `%{chunk_id}` and to document the risk of using `%{index}` in `s3_object_key_format`.
Changing the default value affects existing users, so it needs to be done in two steps.
I certainly forgot about the impact on existing users ;)
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
This issue was automatically closed because it remained stale for 30 days.
In fluent-plugin-s3, if the value of `flush_thread_count` is greater than 1, data on S3 goes missing. I think this is because the `Fluent::Plugin::S3Output#write` method is not thread safe.

td-agent.conf
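(The attached td-agent.conf is not reproduced above; the following is a minimal sketch of the kind of configuration involved, assuming the default `s3_object_key_format`; the match pattern, bucket, region, and paths are illustrative.)

```
<match s3.**>
  @type s3
  s3_bucket my-test-bucket           # illustrative
  s3_region us-east-1
  path logs/
  # s3_object_key_format is left at its default:
  # "%{path}%{time_slice}_%{index}.%{file_extension}"
  <buffer time>
    @type file
    path /var/log/td-agent/buffer/s3
    timekey 60
    flush_thread_count 2             # >1 is what exposes the race
  </buffer>
</match>
```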
Test data (`/tmp/td-agent-failure-sample/tmp/1500000.log`) was transferred to S3 with td-agent. As a result, the record counts of the original data and of the data on S3 do not match. Of course, there are no errors in td-agent.log.

So I made the following modifications to fluentd and fluent-plugin-s3 and ran the transfer again.

The resulting td-agent.log.
Two threads uploaded to the same filename, `20200512.0.dat.gz`. I think the process that determines `%{index}` is not thread safe.
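As a sketch of the suspected race (illustrative Ruby, not the plugin's actual `write` implementation; `bucket` stands for an `Aws::S3::Bucket`): the key is derived from a counter by a check-then-act loop, so two flush threads can both observe that the same key is free before either has uploaded.

```ruby
require "aws-sdk-s3"   # assumed; `bucket` below is an Aws::S3::Bucket

# Illustrative check-then-act race; NOT the real Fluent::Plugin::S3Output#write.
def next_object_key(bucket, path, time_slice, ext)
  index = 0
  loop do
    key = "#{path}#{time_slice}_#{index}.#{ext}"
    # Two threads can both see `false` here for the same key...
    return key unless bucket.object(key).exists?
    index += 1
  end
end

# ...and then both upload to it, the second overwriting the first:
# bucket.object(key).put(body: compressed_chunk)
```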
I think using `%{uuid_flush}` will probably work around this problem. However, the default value of `s3_object_key_format` is `"%{path}%{time_slice}_%{index}.%{file_extension}"`, so this will affect a lot of users.

I think this issue is relevant: https://github.com/fluent/fluent-plugin-s3/issues/315