fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

fluentbit not picking up file #6077

Closed — tracyliuzw closed this issue 1 year ago

tracyliuzw commented 2 years ago

1. When I collect data I use a Fluent Bit + Fluentd architecture, with the data finally written to BigQuery. Fluent Bit uses the tail plugin, and the directory I collect from generates a large number of log files every hour (the files are rotated hourly). Each time the data is written to BigQuery an anomaly occurs, and the fields appear misaligned.


2. Is there a limit on the length of an event or record? The data I collected with the tail plugin was only about 300 bytes long, but collection still failed.

[2022/09/21 02:42:05] [ info] [fluent bit] version=1.9.8, commit=97a5e9dcf3, pid=2052
[2022/09/21 02:42:05] [debug] [engine] coroutine stack size: 98302 bytes (96.0K)
[2022/09/21 02:42:05] [ info] [storage] version=1.2.0, type=memory+filesystem, sync=normal, checksum=disabled, max_chunks_up=128
[2022/09/21 02:42:05] [ info] [storage] backlog input plugin: storage_backlog.1
[2022/09/21 02:42:05] [ info] [cmetrics] version=0.3.6
[2022/09/21 02:42:05] [debug] [tail:tail.0] created event channels: read=440 write=584
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] flb_tail_fs_stat_init() initializing stat tail input
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] inode=1125899906846935 with offset=1203 appended as D:\log\log\battle_report.2022091300.log
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] 1 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:05] [debug] [storage_backlog:storage_backlog.1] created event channels: read=624 write=628
[2022/09/21 02:42:05] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 15.3M
[2022/09/21 02:42:05] [debug] [emitter:re_emitted] created event channels: read=632 write=636
[2022/09/21 02:42:05] [debug] [stdout:stdout.0] created event channels: read=644 write=648
[2022/09/21 02:42:05] [ info] [sp] stream processor started
[2022/09/21 02:42:05] [ info] [output:stdout:stdout.0] worker #0 started
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] inode=1125899906846935 file=D:\log\log\battle_report.2022091300.log promote to TAIL_EVENT
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2022/09/21 02:42:15] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:25] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:35] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:45] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'

patrick-stephens commented 2 years ago

What is the actual problem for 1.? I don't really understand the issue from what is written, so please can you provide a reproducer or more details.

For 2, tail functions like tail -f, so it will only pick up new entries written to a file after it starts tailing. You need to configure read_from_head on if you want to read data already present in a file when it is opened. If you have configured a db setting then Fluent Bit will persist which data it has already sent: in most cases you do not want duplicate log entries. Have you done this? https://docs.fluentbit.io/manual/pipeline/inputs/tail Please provide your full configuration (as per the issue template) so we can see which options you are using. I'm guessing this is on Windows, but also check file permissions: the user Fluent Bit runs as needs read access to the file, otherwise it cannot see it.
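As a minimal illustrative sketch of combining those two options (the path, tag, and DB file below are made-up values, not taken from this issue), a tail input that reads existing file content and persists its offsets might look like:

```ini
[INPUT]
    Name           tail
    # Hypothetical path and tag, for illustration only
    Path           /var/log/myapp/*.log
    Tag            myapp
    # Read data already present in the file when it is first opened
    Read_from_Head On
    # Persist offsets so restarts resume where they left off (no duplicates)
    DB             /var/lib/fluent-bit/myapp.db
```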

tracyliuzw commented 2 years ago

My business scenario: about 50 files are generated per hour in the tailed directory, and data is occasionally lost in the first minute of each hour.
Can you give me some advice on how to configure tail when the directory contains many files?
My configuration is as follows:

[SERVICE]
    flush 5
    daemon Off
    log_level warn
    Log_File /var/log/td-agent-bit/td-agent-bit.log
    parsers_file parsers.conf
    Decode_Field_as escaped_utf8 log
    storage.path /var/log/td-agent-bit/storage/
    storage.sync normal
    storage.checksum Off
    storage.max_chunks_up 103
    storage.backlog.mem_limit 16M
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_PORT 2020
    Health_Check On
    HC_Errors_Count 5
    HC_Retry_Failure_Count 5
    HC_Period 5

[INPUT]
    Name tail
    Path /data/fjserver/log/gamelog/..log
    Tag tp
    Key tp
    Path_Key filename
    Buffer_Chunk_Size 24m
    Buffer_Max_Size 32m
    Refresh_Interval 10
    Rotate_Wait 5
    Ignore_Older 2h
    Read_from_Head false
    storage.type filesystem
    storage.pause_on_chunks_overlimit on
    Skip_Empty_Lines On
    Skip_Long_Lines On
    DB /var/log/td-agent-bit/db/tp.db
    DB.sync normal
    DB.journal_mode WAL
    Mem_Buf_Limit 48m
    Exit_On_Eof false
    Inotify_Watcher true

[INPUT]
    Name tail
    Path /data/fjserver/log/serverlog/.log
    Db /var/log/td-agent-bit/db/tpserver.db
    Tag tp.serverlog
    Key tp
    Mem_Buf_Limit 48m
    Buffer_Chunk_Size 24m
    Buffer_Max_Size 32m
    Refresh_Interval 10
    Rotate_Wait 5
    Ignore_Older 2h
    Read_from_Head false
    storage.type filesystem
    storage.pause_on_chunks_overlimit on
    Skip_Empty_Lines On
    Skip_Long_Lines On
    Exit_On_Eof false
    Inotify_Watcher true

[FILTER]
    Name rewrite_tag
    Match tp
    Rule $filename \/(\S+)\/([a-zA-Z]+.\d{6})(\d{4}).log $TAG.$2 false
    Emitter_Name re_emitted
    Emitter_Mem_Buf_Limit 10M
    Emitter_Storage.type filesystem

[FILTER]
    Name record_modifier
    Match .
    Record hostname ${HOSTNAME}

[OUTPUT]
    Name Forward
    Match .*
    Upstream upstream.conf
    Require_ack_response true
    Send_options False
    Compress gzip
    Workers 2
    storage.total_limit_size 48M

tracyliuzw commented 2 years ago

fluentd configuration file:

<filter tp.vip.*>
  @type parser
  key_name tp
  reserve_data true
  remove_key_name_field true
  replace_invalid_sequence false
  emit_invalid_record_to_error true
  <parse>
    @type csv
    ##common##
    types eventtype:string,eventtime:string,iggid:integer,f4:string,f5:string,f6:string,f7:string,f8:string
    null_empty_string true
    estimate_current_event true
    ##csv##
    keys eventtype,eventtime,iggid,f4,f5,f6,f7,f8
    delimiter "\t"
    parser_type normal
  </parse>
</filter>

<match tp.vip.*>
  <buffer yyyymm>
    @type file
    path /var/log/td-agent/buffer/tp/vip
    timekey_use_utc true
    chunk_limit_size 256MB
    #chunk_limit_records 3000
    total_limit_size 512MB
    chunk_full_threshold 0.5
    queued_chunks_limit_size 20
    flush_at_shutdown true
    flush_mode interval
    flush_interval 5s
    flush_thread_interval 1
    flush_thread_count 1
    flush_thread_burst_interval 1
    delayed_commit_timeout 60
    overflow_action block
    retry_type exponential_backoff
    retry_timeout 24h
    retry_forever true
    retry_max_times 20
    retry_wait 2
  </buffer>
  @type bigquery_insert
  auth_method json_key
  json_key /etc/td-agent/style-saga-aa24aba4a06d.json
  project style-saga
  dataset gamedata
  table vip_${yyyymm}
  fetch_schema_table vip
  fetch_schema true
  auto_create_table true
  ignore_unknown_values true
  schema_cache_expire 600
  allow_retry_insert_errors true
  request_timeout_sec 120
  request_open_timeout_sec 120
  skip_invalid_rows true
</match>

tracyliuzw commented 2 years ago

When data is finally written to BigQuery, an error log is reported:

2022-09-20 22:00:24 -0500 [trace]: #5 enqueueing all chunks in buffer instance=2220
2022-09-20 22:00:24 -0500 [trace]: #13 enqueueing all chunks in buffer instance=4100
2022-09-20 22:00:24 -0500 [trace]: #13 enqueueing all chunks in buffer instance=5480
2022-09-20 22:00:24 -0500 [trace]: #3 enqueueing all chunks in buffer instance=5420
2022-09-20 22:00:24 -0500 [trace]: #12 enqueueing all chunks in buffer instance=5180
2022-09-20 22:00:24 -0500 [debug]: #12 insert rows project_id="style-saga" dataset="gamedata" table="dressup_202209" count=1
2022-09-20 22:00:24 -0500 [warn]: #12 insert errors project_id="style-saga" dataset="gamedata" table="dressup_202209" insert_errors="[#<Google::Apis::BigqueryV2::InsertAllTableDataResponse::InsertError:0x00007fdf13c905f0 @errors=[#<Google::Apis::BigqueryV2::ErrorProto:0x00007fdf13e8fc98 @debug_info=\"\", @location=\"eventtime\", @message=\"Invalid datetime string \\"8921\\"\", @reason=\"invalid\">], @index=0>]"
2022-09-20 22:00:24 -0500 [debug]: #12 taking back chunk for errors. chunk="5e92725249ac4b3c4d09a3c00c42891c"
2022-09-20 22:00:24 -0500 [trace]: #12 taking back a chunk instance=2720 chunk_id="5e92725249ac4b3c4d09a3c00c42891c"
2022-09-20 22:00:24 -0500 [trace]: #12 chunk taken back instance=2720 chunk_id="5e92725249ac4b3c4d09a3c00c42891c" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables={:yyyymm=>"202209"}, seq=0>
2022-09-20 22:00:24 -0500 [error]: #12 Hit limit for retries. dropping all chunks in the buffer queue. retry_times=0 records=1 error_class=Fluent::BigQuery::UnRetryableError error="failed to insert into bigquery(insert errors), and cannot retry"
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-bigquery-2.3.0/lib/fluent/plugin/bigquery/writer.rb:99:in `insert_rows'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-bigquery-2.3.0/lib/fluent/plugin/out_bigquery_insert.rb:102:in `insert'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-bigquery-2.3.0/lib/fluent/plugin/out_bigquery_insert.rb:98:in `write'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin/output.rb:1180:in `try_flush'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin/output.rb:1501:in `flush_thread_run'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin/output.rb:501:in `block (2 levels) in start'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-09-20 22:00:24 -0500 [trace]: #12 clearing queue instance=2720
2022-09-20 22:00:24 -0500 [debug]: #12 buffer queue cleared
2022-09-20 22:00:24 -0500 [trace]: #5 enqueueing all chunks in buffer instance=3020

tracyliuzw commented 2 years ago

Please help me, can you give me some advice?

patrick-stephens commented 2 years ago

It looks to me like the issue is with Fluentd sending to BigQuery, so you probably want to drill down on that and raise it in the Fluentd repository for that plugin, where there will be expertise on it.

Is there some issue with the Fluent Bit side of things specifically? There is an output plugin already to send to BigQuery from Fluent Bit directly so does that work? https://docs.fluentbit.io/manual/pipeline/outputs/bigquery
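For reference, per the linked documentation, the direct Fluent Bit BigQuery output is configured along these lines (the credentials path and the project/dataset/table names below are placeholders, not values from this issue):

```ini
[OUTPUT]
    Name                        bigquery
    Match                       *
    # Placeholder service-account key path and BigQuery identifiers
    google_service_credentials  /path/to/service_account_credentials.json
    project_id                  my-project
    dataset_id                  my_dataset
    table_id                    my_table
```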

The tail inputs you have seem to be ok but I can't really comment as you know the specific log files you have.

I did note one seems to have a strange path, is that right or did you mean a wildcard?

[INPUT]
Name tail
Path /data/fjserver/log/gamelog/..log
Tag tp

Also, only the server logs have a DB set, so the other input will just read from the end (not the beginning) when Fluent Bit starts: only new data added after Fluent Bit begins watching the file will be picked up. This is what I mean by tail -f; it functions as that does.

The server logs have a DB so will record which offset they got up to last and start from there:

[INPUT]
Name tail
Path /data/fjserver/log/serverlog/*.log
Db /var/log/td-agent-bit/db/tpserver.db
Tag tp.serverlog

Unrelated, but you do not have to provide every configuration option, only those that are required or differ from the defaults. The Slack channel is likely better for discussing best practice and the various options vs your requirements.

tracyliuzw commented 2 years ago

Thank you for your help! Sorry, Path /data/fjserver/log/gamelog/..log was a copy mistake; the correct one is Path /data/fjserver/log/gamelog/..log. In the current two-tier architecture, Fluent Bit runs alongside the production servers. To avoid taking up too many production server resources during collection, the data is only lightly processed and forwarded to Fluentd, which then processes it and writes it to BigQuery.
My original data is CSV. Is it possible that fields are being lost during parsing? In the BigQuery error log I can see that fields are missing and misaligned, and the write to BigQuery fails.
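To illustrate the misalignment failure mode described here (a hypothetical reproduction, not the reporter's actual data; the key list is a made-up subset of the parser config above): if one tab-separated field is missing from a line, every later value shifts left into the wrong column, which is consistent with BigQuery rejecting a value like "8921" for the eventtime DATETIME column:

```python
import csv
import io

# Hypothetical subset of the CSV keys from the Fluentd parser config.
keys = ["eventtype", "eventtime", "iggid", "f4"]

def parse(line: str) -> dict:
    """Split a tab-delimited log line and zip it against the key list."""
    row = next(csv.reader(io.StringIO(line), delimiter="\t"))
    return dict(zip(keys, row))

good = "login\t2022-09-21 02:42:05\t8921\tx"
bad = "login\t8921\tx"  # eventtime missing: iggid's value shifts into eventtime

print(parse(good)["eventtime"])  # 2022-09-21 02:42:05
print(parse(bad)["eventtime"])   # 8921 -> "Invalid datetime string" on insert
```

Checking for short rows before forwarding (or letting the parser emit invalid records to an error stream, as emit_invalid_record_to_error does) would surface these lines before they reach BigQuery.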

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.