elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Add a new input type to backfill gzipped logs #637

Open · lminaudier opened this issue 8 years ago

lminaudier commented 8 years ago

Hi,

Following this discussion on the filebeat forum, I would like to ask whether it would be possible to implement a way to easily backfill old gzipped logs with filebeat.

The proposed solution mentioned in the topic is to add a new dedicated input_type.

The topic also mentions that when filebeat reaches the end of input on stdin it does not return control; it just keeps waiting for new lines, which makes backfilling hard to script.

What are your thoughts on this?

Thanks for your hard work.

C-Duv commented 6 years ago

filebeat's -once flag (« Run filebeat only once until all harvesters reach EOF ») makes it stop at the end of stdin.

I just used it successfully with the following filebeat configuration file to import multiple old gzipped log files to my logstash instance:

filebeat.inputs:
- type: stdin
  enabled: true
  ignore_older: 0
  tags: ["web_access_log"]
  fields:
    server: "foo-server.example.com"
    webserver: "apache"
  fields_under_root: true
  json.overwrite_keys: true

output.logstash:
  hosts: ["logstash.example.com:8088"]

And with the following command:

zcat "file.gz" | filebeat -once -e -c "filebeat-std.config.yml"

jundl77 commented 6 years ago

+1 for gzip support

austriae commented 6 years ago

+1 for gzip support

jorgeeb4 commented 6 years ago

+1 for gzip support

jasperdj commented 6 years ago

After three years of waiting, we continue to +1 for gzip support

sm00thindian commented 6 years ago

+1

mielie1000 commented 6 years ago

+1 for gzip support

ravipz commented 6 years ago

+1 for gzip support

burandobata commented 6 years ago

+1 for gzip support

kirillrst commented 6 years ago

+1

GaganJotSingh commented 6 years ago

Is there any update/progress for this issue?

arnoldasb commented 5 years ago

+1 for gzip support

widhalmt commented 5 years ago

+1 from me for gzip support

pleminh commented 5 years ago

+1 for gzip support

ivan-mezentsev commented 5 years ago

+1 for gzip support

martinfmeneses commented 5 years ago

+1 for gzip support

ghost commented 5 years ago

+1 for gzip support

fenchu commented 5 years ago

Gentlemen, Splunk can read .gz files as-is, and has been able to for years. My problem is that my .gz files are ~20 GB and beyond my control. Currently I run one of the following from a scheduler, shipping to a logstash on the ELK server:

7z.exe e -oc *.gz* | logstash.bat -f sensioproxy.grok

or:

7z.exe e -oc *.gz* | filebeat.exe -once -e -c "sensioproxy-filebeat.yml"

This works somewhat, but the zcat / 7z.exe decompression is far faster than the logstash / filebeat side, so it consumes all 16 GB of memory on the clients before the data has been transferred.

Another issue is handling duplicates, since sincedb does not work here. I currently have an ugly Python script to handle them.

The logstash-codec-gzip_lines plugin does not output my files in a proper format the way zcat / 7z.exe does.
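
If the decompression side outruns the shipper, one rough workaround is to throttle the pipe. A sketch, assuming pv is available in the pipeline (the 10 MB/s limit is an arbitrary example):

# Rate-limit the decompressed stream so the downstream filebeat/logstash can keep up.
zcat "file.gz" | pv -L 10m | filebeat -once -e -c "sensioproxy-filebeat.yml"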

GiovanniBattista commented 5 years ago

+1 for gzip support

caribbeantiger commented 5 years ago

+1 for gzip support

Umaigenomu commented 5 years ago

+1 for gzip support

anlijun commented 5 years ago

+1 for gzip support

jesusslim commented 5 years ago

+1 for gzip support

kop7 commented 5 years ago

+1 for gzip support

muthunagarajan commented 5 years ago

+1 for gzip support

sadokmtir commented 5 years ago

+1 for gzip support

debojitkakoti commented 5 years ago

Is there any update on this feature? +1 for gzip support.

holgerbrandl commented 5 years ago

Log compression is standard practice. It would be really great if filebeat could read gzipped logs.

kolikons commented 5 years ago

+1 gzip

esparky commented 5 years ago

+1

codingogre commented 5 years ago

+1

stevebanik-ndsc commented 5 years ago

This issue has been open for 3 years. Is the best approach to just use logstash for .gz files?

fenchu commented 5 years ago

Regarding whether the best approach is to just use logstash for .gz files: I recommend unpacking them before handing them to logstash. Another option is to use the _bulk endpoint to insert directly into elasticsearch (one index-action line followed by one document line), which is far faster.
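
For reference, a minimal sketch of that _bulk format (the index name, host, and file name here are made up): each index-action line is followed by a document line, and the body is newline-terminated NDJSON.

# Two hypothetical documents: one index-action line, then one source line each.
cat > bulk.ndjson <<'EOF'
{ "index": { "_index": "old-logs" } }
{ "message": "first decompressed log line" }
{ "index": { "_index": "old-logs" } }
{ "message": "second decompressed log line" }
EOF

# _bulk expects Content-Type application/x-ndjson and a trailing newline.
curl -s -H "Content-Type: application/x-ndjson" -XPOST "http://localhost:9200/_bulk" --data-binary "@bulk.ndjson"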

ioigoume commented 5 years ago

+1 gzip.

Also, a comment from my side: consistency breaks if you create a new input type for zipped logs. I understand that plain log files differ from .gz files, but it is much more "readable" to have a single input entry for all log files of the same application.

mahmouddar commented 5 years ago

any update :(

IHeilig commented 5 years ago

+1 for gzip

amrithamenon16-zz commented 5 years ago

Is there any update on this?

thekofimensah commented 4 years ago

I have many log files in .gz format that I'd love to have decompressed automatically rather than having to run custom decompression scripts first. I have hundreds of file locations that I'd need to create scripts for, and I wish I didn't have to.

grayvity commented 4 years ago

any update?

grayvity commented 4 years ago

+1 for .gz file

alesanmed commented 4 years ago

I take it there is no news regarding this, right?

lnmohankumar commented 4 years ago

I hope someone makes progress on this.

bunjamins commented 4 years ago

agreed, looking for this functionality if possible.

zxt47 commented 4 years ago

+1 please

jbpratt commented 4 years ago

This would be an awesome feature :+1:

Mobil92 commented 4 years ago

+1 for gzip please

yodog commented 4 years ago

What about https://github.com/elastic/beats/pull/2227?

vprusa commented 4 years ago

+1 for gzip please

namishelex01 commented 4 years ago

+1 for gzip file processing

tjanson commented 4 years ago

Please refrain from "+1" comments; they’re spam. Use the emoji reaction button if you want, but either way it won’t make development any faster.