Closed amitdhawan closed 3 years ago
Here is the log of the top command; it shows that nearly all resident memory is consumed by ruby and CPU usage is at 100%.
@repeatedly any help on this?
This is the same as this issue https://github.com/fluent/fluentd/issues/2379
I think there is no limit for the target file. But fluentd focuses on streaming data with low latency, so it is not optimized for something like 4GB of archived data. Embulk or another batch/bulk loader is a better fit for such cases. The resource consumption seems normal: decompressing 760MB -> 4GB, parsing 4GB in the input, and formatting 4GB in the output (I'm not familiar with the kinesis output, so this point is just an assumption) all need CPU power.
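For reference, a bulk load with Embulk is driven by a YAML config and a one-shot run rather than a daemon. The following is only a rough, untested sketch of that approach: the bucket, prefix, and credentials are placeholders, the gzip decoder is Embulk's built-in one, and the parser depends on the actual log format (csv is built in; JSON logs would need a parser plugin such as embulk-parser-jsonl).

```yaml
# Illustrative sketch only: bulk-load gzipped logs from S3 with Embulk.
in:
  type: s3                       # embulk-input-s3 plugin
  bucket: my-log-bucket          # placeholder
  path_prefix: logs/             # placeholder
  endpoint: s3.amazonaws.com
  access_key_id: <ACCESS_KEY>
  secret_access_key: <SECRET_KEY>
  decoders:
    - {type: gzip}               # built-in decoder handles the .gz files
  parser:
    type: csv                    # pick a parser matching the log format
out:
  type: stdout                   # replace with the desired output plugin
```

Such a config would then run as a batch job, e.g. `embulk run s3_bulk.yml`, so the trigger-on-upload behaviour would have to come from something else (cron, or a small script hooked to an S3 event), which is the limitation raised below about Embulk being command-line driven.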
@repeatedly So do you mean to say I can process large files from S3 in parallel using Embulk?
And is fluentd meant for streaming real-time logs rather than processing large files in one go?
@repeatedly I did a small PoC on Embulk and got the impression that it is a bulk importer of data from S3, triggered from the command line. That doesn't fit my requirement of importing data triggered by a file upload to S3, which fluentd does.
Let me know if you think otherwise and I can use Embulk in my scenario.
@repeatedly I can now see memory going full throttle even when I consume a 200MB gz file from S3. The ruby command is taking full CPU power and memory. Do I need to do some optimization in terms of Ruby or fluentd to make all this work?
@okkez Do you have any insight on this? With a 200MB gz file, fluentd will take 200MB + the uncompressed data (maybe 400+MB) + more memory for the new events created from the file. So I assume fluentd temporarily consumes about 1GB in this case. CPU usage seems to depend on the file content and the kinesis output implementation.
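If part of that footprint comes from the output side, one knob on the fluentd side is the buffer section of the kinesis match: with a file buffer and explicit size limits, queued chunks stay on disk instead of the Ruby heap. A minimal sketch, assuming fluent-plugin-kinesis (kinesis_streams) and fluentd v1 buffer syntax; the tag, stream name, and sizes are placeholders, and this does not change the memory needed to decompress and parse the file itself.

```
<match s3.logs>                        # hypothetical tag
  @type kinesis_streams
  region us-east-1
  stream_name my-stream                # placeholder
  <buffer>
    @type file                         # keep queued chunks on disk, not in memory
    path /var/log/td-agent/buffer/kinesis
    chunk_limit_size 8m
    total_limit_size 512m              # cap total buffered data
    flush_thread_count 2
  </buffer>
</match>
```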
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or add a comment, or this issue will be closed in 30 days.
This issue was automatically closed because it remained stale for 30 days.
Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
fluentd or td-agent version: td-agent 1.3.3
Environment information:
Operating system: cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel version: uname -r 4.4.0-1077-aws
Your configuration
Your problem explanation. If you have error logs, include them as well.
Below is my config file:
I'm processing log files uploaded to S3 and pushing them to a Kinesis stream. To check out fluentd's capabilities, I'm currently running td-agent on an AWS EC2 t2.micro instance. For log files containing 100 records or so, I get the output logs in Kinesis. But when I upload a log file of around 175MB in gz format, fluentd seems to behave unexpectedly and keeps showing me the trace log as below.
Not able to read a file (gz format) of around 760MB, which is around 4GB when unzipped.
Is there any upper limit on file size here?