awslabs / amazon-kinesis-agent

Continuously monitors a set of log files and sends new data to Amazon Kinesis Streams and Amazon Kinesis Firehose in near-real-time.

Excessive Memory Usage (Out of Heap Space) Reading Log files from START_OF_FILE #142

Open nahap opened 6 years ago

nahap commented 6 years ago

When reading a large file from START_OF_FILE, I get out-of-heap-space messages.

I have set my max heap size to 4096m, and it worked for smaller files, but with a 300 MB file containing 39,516 lines of JSON I get an out-of-heap-space message. I would have expected the tailer to read the file in chunks, process each chunk, free the memory, and continue with the next chunk, but this does not seem to be the case.

This is an installation on a Debian system.

Am I missing something here?

Thanks in advance, Andy
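For context, the setup described above corresponds to an agent.json flow along the lines of the sketch below; the file pattern and stream name are placeholders, and "initialPosition": "START_OF_FILE" is the setting that makes the agent read existing files from the beginning:

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/app/events.json*",
      "kinesisStream": "YOUR_STREAM_NAME",
      "initialPosition": "START_OF_FILE"
    }
  ]
}
```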

jatin-kumar commented 5 years ago

I have been facing the same issue as well. (I eventually worked around it by changing JAVA_START_HEAP=${JAVA_START_HEAP:-2048m} and JAVA_MAX_HEAP=${JAVA_MAX_HEAP:-2048m}.)
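A minimal sketch of where that change lives, assuming the packaged start script (typically installed as start-aws-kinesis-agent; exact path and default values may vary by version and distribution):

```sh
# In the agent's start script, the heap sizes are plain shell defaults,
# so raising them is just a matter of changing the fallback values:
JAVA_START_HEAP=${JAVA_START_HEAP:-2048m}
JAVA_MAX_HEAP=${JAVA_MAX_HEAP:-2048m}

# Because ${VAR:-default} only applies when the variable is unset, exporting
# JAVA_START_HEAP / JAVA_MAX_HEAP in the environment the agent is started
# from achieves the same effect without editing the script.

# Restart the agent afterwards so the new limits take effect:
#   sudo service aws-kinesis-agent restart
```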

While talking to AWS support, it sounded like we can set "publishQueueCapacity": "10" per flow to control how many buffers are created per flow (assuming that is what it actually means). Anyway, what puzzles me is that if the agent knows the max heap size, and it potentially also knows the number of streams and the size of a record per stream, why can't it chunk the processing (as also mentioned by @nahap) instead of just running out of memory? I hope I am not wrong in my understanding of how it works.
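If publishQueueCapacity is indeed accepted as a per-flow key as support suggested (its exact semantics are not confirmed in this thread), it would presumably sit alongside the other flow options in agent.json, roughly like this:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/app/events.json*",
      "kinesisStream": "YOUR_STREAM_NAME",
      "initialPosition": "START_OF_FILE",
      "publishQueueCapacity": "10"
    }
  ]
}
```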

thanks, jatin

av3ri commented 5 years ago

@jatin-kumar I am facing the same issue, with my Kinesis agents constantly crashing due to Java heap issues. Can you please tell me how/where you adjusted the Java heap memory settings for the AWS Kinesis agent?

I am running the agent on CentOS 7, if that helps.