awslabs / amazon-kinesis-agent

Continuously monitors a set of log files and sends new data to the Amazon Kinesis Stream and Amazon Kinesis Firehose in near-real-time.
Other
358 stars 222 forks source link

Agent failed to start from previous checkpoint after log file rotation #102

Open nemalipuri opened 7 years ago

nemalipuri commented 7 years ago

agent was unable to start from previous checkpoint after log file rotation. Error: “Expected the service to be RUNNING, but the service has FAILED”.

agent.json

{
    "cloudwatch.emitMetrics": true,
    "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
    "awsAccessKeyId": "",
    "awsSecretAccessKey": "",
    "maxSendingThreads": 15,
    "flows": [
        {
            "filePattern": "/opt/app/ums/current/log/dev.log",
            "deliveryStream": "firehose-deliverystream-m1-kinesispoc",
            "dataProcessingOptions": [
                {
                    "optionName": "LOGTOJSON",
                    "logFormat": "COMMONAPACHELOG",
                    "matchPattern": "^\\[(.*)\\] \\[([\\d.]+)\\] \\[(.*?)\\]\\s?(.*)",
                    "customFieldNames": ["uuid", "ip", "uri", "log_message"]
                }
            ]
        }
    ]
}

Error:

Error:
2017-07-05 09:14:12.346-0600 localhost (main) com.amazon.kinesis.streaming.agent.Agent [INFO] Reading configuration from file: /etc/aws-kinesis/agent.json
2017-07-05 09:14:12.436-0600 localhost (main) com.amazon.kinesis.streaming.agent.processing.processors.AgentDataConverterChain [DEBUG] Using data converter chain: AgentDataConverterChain [com.amazon.kinesis.streaming.agent.processing.processors.LogToJSONDataConverter@6c2cba0a ]
2017-07-05 09:14:12.455-0600 localhost (main) com.amazon.kinesis.streaming.agent.Agent [INFO] null: Agent will use up to 12 threads for sending data.
2017-07-05 09:14:12.474-0600 localhost (main) com.amazon.kinesis.streaming.agent.tailing.checkpoints.SQLiteFileCheckpointStore [DEBUG] Connecting to database /var/run/aws-kinesis-agent/checkpoints...
2017-07-05 09:14:12.710-0600 localhost (main) com.amazon.kinesis.streaming.agent.tailing.checkpoints.SQLiteFileCheckpointStore [INFO] Deleted 0 old checkpoints.
2017-07-05 09:14:12.743-0600 localhost (Agent STARTING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Starting up...
2017-07-05 09:14:12.758-0600 localhost (Agent STARTING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Starting tailer for file fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log
2017-07-05 09:14:12.789-0600 localhost (FileTailer[fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log]) com.amazon.kinesis.streaming.agent.tailing.FileTailer [DEBUG] FileTailer[fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log]: Starting up...
2017-07-05 09:14:12.791-0600 localhost (FileTailer[fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log]) com.amazon.kinesis.streaming.agent.tailing.FileTailer [DEBUG] FileTailer[fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log]: Found checkpoint FileCheckpoint(file=TrackedFile(id=(dev=ca51,ino=791364), path=/opt/app/ums/current/log/dev.log, lastModifiedTime=1499123740000, size=14427187), offset=14428345, fileId=(dev=ca51,ino=791364), flowId=fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log).
2017-07-05 09:14:12.825-0600 localhost (Agent STARTING) com.amazon.kinesis.streaming.agent.Agent [ERROR] Unhandled exception during startup.
java.lang.IllegalStateException: Expected the service to be RUNNING, but the service has FAILED
        at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:285)
        at com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:229)
        at com.google.common.util.concurrent.AbstractExecutionThreadService.awaitRunning(AbstractExecutionThreadService.java:197)
        at com.amazon.kinesis.streaming.agent.Agent.startUp(Unknown Source)
        at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
        at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.MoreExecutors$2.execute(MoreExecutors.java:849)
        at com.google.common.util.concurrent.AbstractIdleService$2.doStart(AbstractIdleService.java:50)
        at com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:174)
        at com.google.common.util.concurrent.AbstractIdleService.startAsync(AbstractIdleService.java:135)
        at com.amazon.kinesis.streaming.agent.Agent.main(Unknown Source)
Caused by: java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:158)
        at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.initialize(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.initialize(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.startUp(Unknown Source)
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:55)
        at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
        at java.lang.Thread.run(Thread.java:745)
2017-07-05 09:14:12.829-0600 localhost (FileTailer[fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log]) com.amazon.kinesis.streaming.agent.Agent [ERROR] FATAL: Thread FileTailer[fh:firehose-deliverystream-m1-kinesispoc:/opt/app/ums/current/log/dev.log] threw an unrecoverable error. Aborting application
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:158)
        at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.initialize(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.initialize(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.startUp(Unknown Source)
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:55)
        at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
        at java.lang.Thread.run(Thread.java:745)
tomiloza commented 6 years ago

I have the same problem after the Log File rotation:

2018-08-29 07:35:29.256+0000 ip-10-4-1-47 (Agent STARTING) com.amazon.kinesis.streaming.agent.Agent [ERROR] Unhandled exception during startup. java.lang.IllegalStateException: Expected the service to be RUNNING, but the service has FAILED at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:285) at com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:229) at com.google.common.util.concurrent.AbstractExecutionThreadService.awaitRunning(AbstractExecutionThreadService.java:197) at com.amazon.kinesis.streaming.agent.Agent.startUp(Unknown Source) at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54) at com.google.common.util.concurrent.Callables$3.run(Callables.java:95) at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) at com.google.common.util.concurrent.MoreExecutors$2.execute(MoreExecutors.java:849) at com.google.common.util.concurrent.AbstractIdleService$2.doStart(AbstractIdleService.java:50) at com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:174) at com.google.common.util.concurrent.AbstractIdleService.startAsync(AbstractIdleService.java:135) at com.amazon.kinesis.streaming.agent.Agent.main(Unknown Source) Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:158) at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.initialize(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.FileTailer.initialize(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.FileTailer.startUp(Unknown Source) at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:55) at com.google.common.util.concurrent.Callables$3.run(Callables.java:95) at java.lang.Thread.run(Thread.java:748) 2018-08-29 07:35:29.264+0000 ip-10-4-1-47 (FileTailer[fh:spotify:/var/log/sockd/sockd.log]) com.amazon.kinesis.streaming.agent.Agent [ERROR] FATAL: Thread FileTailer[fh:spotify:/var/log/sockd/sockd.log] threw an unrecoverable error. Aborting application java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:158) at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.initialize(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.FileTailer.initialize(Unknown Source) at com.amazon.kinesis.streaming.agent.tailing.FileTailer.startUp(Unknown Source) at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:55) at com.google.common.util.concurrent.Callables$3.run(Callables.java:95) at java.lang.Thread.run(Thread.java:748) (END)

leandro-rouberte commented 5 years ago

Same thing happened to me. I moved the checkpoints file in /var/run/aws-kinesis-agent/checkpoints and let kinesis agent recreate it. This solved the problem.

TheHinsh commented 3 years ago

I have a set of steps that make this issue appear every time. Using a tool to continually write to a temp file (/tmp/log_events.json) that is being successfully monitored by aws-kinesis-agent, run the following commands.

truncate -s 0 /tmp/log_events.json
truncate -s 0 /tmp/log_events.json
systemctl restart aws-kinesis-agent
tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log

Commenting out two lines will fix this exception. The assert checks are not really needed, as the code runs the same check later:

index 1a9f3b1..e03df74 100644
--- a/src/com/amazon/kinesis/streaming/agent/tailing/TrackedFileRotationAnalyzer.java
+++ b/src/com/amazon/kinesis/streaming/agent/tailing/TrackedFileRotationAnalyzer.java
@@ -160,8 +160,8 @@ public class TrackedFileRotationAnalyzer {
      */
     public int findCurrentOpenFileAfterTruncate() {
         Preconditions.checkNotNull(currentOpenFile);
-        Preconditions.checkState(incoming.size() > 1); // always true after rotation by Truncate
-        Preconditions.checkState(hasCounterpart(incoming.get(0))); // always true for rotation by Truncate
+        // Preconditions.checkState(incoming.size() > 1); // always true after rotation by Truncate
+        // Preconditions.checkState(hasCounterpart(incoming.get(0))); // always true for rotation by Truncate
         // In case of rotation by Truncate, the current open file
         // MUST have a counterpart: either the latest file, or a previously
         // rotated file. If not, this is an anomaly (could happen if