awslabs / amazon-kinesis-agent

Continuously monitors a set of log files and sends new data to the Amazon Kinesis Stream and Amazon Kinesis Firehose in near-real-time.

unable to parse log matchPattern #180

Open kittawei1123 opened 5 years ago

kittawei1123 commented 5 years ago

 "filePattern": "/var/log/wlc.log",
  "kinesisStream": "ec2_log",
  "maxBufferAgeMillis": "1000",
  "dataProcessingOptions": [
    {
       "initialPostion": "START_OF_FILE",
       "optionName": "LOGTOJSON",
       "logFormat": "SYSLOG",
       "matchPattern": "^(\\d+) ([\\w{3} (\\w{3}) (\\d{2}) ([\\d.]+[\\d{2}]) (\\d{4})]) (.*?)",
       "customFieldNames": ["sq_num", "time_stamp","message" ]

My JSON file is OK and the Kinesis agent starts up successfully, but when my log is written to the file, the agent reports this error:

com.amazon.kinesis.streaming.agent.Agent [ERROR] FATAL: Thread FileTailer[kinesis:ec2_log:/var/log/wlc.log] threw an unrecoverable error. Aborting application java.lang.IllegalStateException

Here is an example line from my log:

1 Mon Aug 12 01:19:18 2019 Rogue AP: a4:d9:31:55:dc:ac detected on Base Radio MAC: f4:db:e6:96:ff:a0 Interface no: 1(802.11a) Channel: 64 RSSI: -71 SNR: 21 Classification: unclassified, State: Alert, RuleClassified : N, Severity Score: 0, RuleName: N.A. ,Classified AP MAC: 00:00:00:00:00:00 ,Classified RSSI: 0

The problem is how to write matchPattern correctly. Sorry, I am confused by the regex and don't know how to capture the rest of the line under my "message" field name. Thanks.

kittawei1123 commented 5 years ago

I would like the output to be:

sq_num: 1
timestamp: Mon Aug 12 01:19:18 2019
message: Rogue AP: a4:d9:31:55:dc:ac detected on Base Radio MAC: f4:db:e6:96:ff:a0 Interface no: 1(802.11a) Channel: 64 RSSI: -71 SNR: 21 Classification: unclassified, State: Alert, RuleClassified : N, Severity Score: 0, RuleName: N.A. ,Classified AP MAC: 00:00:00:00:00:00 ,Classified RSSI: 0

kittawei1123 commented 5 years ago

I tested the regex below in a regex tester and it matches, but the Kinesis agent fails to parse the log when I apply it:

Tailer Progress: Tailer has parsed 0 records (624 bytes), transformed 0 records, skipped 0 records, and has successfully sent 0 records to destination.

^(\\d+)\\s{2,}(\\w{3} \\w{3} \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4})\\s{2,}(.*)
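The doubled backslashes in the pattern above are consistent with JSON string escaping: inside a JSON config file, each regex backslash has to be written as `\\`. Here is a minimal sketch of that decoding step, using Python's standard `json` and `re` modules for brevity (the agent itself runs on Java). The `config_text` fragment is a hypothetical one-key config, not the full agent configuration.

```python
import json
import re

# Hypothetical one-key config fragment holding the pattern as posted above.
# The raw string keeps the doubled backslashes exactly as they would sit in the file.
config_text = r'''
{
  "matchPattern": "^(\\d+)\\s{2,}(\\w{3} \\w{3} \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4})\\s{2,}(.*)"
}
'''

# JSON decoding turns each "\\" into a single backslash.
pattern_text = json.loads(config_text)["matchPattern"]
print(pattern_text)

# The decoded text is what a regex engine compiles; it has three capture groups.
pattern = re.compile(pattern_text)
print(pattern.groups)

# A single backslash before "d" is not a valid JSON escape, so Python's
# json module rejects it.
try:
    json.loads(r'{"matchPattern": "^(\d+)"}')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False
print(single_backslash_ok)
```

When pasting a pattern into an online tester, use the decoded, single-backslash form; when pasting it back into the JSON config, double the backslashes again.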

willUrgently commented 5 years ago

If you turn on debug logging (log4j), you should be able to see whether checkpoints are being reused, which may not be what you intended.

Issue 118: "Initialization logs can be found in /tmp/aws-kinesis-agent.20190712081053.initlog". You should have a similar file to review.

Will Martin


kittawei1123 commented 5 years ago

2019-08-13 04:11:03.070+0000 elk-l1 (FileTailer[kinesis:ec2_log:/var/log/wlc.log]) com.amazon.kinesis.streaming.agent.processing.processors.LogToJSONDataConverter [DEBUG] Getting exception while parsing record: [0 Mon Aug 12 01:19:18 2019 Rogue AP: a4:d9:31:55:dc:ac detected on Base Radio MAC: f4:db:e6:96:ec:e0 Interface no: 1(802.11a) Channel: 44 RSSI: -85 SNR: 10 Classification: unclassified, State: Alert, RuleClassified : N, Severity Score: 0, RuleName: N.A. ,Classified AP MAC: 00:00:00:00:00:00 ,Classified RSSI: 0], record will be skipped
com.amazon.kinesis.streaming.agent.processing.exceptions.LogParsingException: Invalid log entry given the entry pattern
    at com.amazon.kinesis.streaming.agent.processing.parsers.SysLogParser.parseLogRecord(Unknown Source)
    at com.amazon.kinesis.streaming.agent.processing.processors.LogToJSONDataConverter.convert(Unknown Source)
    at com.amazon.kinesis.streaming.agent.processing.processors.AgentDataConverterChain.convert(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AbstractParser.convertData(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AbstractParser.buildRecord(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AbstractParser.readRecordFromCurrentBuffer(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AbstractParser.readRecord(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecordsInCurrentFile(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecords(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.runOnce(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.run(Unknown Source)
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:60)
    at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
    at java.lang.Thread.run(Thread.java:748)
2019-08-13 04:11:03.070+0000 elk-l1 (FileTailer[kinesis:ec2_log:/var/log/wlc.log]) com.amazon.kinesis.streaming.agent.tailing.KinesisParser [WARN] 1 record parsed but skipped for processing and delivering

willUrgently commented 5 years ago

OK. Your input record has one whitespace character after the sequence number, not the minimum of two that \s{2,} in your pattern requires.
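The mismatch can be reproduced outside the agent. The sketch below uses Python's `re` module for brevity; the agent uses Java's regex engine, but the constructs involved here (`\d`, `\w`, `\s`, `{2,}`, `+`) behave the same way in both. The `relaxed` pattern is an illustrative assumption, not something proposed in the thread, and `record` is a shortened copy of the line from the debug log, assuming its separators are single spaces as displayed.

```python
import re

# Pattern from the thread, in the single-backslash form the regex engine sees.
strict = re.compile(
    r"^(\d+)\s{2,}(\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\s{2,}(.*)"
)

# Illustrative variant (an assumption): accept one or more whitespace characters.
relaxed = re.compile(
    r"^(\d+)\s+(\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\s+(.*)"
)

# Shortened record from the debug log, with single-space separators.
record = "0 Mon Aug 12 01:19:18 2019 Rogue AP: a4:d9:31:55:dc:ac detected"

strict_match = strict.match(record)    # fails: only one space follows the "0"
relaxed_match = relaxed.match(record)  # succeeds with three captured groups

print(strict_match)
print(relaxed_match.groups())
```

Because `\s` also matches a tab, a relaxed separator of this kind would accept either spaces or tabs, which helps when it is unclear which one the log writer emits.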

Using a regex construction tool is not considered the best approach to learning regex, but this one will let you break the pattern down a group at a time:

https://regex101.com/

Good luck. Stick with PCRE; it's the closest to Java's regex.

Do not put PII or site-identifying information into the tool. That includes your MAC addresses.

Regards

Will Martin


kittawei1123 commented 5 years ago

Thanks, Will Martin. I did use regex101, and the pattern is correct there.

kittawei1123 commented 5 years ago

       "matchPattern": "^(\\d+)\\s{1,}(\\w{3} \\w{3} \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4})\\s{2,}(.*)",
       "customFieldNames": [ "seqnum", "timestamp", "message" ]

kittawei1123 commented 5 years ago

Fixed. Sorry, the separator is not a space, it's a tab:

"matchPattern": "^(\\d+)\\t{1,}(\\w{3} \\w{3} \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4})\\t{1,}(.*)",

Thanks Martin, you're the BEST.
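For anyone checking a similar pattern before deploying it, here is a small sketch of the final tab-separated pattern applied to a sample line, with the capture groups paired with the field names from the config. It uses Python's `re` for brevity and is not the agent's actual LOGTOJSON implementation; the `to_json` helper and the shortened sample line are hypothetical.

```python
import json
import re

# Final pattern from the thread, in the form the regex engine sees.
pattern = re.compile(
    r"^(\d+)\t{1,}(\w{3} \w{3} \d+ \d{2}:\d{2}:\d{2} \d{4})\t{1,}(.*)"
)
field_names = ["seqnum", "timestamp", "message"]


def to_json(line):
    """Hypothetical helper: pair each capture group with a field name."""
    match = pattern.match(line)
    if match is None:
        return None  # line does not fit the pattern
    return json.dumps(dict(zip(field_names, match.groups())))


# Shortened sample line, assuming single tabs between the three fields.
tab_line = "1\tMon Aug 12 01:19:18 2019\tRogue AP: a4:d9:31:55:dc:ac detected"
# The same line with spaces instead of tabs no longer matches.
space_line = tab_line.replace("\t", " ")

print(to_json(tab_line))
print(to_json(space_line))
```

Tabs and spaces look the same in most terminals and web pages, so printing a raw log line with something like Python's `repr()` can show which separator the file really contains.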