elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
96 stars 4.92k forks source link

[Winlogbeat] Repeated warnings #36020

Closed kowalczyk-p closed 11 months ago

kowalczyk-p commented 1 year ago

Winlogbeat on Windows Server 2022 running as Windows Event Collector reports repeatedly below warnings:

WinEventLog[ForwardedEvents] error salvaging message (event id=140 qualifier=0 provider=\"Microsoft-Windows-TaskScheduler\" created at 2023-07-07 08:36:24.6078924 +0000 UTC will be included without a message): failed in EvtFormatMessage: The locale specific resource for the desired message is not present.
WinEventLog[ForwardedEvents] error salvaging message (event id=1 qualifier=0 provider=\"Microsoft-Windows-Sysmon\" created at 2023-07-07 08:35:43.5481807 +0000 UTC will be included without a message): failed in EvtFormatMessage: The maximum number of replacements has been reached.

Events are delivered to Elasticsearch without message field value but all other field values are ok. This problem occurs for events from two servers only, one running Windows Server 2012 and other one running Windows Server 2019. Events from other servers with same OS are fine.

Winlogbeat version in 8.5.3.

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

andrewkroh commented 1 year ago

Winlogbeat version in 8.5.3.

Please test under the latest version. There were some changes relating to event log readers configured with forwarded: true. Relates https://github.com/elastic/beats/pull/34865.

kowalczyk-p commented 1 year ago

I do not see this anymore but there is a bigger issue - from time to time I see warnings like below and then all log entries in ForwardedEvents are send to Elasticsearch again.

{"log.level":"warn","@timestamp":"2023-07-14T15:10:45.148+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":510},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2023-07-14T15:10:45.148+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":179},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"channel":"ForwardedEvents","ecs.version":"1.6.0"}

This happens in Winlogbeat 8.8.2.

kowalczyk-p commented 1 year ago

I went through winlogbeat logs again and found one more repeating message, this time informational. Sequence of log messages looks like this:

{"log.level":"warn","@timestamp":"2023-07-18T03:12:11.132+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":510},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2023-07-18T03:12:11.133+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":179},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"channel":"ForwardedEvents","ecs.version":"1.6.0"}

{"log.level":"info","@timestamp":"2023-07-18T03:12:11.133+0200","log.logger":"metric_registry","log.origin":{"file.name":"inputmon/input.go","file.line":60},"message":"unregistering","input_type":"winlog","id":"ForwardedEvents","key":"ForwardedEvents","uuid":"5b40be76-d2f8-4766-bdb4-36c01c32d48f","ecs.version":"1.6.0"}

Using winlogbeat-experimental API cause massive amount of warnings:

{"log.level":"warn","@timestamp":"2023-07-18T08:11:32.731+0200","log.logger":"wineventlog.renderer","log.origin":{"file.name":"wineventlog/renderer.go","file.line":319},"message":"The number of event data parameters doesn't match the number of parameters in the template.","provider":"Microsoft-Windows-Security-Auditing","event_id":4776,"event_parameter_count":3,"template_parameter_count":4,"template_version":0,"event_version":0,"ecs.version":"1.6.0"}

Also errors on logstash occurs, for example:

Could not index event to Elasticsearch. status: 400, reason: failed to parse field [source.ip] of type [ip] in document with id 'sUOgZ4kBRX5xfYl8G0nm'. Preview of field's value: '445'

I found issue #32168 and I was wondering is it related somehow. Also comment https://github.com/elastic/beats/blob/b481143bff40d709a7cd7ade4e78db41a47f4507/winlogbeat/eventlog/wineventlog_experimental.go#L250 suggest that when this error occurs log file will be read from beginning. Is this really correct behavior? In my case for ForwardedEvents this is causing to ingest again 20 Gb of logs.

CC: @efd6

efd6 commented 1 year ago

@kowalczyk-p Focusing on the non-experimental API, the behaviour is founded in the way that Windows handles logs. I'm not sure what we can do about it; we could make the log level, INFO, but I think that's a little too low. For the experimental API, I don't have enough experience with that code to be able to comment.

kowalczyk-p commented 1 year ago

@efd6 so you say that this error and in consequence resending all events from beginning of log file is normal Windows behavior?

efd6 commented 1 year ago

Do you see that for the non-experimental case?

kowalczyk-p commented 1 year ago

Yes, I just tried experimental to see is it better but it is not.

kowalczyk-p commented 1 year ago

Just reviewed logs, problem started at version 8.5.3. Before I was using 7.15.0 and there was no such issue. OS is Windows Server 2022 Standard. On another server with Windows Server 2019 Standard Winlogbeat is working fine but this server has less events per second. I created a schedule task to restart service every hour, no luck.

efd6 commented 1 year ago

Prior to 8.6 we did not attempt to recover from stale subscriptions, so you would not see errors being logged there, you would however have lost event data as the beat progressed past the error. So it's not that the issue wasn't there, but rather that the issue did not show up in our logging.

It would be interesting to see if under heavier load your 2019 server displays the same behviour. Are you able to test that?

Another bit of information that would be helpful is what event log parameters you are using. This can be examined by using the wevtutil tool.

kowalczyk-p commented 1 year ago

This is certainly from Winlogbeat 8.5.3, I double checked:

{"log.level":"warn","@timestamp":"2023-07-13T08:41:00.236+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":378},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2023-07-13T08:41:00.236+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":156},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"ecs.version":"1.6.0"}

I'm not sure what info do you need from wevtutil, could you tell me more about what commands I should execute?

> wevtutil gl ForwardedEvents
name: ForwardedEvents
enabled: true
type: Operational
owningPublisher: Microsoft-Windows-EventCollector
isolation: Custom
channelAccess: O:BAG:SYD:(A;;0x2;;;S-1-15-2-1)(A;;0x2;;;S-1-15-3-1024-3153509613-960666767-3724611135-2725662640-12138253-543910227-1950414635-4190290187)(A;;0xf0007;;;SY)(A;;0x7;;;BA)(A;;0x7;;;SO)(A;;0x3;;;IU)(A;;0x3;;;SU)(A;;0x3;;;S-1-5-3)(A;;0x3;;;S-1-5-33)(A;;0x1;;;S-1-5-32-573)
logging:
  logFileName: %SystemRoot%\System32\Winevt\Logs\ForwardedEvents.evtx
  retention: false
  autoBackup: false
  maxSize: 19358547968
publishing:
  fileMax: 1
> wevtutil gli ForwardedEvents
creationTime: 2021-12-19T18:07:18.7028271Z
lastAccessTime: 2023-07-19T06:30:26.6100697Z
lastWriteTime: 2023-07-19T06:30:26.6100697Z
fileSize: 19358552064
attributes: 32
numberOfLogRecords: 4379072
oldestRecordNumber: 86050006108
efd6 commented 1 year ago

The retention and maxSize are the two most interesting details.

Sorry, you are correct, we introduced the stale result recovery in #32176 which went into 8.3 and 7.17. The point remains the same though, it's not that this error wasn't happening previously, just we weren't doing anything about it.

kowalczyk-p commented 1 year ago

Unfortunately it won’t be easy to test Server 2019 under heavy load and I’m not able to test is quickly. Did Winlogbeat before #32176 also resend all events from ForwardedEvents log file after error? Do you thing that reducing size of log file could help?

efd6 commented 1 year ago

Did Winlogbeat before https://github.com/elastic/beats/pull/32176 also resend all events from ForwardedEvents log file after error?

No, it just failed.

Do you thing that reducing size of log file could help?

It's worth a try. The 20GB value you quoted is consistent with the maxSize value there.

kowalczyk-p commented 1 year ago

Unfortunately reducing file size to 8GB did not help.

efd6 commented 1 year ago

What behaviour would you like to see here? At the moment we are doing what is recommended, but clearly this is not working in your case. We have not see this kind of issue with other users, and the issue is on Windows 2022 which is a version we have had other unusual issues with; I know that you've said that you can't do this, but replication on another version would be very helpful.

kowalczyk-p commented 1 year ago

Currently we are setting up another server running Windows Server 2019 and we are going to forward logs on it to see if there is any difference. It will take a while because I have to work on it with another department.

kowalczyk-p commented 1 year ago

On 27.07 8:30 we started forwarding events to new event collector on Windows 2019. From that time only once we got error with winlogbeat on this server:

{"log.level":"warn","@timestamp":"2023-07-31T09:44:49.636+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":156},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-07-31T09:44:49.636+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":378},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}

I noticed that aroud thime error occured I have events with same winlog.record_id value in elasticsearch repeated around 120 000 times. Oldest and newest event.ingested values for those events are about Jul 31, 2023 @ 09:33:12.902 and Jul 31, 2023 @ 18:01:19.548.

andrewkroh commented 1 year ago

Those errors are named ERROR_EVT_QUERY_RESULT_STALE in Windows. I suspect that we need to undo the change relating to https://github.com/elastic/beats/issues/29793 that enabled EVT_SUBSCRIBE_STRICT for logs that have forwarded: true.

essodjolo commented 1 year ago

Hi Team, @efd6 can you confirm if the PR https://github.com/elastic/beats/pull/36309 you provided does solve the issue opened here?

efd6 commented 1 year ago

It should have, yes.

kowalczyk-p commented 11 months ago

@efd6 Could you tell me in which version of Winlogbeat this fix is available?

efd6 commented 11 months ago

v8.10.0 and v8.9.2.

kowalczyk-p commented 11 months ago

I can confirm that upgrade to 8.9.2 resolves this issue. Thank you for your help :)