Closed kowalczyk-p closed 11 months ago
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
Winlogbeat version in 8.5.3.
Please test under the latest version. There were some changes relating to event log readers configured with forwarded: true
. Relates https://github.com/elastic/beats/pull/34865.
I do not see this anymore but there is a bigger issue - from time to time I see warnings like below and then all log entries in ForwardedEvents are send to Elasticsearch again.
{"log.level":"warn","@timestamp":"2023-07-14T15:10:45.148+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":510},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-07-14T15:10:45.148+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":179},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"channel":"ForwardedEvents","ecs.version":"1.6.0"}
This happens in Winlogbeat 8.8.2.
I went through winlogbeat logs again and found one more repeating message, this time informational. Sequence of log messages looks like this:
{"log.level":"warn","@timestamp":"2023-07-18T03:12:11.132+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":510},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-07-18T03:12:11.133+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":179},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"channel":"ForwardedEvents","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-07-18T03:12:11.133+0200","log.logger":"metric_registry","log.origin":{"file.name":"inputmon/input.go","file.line":60},"message":"unregistering","input_type":"winlog","id":"ForwardedEvents","key":"ForwardedEvents","uuid":"5b40be76-d2f8-4766-bdb4-36c01c32d48f","ecs.version":"1.6.0"}
Using winlogbeat-experimental API cause massive amount of warnings:
{"log.level":"warn","@timestamp":"2023-07-18T08:11:32.731+0200","log.logger":"wineventlog.renderer","log.origin":{"file.name":"wineventlog/renderer.go","file.line":319},"message":"The number of event data parameters doesn't match the number of parameters in the template.","provider":"Microsoft-Windows-Security-Auditing","event_id":4776,"event_parameter_count":3,"template_parameter_count":4,"template_version":0,"event_version":0,"ecs.version":"1.6.0"}
Also errors on logstash occurs, for example:
Could not index event to Elasticsearch. status: 400, reason: failed to parse field [source.ip] of type [ip] in document with id 'sUOgZ4kBRX5xfYl8G0nm'. Preview of field's value: '445'
I found issue #32168 and I was wondering is it related somehow. Also comment https://github.com/elastic/beats/blob/b481143bff40d709a7cd7ade4e78db41a47f4507/winlogbeat/eventlog/wineventlog_experimental.go#L250 suggest that when this error occurs log file will be read from beginning. Is this really correct behavior? In my case for ForwardedEvents this is causing to ingest again 20 Gb of logs.
CC: @efd6
@kowalczyk-p Focusing on the non-experimental API, the behaviour is founded in the way that Windows handles logs. I'm not sure what we can do about it; we could make the log level, INFO, but I think that's a little too low. For the experimental API, I don't have enough experience with that code to be able to comment.
@efd6 so you say that this error and in consequence resending all events from beginning of log file is normal Windows behavior?
Do you see that for the non-experimental case?
Yes, I just tried experimental to see is it better but it is not.
Just reviewed logs, problem started at version 8.5.3. Before I was using 7.15.0 and there was no such issue. OS is Windows Server 2022 Standard. On another server with Windows Server 2019 Standard Winlogbeat is working fine but this server has less events per second. I created a schedule task to restart service every hour, no luck.
Prior to 8.6 we did not attempt to recover from stale subscriptions, so you would not see errors being logged there, you would however have lost event data as the beat progressed past the error. So it's not that the issue wasn't there, but rather that the issue did not show up in our logging.
It would be interesting to see if under heavier load your 2019 server displays the same behviour. Are you able to test that?
Another bit of information that would be helpful is what event log parameters you are using. This can be examined by using the wevtutil
tool.
This is certainly from Winlogbeat 8.5.3, I double checked:
{"log.level":"warn","@timestamp":"2023-07-13T08:41:00.236+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":378},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-07-13T08:41:00.236+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":156},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"ecs.version":"1.6.0"}
I'm not sure what info do you need from wevtutil
, could you tell me more about what commands I should execute?
> wevtutil gl ForwardedEvents
name: ForwardedEvents
enabled: true
type: Operational
owningPublisher: Microsoft-Windows-EventCollector
isolation: Custom
channelAccess: O:BAG:SYD:(A;;0x2;;;S-1-15-2-1)(A;;0x2;;;S-1-15-3-1024-3153509613-960666767-3724611135-2725662640-12138253-543910227-1950414635-4190290187)(A;;0xf0007;;;SY)(A;;0x7;;;BA)(A;;0x7;;;SO)(A;;0x3;;;IU)(A;;0x3;;;SU)(A;;0x3;;;S-1-5-3)(A;;0x3;;;S-1-5-33)(A;;0x1;;;S-1-5-32-573)
logging:
logFileName: %SystemRoot%\System32\Winevt\Logs\ForwardedEvents.evtx
retention: false
autoBackup: false
maxSize: 19358547968
publishing:
fileMax: 1
> wevtutil gli ForwardedEvents
creationTime: 2021-12-19T18:07:18.7028271Z
lastAccessTime: 2023-07-19T06:30:26.6100697Z
lastWriteTime: 2023-07-19T06:30:26.6100697Z
fileSize: 19358552064
attributes: 32
numberOfLogRecords: 4379072
oldestRecordNumber: 86050006108
The retention and maxSize are the two most interesting details.
Sorry, you are correct, we introduced the stale result recovery in #32176 which went into 8.3 and 7.17. The point remains the same though, it's not that this error wasn't happening previously, just we weren't doing anything about it.
Unfortunately it won’t be easy to test Server 2019 under heavy load and I’m not able to test is quickly. Did Winlogbeat before #32176 also resend all events from ForwardedEvents log file after error? Do you thing that reducing size of log file could help?
Did Winlogbeat before https://github.com/elastic/beats/pull/32176 also resend all events from ForwardedEvents log file after error?
No, it just failed.
Do you thing that reducing size of log file could help?
It's worth a try. The 20GB value you quoted is consistent with the maxSize
value there.
Unfortunately reducing file size to 8GB did not help.
What behaviour would you like to see here? At the moment we are doing what is recommended, but clearly this is not working in your case. We have not see this kind of issue with other users, and the issue is on Windows 2022 which is a version we have had other unusual issues with; I know that you've said that you can't do this, but replication on another version would be very helpful.
Currently we are setting up another server running Windows Server 2019 and we are going to forward logs on it to see if there is any difference. It will take a while because I have to work on it with another department.
On 27.07 8:30 we started forwarding events to new event collector on Windows 2019. From that time only once we got error with winlogbeat on this server:
{"log.level":"warn","@timestamp":"2023-07-31T09:44:49.636+0200","log.logger":"winlogbeat","log.origin":{"file.name":"beater/eventlogger.go","file.line":156},"message":"Read() encountered recoverable error. Reopening handle...","error":{"message":"The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created."},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-07-31T09:44:49.636+0200","log.origin":{"file.name":"eventlog/wineventlog.go","file.line":378},"message":"WinEventLog[ForwardedEvents] EventHandles returned error The query result is stale or invalid and must be recreated. This may be due to the log being cleared or rolling over after the query result was created.","ecs.version":"1.6.0"}
I noticed that aroud thime error occured I have events with same winlog.record_id value in elasticsearch repeated around 120 000 times. Oldest and newest event.ingested values for those events are about Jul 31, 2023 @ 09:33:12.902 and Jul 31, 2023 @ 18:01:19.548.
Those errors are named ERROR_EVT_QUERY_RESULT_STALE
in Windows. I suspect that we need to undo the change relating to https://github.com/elastic/beats/issues/29793 that enabled EVT_SUBSCRIBE_STRICT
for logs that have forwarded: true
.
Hi Team, @efd6 can you confirm if the PR https://github.com/elastic/beats/pull/36309 you provided does solve the issue opened here?
It should have, yes.
@efd6 Could you tell me in which version of Winlogbeat this fix is available?
v8.10.0 and v8.9.2.
I can confirm that upgrade to 8.9.2 resolves this issue. Thank you for your help :)
Winlogbeat on Windows Server 2022 running as Windows Event Collector reports repeatedly below warnings:
Events are delivered to Elasticsearch without message field value but all other field values are ok. This problem occurs for events from two servers only, one running Windows Server 2012 and other one running Windows Server 2019. Events from other servers with same OS are fine.
Winlogbeat version in 8.5.3.