elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[winlogbeat] Add a truncate option for windows event logs #16591

Closed philippkahr closed 3 years ago

philippkahr commented 4 years ago

Describe the enhancement: Hi,

Basically, a lot of Windows Event Log entries are unnecessarily long and take up a lot of storage.

Let's take the following event log entry from the security event log:

A Kerberos service ticket was requested.

Account Information:
    Account Name:       e_pkah@my.super.cool.domain.com
    Account Domain:     my.super.cool.domain.com
    Logon GUID:     {0B0A0E43-6415-AE46-8987-5316DD8FA746}

Service Information:
    Service Name:       myfancypc$
    Service ID:     S-1-5-21-3544562028-792812758-4257637587-779563

Network Information:
    Client Address:     ::ffff:10.0.0.1
    Client Port:        62192

Additional Information:
    Ticket Options:     0x40810000
    Ticket Encryption Type: 0x12
    Failure Code:       0x0
    Transited Services: -

This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.

This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.

Ticket options, encryption types, and failure codes are defined in RFC 4120.

We could easily add an option to the Winlogbeat config:

truncate: true

which would strip away the entire unnecessary body at the end:

This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.

This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.

Ticket options, encryption types, and failure codes are defined in RFC 4120.

That block is, I guess, around 1 KB of storage. I have around 15 million such events per day, which would be ~15 GB of wasted storage per day. I do not know whether best_compression helps here, but from my point of view it is still wasted storage.
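The estimate can be measured rather than guessed. The following Node.js sketch computes the UTF-8 size of the explanatory block quoted above and multiplies it by the 15 million events per day mentioned here. The names `BOILERPLATE` and `EVENTS_PER_DAY` are made up for illustration, and the result is the raw text size only, not what Elasticsearch ends up storing after indexing and compression.

```javascript
// Explanatory text appended to the Kerberos service ticket event,
// copied from the example above (paragraphs joined by a blank line).
var BOILERPLATE = [
  "This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.",
  "This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.",
  "Ticket options, encryption types, and failure codes are defined in RFC 4120."
].join("\n\n");

// Daily volume quoted in this issue (the reporter's own estimate).
var EVENTS_PER_DAY = 15e6;

// Raw UTF-8 size of the block in bytes.
var bytesPerEvent = Buffer.byteLength(BOILERPLATE, "utf8");

// Raw bytes per day, converted to decimal gigabytes.
var gigabytesPerDay = (bytesPerEvent * EVENTS_PER_DAY) / 1e9;

console.log("bytes per event:", bytesPerEvent);
console.log("raw GB per day:", gigabytesPerDay.toFixed(2));
```

Running this against the other explanatory blocks in the same way would give a per-event-ID figure to prioritize by.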

@andrewkroh I am pinging you here since you are the go-to person for Winlogbeat. If you approve this issue, I would like to get started on the implementation for some security event logs. I would add it here: Winlogbeat security js. Or do you have another suggestion?

Describe a specific use case for the enhancement or feature:

elasticmachine commented 4 years ago

Pinging @elastic/siem (Team:SIEM)

andrewkroh commented 4 years ago

It's an interesting idea. Before making any changes to Winlogbeat, I recommend testing the truncation algorithm against all the messages by using the provider metadata. For example, take a look at the messages from wevtutil gp Microsoft-Windows-Security-Auditing /ge /gm:true (I think that's the right command) and run a test against those messages. You could share that output here for people to evaluate.
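One way to run such a test, sketched under assumptions: suppose the message templates have already been extracted from the provider metadata into an array of `{ id, message }` objects. The extraction itself is not shown, and that shape, the sample templates, and the names `cutAtExplanation` and `evaluateTruncation` are all made up for illustration. The helper applies a candidate truncation function to every template and reports how many characters it would remove, so templates that lose nothing or lose too much stand out for review.

```javascript
// Candidate rule for illustration only: cut at the first paragraph that
// starts with "This event is generated".
function cutAtExplanation(message) {
  var index = message.indexOf("\n\nThis event is generated");
  return index === -1 ? message : message.slice(0, index);
}

// Apply `truncate` to every template and report the effect per event ID.
// Sizes are counted in characters.
function evaluateTruncation(templates, truncate) {
  return templates.map(function (t) {
    var after = truncate(t.message);
    return {
      id: t.id,
      before: t.message.length,
      after: after.length,
      removed: t.message.length - after.length
    };
  });
}

// Two shortened, hypothetical templates; the %N placeholders are illustrative.
var sampleTemplates = [
  { id: 4634, message: "An account was logged off.\n\nLogon Type:\t%5\n\nThis event is generated when a logon session is destroyed." },
  { id: 4776, message: "The computer attempted to validate the credentials for an account.\n\nError Code:\t%4" }
];

var report = evaluateTruncation(sampleTemplates, cutAtExplanation);
report.forEach(function (row) {
  console.log(row.id, "before:", row.before, "after:", row.after, "removed:", row.removed);
});
```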

Here are some alternatives I can think of that may help with the wasted space:

  1. Populate the message (or some keyword field) with the original parameterized message string. Then it would be the same in all events for a given event ID. This would likely compress really well since there is a relatively small number of unique event IDs. For example, the message could be either the original "Service %1 has stopped." or, with %1 replaced by the name of the associated event_data parameter, "Service {{ServiceName}} has stopped."

  2. Add an option to omit the message field completely. Getting the rendered message from the Windows API is a relatively slow process anyway. If the events are well categorized with event.type/event.category by some module and have good event.action values, the need for the message decreases.
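The first alternative can be sketched as follows. The hypothetical helper `parameterize` takes a raw template and the ordered list of event_data parameter names and rewrites each `%N` insert as `{{Name}}`. It is an illustration only, not Winlogbeat code: it leaves `%%N` references untouched and ignores any other message-table syntax.

```javascript
// Replace %1, %2, ... with {{ParamName}} using the ordered parameter names.
// %%N references (such as %%1833) and inserts without a known name are
// left unchanged.
function parameterize(template, paramNames) {
  return template.replace(/(%%\d+)|%(\d+)/g, function (match, tableRef, index) {
    if (tableRef) {
      return match; // %%N is not a positional insert
    }
    var name = paramNames[Number(index) - 1]; // %1 maps to paramNames[0]
    return name === undefined ? match : "{{" + name + "}}";
  });
}

console.log(parameterize("Service %1 has stopped.", ["ServiceName"]));
// Service {{ServiceName}} has stopped.
```

Because the output depends only on the event ID's template, every event with that ID would carry an identical string.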

philippkahr commented 4 years ago

Hi @andrewkroh thanks for answering!

Well, hmm, I would not go as far as calling it an algorithm. I would just use a simple

// Strip the explanatory block from the message. Line breaks are written
// as \n here; real event messages may use \r\n instead.
message = message.replace(
    'This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.\n\n' +
    'This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.\n\n' +
    'Ticket options, encryption types, and failure codes are defined in RFC 4120.',
    '')

I would define that for the most important and most frequent event IDs, such as 5145, 4776, 4624, ...
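A slightly more tolerant variant of the replace above, as a sketch: instead of matching the whole explanatory block character for character, cut the message at the first sentence of that block. The marker sentences are copied from event messages quoted in this issue; the names `EXPLANATION_MARKERS` and `truncateMessage` are hypothetical, and wiring this into the Winlogbeat security module is not shown.

```javascript
// First sentence of the explanatory block per event ID, taken from the
// event messages quoted in this issue. Event IDs without an entry are
// left untouched.
var EXPLANATION_MARKERS = {
  4624: "This event is generated when a logon session is created.",
  4634: "This event is generated when a logon session is destroyed.",
  4648: "This event is generated when a process attempts to log on an account",
  4688: "Token Elevation Type indicates the type of token",
  4769: "This event is generated every time access is requested to a resource"
};

// Return the message without the explanatory block, or unchanged if the
// event ID has no marker or the marker does not occur in the message.
function truncateMessage(eventId, message) {
  var marker = EXPLANATION_MARKERS[eventId];
  if (!marker) {
    return message;
  }
  var index = message.indexOf(marker);
  if (index === -1) {
    return message;
  }
  // Drop the block and any whitespace that preceded it.
  return message.slice(0, index).replace(/\s+$/, "");
}

var sample = "An account was logged off.\n\nLogon Type:\t\t\t3\n\n" +
  "This event is generated when a logon session is destroyed. " +
  "It may be positively correlated with a logon event using the Logon ID value.";
console.log(truncateMessage(4634, sample));
```

Cutting at a marker is less sensitive to differences in line endings or spacing than an exact-match replace, but it also removes anything that follows the marker, so each marker should be checked against the full message template first.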

The screenshot of the last seven days from my cluster (to get the priority): [image: kiban12]

For the sake of this issue, I limited myself to the event IDs that have at least a million entries.
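A ranking like the one in the screenshot could be produced with an Elasticsearch terms aggregation, sketched here as a request body built in JavaScript. The field names `winlog.event_id` and `@timestamp`, the bucket limit of 50, and the helper name `buildEventIdRankingQuery` are assumptions, not something stated in this issue. `min_doc_count` keeps only event IDs with at least the given number of documents in the last seven days.

```javascript
// Build a search request body that counts documents per event ID over
// the last seven days and drops event IDs below `minCount`.
function buildEventIdRankingQuery(minCount) {
  return {
    size: 0, // no hits needed, only the aggregation
    query: {
      range: { "@timestamp": { gte: "now-7d" } }
    },
    aggs: {
      event_ids: {
        terms: {
          field: "winlog.event_id", // assumed field name
          size: 50,                 // assumed upper bound on buckets
          min_doc_count: minCount
        }
      }
    }
  };
}

var rankingQuery = buildEventIdRankingQuery(1000000);
console.log(JSON.stringify(rankingQuery, null, 2));
```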

EventID 5145

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:40
Event ID:      5145
Task Category: Detailed File Share
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      mydomaincontroller.myfancy.domain.com
Description:
A network share object was checked to see whether client can be granted desired access.

Subject:
    Security ID:        myfancy.domain.com\mynotebook$
    Account Name:       mynotebook$
    Account Domain:     myfancy.domain.com
    Logon ID:       0x156AD4F84

Network Information:    
    Object Type:        File
    Source Address:     10.23.33.13
    Source Port:        60727

Share Information:
    Share Name:     \\*\SYSVOL
    Share Path:     \??\C:\Windows\SYSVOL\sysvol
    Relative Target Name:   myfancy.domain.com\Policies\{DCA2FC0B-7677-4B25-9751-56B30B6473FD}\Machine\Microsoft\Windows NT\SecEdit

Access Request Information:
    Access Mask:        0x100081
    Accesses:       SYNCHRONIZE
                ReadData (or ListDirectory)
                ReadAttributes

Access Check Results:
    SYNCHRONIZE:    Granted by  D:(A;;0x1200a9;;;WD)
                ReadData (or ListDirectory):    Granted by  D:(A;;0x1200a9;;;WD)
                ReadAttributes: Granted by  D:(A;;0x1200a9;;;WD)

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>5145</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>12811</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:40.832789200Z" />
    <EventRecordID>346299910</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="524" />
    <Channel>Security</Channel>
    <Computer>mydomaincontroller.domain.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-5-21-3544562028-792812758-4257637587-65787</Data>
    <Data Name="SubjectUserName">mynotebook$</Data>
    <Data Name="SubjectDomainName">myfancy.domain.com</Data>
    <Data Name="SubjectLogonId">0x156ad4f84</Data>
    <Data Name="ObjectType">File</Data>
    <Data Name="IpAddress">10.23.33.13</Data>
    <Data Name="IpPort">60727</Data>
    <Data Name="ShareName">\\*\SYSVOL</Data>
    <Data Name="ShareLocalPath">\??\C:\Windows\SYSVOL\sysvol</Data>
    <Data Name="RelativeTargetName">myfancy.domain.com\Policies\{DCA2FC0B-97BB-4B25-9751-56B30B6473FD}\Machine\Microsoft\Windows NT\SecEdit</Data>
    <Data Name="AccessMask">0x100081</Data>
    <Data Name="AccessList">%%1541
                %%4416
                %%4423
                </Data>
    <Data Name="AccessReason">%%1541:   %%1801  D:(A;;0x1200a9;;;WD)
                %%4416: %%1801  D:(A;;0x1200a9;;;WD)
                %%4423: %%1801  D:(A;;0x1200a9;;;WD)
                </Data>
  </EventData>
</Event>

So if I take that as an example, I think there is not much room for improvement.

EventID 4776

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:40
Event ID:      4776
Task Category: Credential Validation
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      mydomaincontroller.myfancy.domain.com
Description:
The computer attempted to validate the credentials for an account.

Authentication Package: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
Logon Account:  mynotebook$
Source Workstation: mynotebook
Error Code: 0x0

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4776</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>14336</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:40.753577800Z" />
    <EventRecordID>346299902</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="1368" />
    <Channel>Security</Channel>
    <Computer>mydomaincontroller.myfancy.domain.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="PackageName">MICROSOFT_AUTHENTICATION_PACKAGE_V1_0</Data>
    <Data Name="TargetUserName">mynotebook$</Data>
    <Data Name="Workstation">mynotebook</Data>
    <Data Name="Status">0x0</Data>
  </EventData>
</Event>

Once again, there is nothing to really improve. I guess best_compression is already taking care of it, since TargetUserName, Workstation, and PackageName often have the same values. I am not too familiar with the compression algorithms and how they work.
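The intuition that repeated values compress well can be illustrated with a small experiment. This Node.js sketch runs DEFLATE from the built-in `zlib` module over many copies of one explanatory sentence quoted in this issue. It only shows how well identical text compresses in general; it says nothing exact about how Elasticsearch lays out or compresses stored fields with best_compression. The count of 1000 copies is arbitrary.

```javascript
var zlib = require("zlib");

// One explanatory sentence from an event message quoted in this issue,
// repeated as if it were stored once per event.
var sentence = "This event is generated when a logon session is destroyed. " +
  "It may be positively correlated with a logon event using the Logon ID value. " +
  "Logon IDs are only unique between reboots on the same computer.";
var COPIES = 1000; // arbitrary number of events for the experiment

var raw = Buffer.from((sentence + "\n").repeat(COPIES), "utf8");
var compressed = zlib.deflateSync(raw);

console.log("raw bytes:", raw.length);
console.log("deflated bytes:", compressed.length);
console.log("ratio:", (raw.length / compressed.length).toFixed(1));
```

In a real index the repeated text is interleaved with fields that differ per event, so the achievable ratio there will be lower than in this best-case experiment.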

EventID 4624

This is one of the events that I would like to truncate

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:40
Event ID:      4624
Task Category: Logon
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:       mydomaincontroller.myfancy.domain.com
Description:
An account was successfully logged on.

Subject:
    Security ID:        NULL SID
    Account Name:       -
    Account Domain:     -
    Logon ID:       0x0

Logon Type:         3

Impersonation Level:        Impersonation

New Logon:
    Security ID:        myfancy.domain.com\mynotebook$
    Account Name:       mynotebook$
    Account Domain:     myfancy.domain.com
    Logon ID:       0x156B224BA
    Logon GUID:     {6b447d0f-0f86-bb73-9374-52b36518e6f7}

Process Information:
    Process ID:     0x0
    Process Name:       -

Network Information:
    Workstation Name:   -
    Source Network Address: 10.10.10.10
    Source Port:        53069

Detailed Authentication Information:
    Logon Process:      Kerberos
    Authentication Package: Kerberos
    Transited Services: -
    Package Name (NTLM only):   -
    Key Length:     0

This event is generated when a logon session is created. It is generated on the computer that was accessed.

The subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).

The New Logon fields indicate the account for whom the new logon was created, i.e. the account that was logged on.

The network fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.

The impersonation level field indicates the extent to which a process in the logon session can impersonate.

The authentication information fields provide detailed information about this specific logon request.
    - Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.
    - Transited services indicate which intermediate services have participated in this logon request.
    - Package name indicates which sub-protocol was used among the NTLM protocols.
    - Key length indicates the length of the generated session key. This will be 0 if no session key was requested.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4624</EventID>
    <Version>1</Version>
    <Level>0</Level>
    <Task>12544</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:40.769205100Z" />
    <EventRecordID>346299904</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="540" />
    <Channel>Security</Channel>
    <Computer>mydomaincontroller.myfancy.domain.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-0-0</Data>
    <Data Name="SubjectUserName">-</Data>
    <Data Name="SubjectDomainName">-</Data>
    <Data Name="SubjectLogonId">0x0</Data>
    <Data Name="TargetUserSid">S-1-5-21-3544562028-792812758-4257637587-67474</Data>
    <Data Name="TargetUserName">mynotebook$</Data>
    <Data Name="TargetDomainName">myfancy.domain.com</Data>
    <Data Name="TargetLogonId">0x156b224ba</Data>
    <Data Name="LogonType">3</Data>
    <Data Name="LogonProcessName">Kerberos</Data>
    <Data Name="AuthenticationPackageName">Kerberos</Data>
    <Data Name="WorkstationName">-</Data>
    <Data Name="LogonGuid">{6B447D0F-0F86-A37A-0B61-52B36518E6F7}</Data>
    <Data Name="TransmittedServices">-</Data>
    <Data Name="LmPackageName">-</Data>
    <Data Name="KeyLength">0</Data>
    <Data Name="ProcessId">0x0</Data>
    <Data Name="ProcessName">-</Data>
    <Data Name="IpAddress">10.10.10.10</Data>
    <Data Name="IpPort">53069</Data>
    <Data Name="ImpersonationLevel">%%1833</Data>
  </EventData>
</Event>

I would like to strip away this entire block:

This event is generated when a logon session is created. It is generated on the computer that was accessed.

The subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).

The New Logon fields indicate the account for whom the new logon was created, i.e. the account that was logged on.

The network fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.

The impersonation level field indicates the extent to which a process in the logon session can impersonate.

The authentication information fields provide detailed information about this specific logon request.
    - Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.
    - Transited services indicate which intermediate services have participated in this logon request.
    - Package name indicates which sub-protocol was used among the NTLM protocols.
    - Key length indicates the length of the generated session key. This will be 0 if no session key was requested.

I guess this text is around 2 KB, and that is just wasted space.

EventID 4634

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:40
Event ID:      4634
Task Category: Logoff
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:       domaincontroller
Description:
An account was logged off.

Subject:
    Security ID:        domain\mynotebook$
    Account Name:       mynotebook$
    Account Domain:     domain
    Logon ID:       0x156B20C72

Logon Type:         3

This event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4634</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>12545</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:40.073399000Z" />
    <EventRecordID>346299844</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="3436" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="TargetUserSid">S-1-5-21-3544562028-792812758-4257637587-66736</Data>
    <Data Name="TargetUserName">mynotebook$</Data>
    <Data Name="TargetDomainName">domain</Data>
    <Data Name="TargetLogonId">0x156b20c72</Data>
    <Data Name="LogonType">3</Data>
  </EventData>
</Event>

I think there is something we can get rid of here:

This event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer.

EventID 4769

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:38
Event ID:      4769
Task Category: Kerberos Service Ticket Operations
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      domaincontroller
Description:
A Kerberos service ticket was requested.

Account Information:
    Account Name:       philipp.kahr@icloud.com
    Account Domain:     icloud.com
    Logon GUID:     {3d5b795b-5993-aaad-ddd4-d5d33fd306f3}

Service Information:
    Service Name:       github$
    Service ID:     icloud.com\github$

Network Information:
    Client Address:     ::ffff:10.10.33.33
    Client Port:        61331

Additional Information:
    Ticket Options:     0x40810000
    Ticket Encryption Type: 0x12
    Failure Code:       0x0
    Transited Services: -

This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.

This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.

Ticket options, encryption types, and failure codes are defined in RFC 4120.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4769</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>14337</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:38.262902600Z" />
    <EventRecordID>346299636</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="1368" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="TargetUserName">philipp.kahr@icloud.com</Data>
    <Data Name="TargetDomainName">icloud.com</Data>
    <Data Name="ServiceName">github$</Data>
    <Data Name="ServiceSid">S-1-5-21-3544562028-792812758-4257637587-56565</Data>
    <Data Name="TicketOptions">0x40810000</Data>
    <Data Name="TicketEncryptionType">0x12</Data>
    <Data Name="IpAddress">::ffff:10.10.33.33</Data>
    <Data Name="IpPort">61331</Data>
    <Data Name="Status">0x0</Data>
    <Data Name="LogonGuid">{3D5B795B-5993-aaad-ddd4-D5D33FD306F3}</Data>
    <Data Name="TransmittedServices">-</Data>
  </EventData>
</Event>

This could be improved by stripping away:

This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.

This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.

Ticket options, encryption types, and failure codes are defined in RFC 4120.

EventID 4688

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:35
Event ID:      4688
Task Category: Process Creation
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      domaincontroller
Description:
A new process has been created.

Creator Subject:
    Security ID:        SYSTEM
    Account Name:       domaincontroller$
    Account Domain:     domain
    Logon ID:       0x3E7

Target Subject:
    Security ID:        NULL SID
    Account Name:       -
    Account Domain:     -
    Logon ID:       0x0

Process Information:
    New Process ID:     0x1b84
    New Process Name:   C:\Windows\System32\conhost.exe
    Token Elevation Type:   TokenElevationTypeDefault (1)
    Creator Process ID: 0x1d18
    Process Command Line:   

Token Elevation Type indicates the type of token that was assigned to the new process in accordance with User Account Control policy.

Type 1 is a full token with no privileges removed or groups disabled.  A full token is only used if User Account Control is disabled or if the user is the built-in Administrator account or a service account.

Type 2 is an elevated token with no privileges removed or groups disabled.  An elevated token is used when User Account Control is enabled and the user chooses to start the program using Run as administrator.  An elevated token is also used when an application is configured to always require administrative privilege or to always require maximum privilege, and the user is a member of the Administrators group.

Type 3 is a limited token with administrative privileges removed and administrative groups disabled.  The limited token is used when User Account Control is enabled, the application does not require administrative privilege, and the user does not choose to start the program using Run as administrator.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4688</EventID>
    <Version>2</Version>
    <Level>0</Level>
    <Task>13312</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:35.174983000Z" />
    <EventRecordID>346299192</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="5112" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-5-18</Data>
    <Data Name="SubjectUserName">domaincontroller$</Data>
    <Data Name="SubjectDomainName">domain</Data>
    <Data Name="SubjectLogonId">0x3e7</Data>
    <Data Name="NewProcessId">0x1b84</Data>
    <Data Name="NewProcessName">C:\Windows\System32\conhost.exe</Data>
    <Data Name="TokenElevationType">%%1936</Data>
    <Data Name="ProcessId">0x1d18</Data>
    <Data Name="CommandLine">
    </Data>
    <Data Name="TargetUserSid">S-1-0-0</Data>
    <Data Name="TargetUserName">-</Data>
    <Data Name="TargetDomainName">-</Data>
    <Data Name="TargetLogonId">0x0</Data>
  </EventData>
</Event>

Again, this could be improved by stripping away:

Token Elevation Type indicates the type of token that was assigned to the new process in accordance with User Account Control policy.

Type 1 is a full token with no privileges removed or groups disabled.  A full token is only used if User Account Control is disabled or if the user is the built-in Administrator account or a service account.

Type 2 is an elevated token with no privileges removed or groups disabled.  An elevated token is used when User Account Control is enabled and the user chooses to start the program using Run as administrator.  An elevated token is also used when an application is configured to always require administrative privilege or to always require maximum privilege, and the user is a member of the Administrators group.

Type 3 is a limited token with administrative privileges removed and administrative groups disabled.  The limited token is used when User Account Control is enabled, the application does not require administrative privilege, and the user does not choose to start the program using Run as administrator.

EventID 4689

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:30
Event ID:      4689
Task Category: Process Termination
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      domaincontroller
Description:
A process has exited.

Subject:
    Security ID:        SYSTEM
    Account Name:       domaincontroller$
    Account Domain:     domain
    Logon ID:       0x3E7

Process Information:
    Process ID: 0x148c
    Process Name:   C:\Windows\System32\conhost.exe
    Exit Status:    0x0

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4689</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>13313</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:30.168308700Z" />
    <EventRecordID>346298474</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="7632" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-5-18</Data>
    <Data Name="SubjectUserName">domaincontroller$</Data>
    <Data Name="SubjectDomainName">domain</Data>
    <Data Name="SubjectLogonId">0x3e7</Data>
    <Data Name="Status">0x0</Data>
    <Data Name="ProcessId">0x148c</Data>
    <Data Name="ProcessName">C:\Windows\System32\conhost.exe</Data>
  </EventData>
</Event>

EventID 5140

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:27
Event ID:      5140
Task Category: File Share
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      domaincontroller
Description:
A network share object was accessed.

Subject:
    Security ID:        domain\mynotebook$
    Account Name:       mynotebook$
    Account Domain:     domain
    Logon ID:       0x156AF3A53

Network Information:    
    Object Type:        File
    Source Address:     10.10.30.33
    Source Port:        52485

Share Information:
    Share Name:     \\*\IPC$
    Share Path:     

Access Request Information:
    Access Mask:        0x1
    Accesses:       ReadData (or ListDirectory)

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>5140</EventID>
    <Version>1</Version>
    <Level>0</Level>
    <Task>12808</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:27.223925800Z" />
    <EventRecordID>346297977</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="7444" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-5-21-3544562028-792812758-4257637587-33334</Data>
    <Data Name="SubjectUserName">mynotebook$</Data>
    <Data Name="SubjectDomainName">domain</Data>
    <Data Name="SubjectLogonId">0x156af3a53</Data>
    <Data Name="ObjectType">File</Data>
    <Data Name="IpAddress">10.10.30.33</Data>
    <Data Name="IpPort">52485</Data>
    <Data Name="ShareName">\\*\IPC$</Data>
    <Data Name="ShareLocalPath">
    </Data>
    <Data Name="AccessMask">0x1</Data>
    <Data Name="AccessList">%%4416
                </Data>
  </EventData>
</Event>

EventID 5136

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:00:03
Event ID:      5136
Task Category: Directory Service Changes
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      domaincontroller
Description:
A directory service object was modified.

Subject:
    Security ID:        domain\philipp
    Account Name:       philipp
    Account Domain:     domain
    Logon ID:       0x156ACA113

Directory Service:
    Name:   domain.com
    Type:   Active Directory Domain Services

Object:
    DN: CN=0000000000000006,CN=W0222223A22222222222222222222222222222222222222222,CN=enatelSSOStorageV3,CN=philipp,OU=users,DC=domain,DC=com
    GUID:   CN=0000000000000006,CN=W0222223A22222222222222222222222222222222222222222,CN=enatelSSOStorageV3,CN=philipp,OU=users,DC=domain,DC=com
    Class:  enatelSSOAccountParameter

Attribute:
    LDAP Display Name:  enatelSSOAccountParameterValue
    Syntax (OID):   2.5.5.10
    Value:  <Binary>

Operation:
    Type:   Value Added
    Correlation ID: {28cda6ac-bd3c-44db-aadb-58dc2328871a}
    Application Correlation ID: -

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>5136</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>14081</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:00:03.479158200Z" />
    <EventRecordID>346295533</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="3436" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="OpCorrelationID">{28CDA6AC-BD3C-47C3-85B1-58DC2328871A}</Data>
    <Data Name="AppCorrelationID">-</Data>
    <Data Name="SubjectUserSid">S-1-5-21-3544562028-792812758-4257637587-55444</Data>
    <Data Name="SubjectUserName">philipp</Data>
    <Data Name="SubjectDomainName">domain</Data>
    <Data Name="SubjectLogonId">0x156aca113</Data>
    <Data Name="DSName">domain.com</Data>
    <Data Name="DSType">%%14676</Data>
    <Data Name="ObjectDN">CN=0000000000000006,CN=W0222223A22222222222222222222222222222222222222222,CN=enatelSSOStorageV3,CN=philipp,OU=users,DC=domain,DC=com</Data>
    <Data Name="ObjectGUID">{7245A6C0-BBBD-AA45-877C-A7A722FA355E}</Data>
    <Data Name="ObjectClass">enatelSSOAccountParameter</Data>
    <Data Name="AttributeLDAPDisplayName">enatelSSOAccountParameterValue</Data>
    <Data Name="AttributeSyntaxOID">2.5.5.10</Data>
    <Data Name="AttributeValue">%%14672</Data>
    <Data Name="OperationType">%%14674</Data>
  </EventData>
</Event>

EventID 4648

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 21:59:58
Event ID:      4648
Task Category: Logon
Level:         Information
Keywords:      Audit Success
User:          N/A
Computer:      domaincontroller
Description:
A logon was attempted using explicit credentials.

Subject:
    Security ID:        SYSTEM
    Account Name:       domaincontroller$
    Account Domain:     domain
    Logon ID:       0x3E7
    Logon GUID:     {00000000-0000-0000-0000-000000000000}

Account Whose Credentials Were Used:
    Account Name:       my-service-user
    Account Domain:     domain
    Logon GUID:     {00000000-0000-0000-0000-000000000000}

Target Server:
    Target Server Name: localhost
    Additional Information: localhost

Process Information:
    Process ID:     0x204
    Process Name:       C:\Windows\System32\lsass.exe

Network Information:
    Network Address:    10.10.30.33
    Port:           44906

This event is generated when a process attempts to log on an account by explicitly specifying that account’s credentials.  This most commonly occurs in batch-type configurations such as scheduled tasks, or when using the RUNAS command.
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4648</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>12544</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T20:59:58.638219900Z" />
    <EventRecordID>346294771</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="7988" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-5-18</Data>
    <Data Name="SubjectUserName">domaincontroller$</Data>
    <Data Name="SubjectDomainName">domain</Data>
    <Data Name="SubjectLogonId">0x3e7</Data>
    <Data Name="LogonGuid">{00000000-0000-0000-0000-000000000000}</Data>
    <Data Name="TargetUserName">my-service-user</Data>
    <Data Name="TargetDomainName">domain</Data>
    <Data Name="TargetLogonGuid">{00000000-0000-0000-0000-000000000000}</Data>
    <Data Name="TargetServerName">localhost</Data>
    <Data Name="TargetInfo">localhost</Data>
    <Data Name="ProcessId">0x204</Data>
    <Data Name="ProcessName">C:\Windows\System32\lsass.exe</Data>
    <Data Name="IpAddress">10.10.30.33</Data>
    <Data Name="IpPort">44906</Data>
  </EventData>
</Event>

There is something to remove

This event is generated when a process attempts to log on an account by explicitly specifying that account’s credentials.  This most commonly occurs in batch-type configurations such as scheduled tasks, or when using the RUNAS command.

EventID 4771

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          26.02.2020 22:50:02
Event ID:      4771
Task Category: Kerberos Authentication Service
Level:         Information
Keywords:      Audit Failure
User:          N/A
Computer:      domaincontroller
Description:
Kerberos pre-authentication failed.

Account Information:
    Security ID:        domain\philipp
    Account Name:       philipp

Service Information:
    Service Name:       krbtgt/domain.AMSIAG.COM

Network Information:
    Client Address:     ::ffff:10.10.30.33
    Client Port:        64671

Additional Information:
    Ticket Options:     0x40810010
    Failure Code:       0x18
    Pre-Authentication Type:    2

Certificate Information:
    Certificate Issuer Name:        
    Certificate Serial Number:  
    Certificate Thumbprint:     

Certificate information is only provided if a certificate was used for pre-authentication.

Pre-authentication types, ticket options and failure codes are defined in RFC 4120.

If the ticket was malformed or damaged during transit and could not be decrypted, then many fields in this event might not be present.
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4771</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>14339</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8010000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-26T21:50:02.025033900Z" />
    <EventRecordID>346501852</EventRecordID>
    <Correlation />
    <Execution ProcessID="516" ThreadID="1368" />
    <Channel>Security</Channel>
    <Computer>domaincontroller</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="TargetUserName">philipp</Data>
    <Data Name="TargetSid">S-1-5-21-3544562028-792812758-4257637587-44553</Data>
    <Data Name="ServiceName">krbtgt/domain.AMSIAG.COM</Data>
    <Data Name="TicketOptions">0x40810010</Data>
    <Data Name="Status">0x18</Data>
    <Data Name="PreAuthType">2</Data>
    <Data Name="IpAddress">::ffff:10.10.30.33</Data>
    <Data Name="IpPort">64671</Data>
    <Data Name="CertIssuerName">
    </Data>
    <Data Name="CertSerialNumber">
    </Data>
    <Data Name="CertThumbprint">
    </Data>
  </EventData>
</Event>

We can remove something

Certificate information is only provided if a certificate was used for pre-authentication.

Pre-authentication types, ticket options and failure codes are defined in RFC 4120.

If the ticket was malformed or damaged during transit and could not be decrypted, then many fields in this event might not be present.

Your ideas

Add an option to omit the message field completely. Getting the rendered message from the Windows API is a relatively slow process anyway. If the events are well categorized with event.type/event.category by some module and have good event.action values, the need for the message decreases.

I do not feel that winlogbeat is slow at collecting the Windows event logs I produce. As you can see in the screenshot, I usually collect around 2 million Windows event log entries within 30 minutes. To be honest, I do not like that idea. My coworkers and I like the rendered message; I even increased the preview length in Kibana so that the message field displays properly. Maybe that is just a dumb way to work, but it is what we are all used to. Maybe you have other experiences or can dig into that? Maybe we could use a scripted field in Kibana that renders the message from the individual fields into that nice message block? I guess that could be worth a try.

Screenshot 2020-02-26 at 22 12 51

Populate the message (or some keyword field) with the original parameterized message string. Then it would be the same in all events for a given event ID. This would likely compress really well, since there is a relatively small number of unique event IDs. For example, have the message be either the original `Service %1 has stopped.` or replace %1 with the name of the associated event_data parameter to get `Service {{ServiceName}} has stopped.`

Makes sense to me, but that is not something I could help with; Go is not my strong suit ;).

andrewkroh commented 4 years ago

Well, hmm, I would not go as far as saying it is an algorithm...

Oh, ok, I was thinking it would be something generic, like grabbing the first sentence with some special handling for edge cases.

Makes sense to me, but that is not something I could help with; Go is not my strong suit ;).

If an option in the event log reader to produce a static parameterized message would work, I can look into adding that. It will be interesting to see what impact it has on the overall storage cost.
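
The parameterized-message idea discussed above can be sketched in a few lines of JavaScript. This is only an illustration and not how winlogbeat implements anything: `toTemplate` and `render` are hypothetical helpers, the service name `Spooler` is a made-up value, and the sketch assumes the event_data parameter names are known in the same order as the `%1`, `%2`, ... placeholders.

```javascript
// Hypothetical helpers; not part of winlogbeat.

// Replace %1, %2, ... with {{Name}} using the ordered event_data parameter names.
function toTemplate(raw, paramNames) {
  return raw.replace(/%(\d+)/g, function (match, n) {
    var name = paramNames[Number(n) - 1];
    // Leave placeholders without a known name untouched.
    return name === undefined ? match : "{{" + name + "}}";
  });
}

// Render a template back into a full message from one event's event_data values.
function render(template, eventData) {
  return template.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(eventData, name)
      ? String(eventData[name])
      : match;
  });
}

var template = toTemplate("Service %1 has stopped.", ["ServiceName"]);
console.log(template); // Service {{ServiceName}} has stopped.
console.log(render(template, { ServiceName: "Spooler" })); // Service Spooler has stopped.
```

Because the template is identical for every event with the same ID, only the small event_data values differ between documents, which is why this should compress well. The `render` half is roughly what a Kibana-side rendering step would have to do.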

randomuserid commented 4 years ago

Analysts and rule authors do use the message field, it is useful when sifting events. The descriptions in Windows event logs are verbose but I would not characterize them as useless; they are necessary to an analyst who does not have this kind of detail committed to memory.

I think we could maybe tokenize them, or substitute a pointer to a text blob, to reduce the storage costs, provided this does not impact the analyst UX. Truncating raw events, however, would break many forensics workflows and requirements.

Many other event types have massive levels of repeat strings or data blobs and storage - disk space at least - is not always the largest cost line item in a SOC. Do we have user demand for truncation?

philippkahr commented 4 years ago

Analysts and rule authors do use the message field, it is useful when sifting events. The descriptions in Windows event logs are verbose but I would not characterize them as useless; they are necessary to an analyst who does not have this kind of detail committed to memory.

I would not go so far as to make truncate_message: true the default. I always thought of it as opt-in.

Many other event types have massive levels of repeat strings or data blobs and storage - disk space at least - is not always the largest cost line item in a SOC. Do we have user demand for truncation?

For me, the raw, uncompressed data for one event_id alone is around 15 GB per day. In my use cases, I often have to preserve the logs for an entire year, which adds up to about 5.5 TB for that one event ID.
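
As a quick check of the retention figure, here is the arithmetic, assuming a constant 15 GB per day, a 365-day year, and decimal units (1 TB = 1000 GB):

```javascript
var gbPerDay = 15;   // raw size for the one event ID, per day
var daysKept = 365;  // assumed retention period

var totalGb = gbPerDay * daysKept; // 5475
var totalTb = totalGb / 1000;      // 5.475

console.log(totalGb + " GB, about " + totalTb.toFixed(1) + " TB"); // 5475 GB, about 5.5 TB
```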

I guess, I will do some sort of benchmarking.

As it is now:

  1. I will create an *.evtx file
  2. send that into elasticsearch
  3. look at the index size

Without message field:

  1. Create an ingest_pipeline where I drop the entire message field
  2. send the same *.evtx file with winlogbeat, using the ingest_pipeline
  3. look at the index size again.

I will have to run both variants twice, once with best_compression enabled and once without it. I will let you know as soon as I am finished.
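
The "without message field" run could also be approximated on the Beats side with a `script` processor instead of an ingest pipeline. This is a swapped-in alternative, not the setup described above. `event.Delete` is part of the script processor's event API; `makeEvent` is a hypothetical stand-in so the function can be tried outside Beats, and the sample field values are made up.

```javascript
// Body of the processor. In a Beats script processor the entry point must be
// named process(event); it is named dropMessage here only so it can be run
// standalone without clashing with other globals.
function dropMessage(event) {
  event.Delete("message");
}

// Hypothetical stand-in for the Beats event object (flat keys only).
function makeEvent(fields) {
  return {
    Get: function (key) {
      return Object.prototype.hasOwnProperty.call(fields, key) ? fields[key] : null;
    },
    Put: function (key, value) { fields[key] = value; },
    Delete: function (key) { return delete fields[key]; }
  };
}

var fields = {
  message: "A logon was attempted using explicit credentials.",
  host: "domaincontroller"
};
dropMessage(makeEvent(fields));
console.log(Object.keys(fields));
```

Dropping the field before it leaves the host also saves network traffic, whereas the ingest pipeline keeps the Beats configuration unchanged; which trade-off is better depends on the setup.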

randomuserid commented 4 years ago

Well, yes, in a dev / test environment, I think we are free to truncate, deduplicate or do what we like with the data. We could document the risk that forensics workflows could be affected by truncation in production clusters. Maybe also add some kind of alert in case this was enabled by accident.

  • What about removing the message block entirely and using Kibana scripted fields to render that message block on the fly?

That would seem like a good solution though I am wondering if security workflows would consider this forensically sound. If this were not widely used outside of dev environments, because of objections by forensics workflows, maybe it would be simpler to truncate in dev than to do this, depending on how much work this feature would be?

philippkahr commented 4 years ago

@randomuserid I think we have two opposing views on this topic. I am a sysadmin / DevOps guy who is more concerned with troubleshooting and simply having all the data I need to debug an issue, whereas you are into deep forensic analysis, where data has to be unmanipulated to be verifiable.

I am not too deep into forensics, so any comment is really really welcome!

I did some sort of benchmarking yesterday.

This is the docker-compose I used to start a single node elasticsearch on my MacBook. Just for the record:

MacBook Pro (15-inch, 2017)
Processor 2.9 GHz Intel Core i7
Memory 16 GB 2133 MHz LPDDR3
Disk 512GB

I created an evtx file from one of our domain controllers, which contained exactly 1,159,357 events and was 1,024,528,384 bytes (about 1.02 GB) in size. The events break down as follows (I did not list event IDs with fewer than 1,000 events):

| EventID | Count | Could be improved |
|---|---|---|
| 5145 | 699,712 | |
| 4624 | 164,192 | Contains a 2 KB raw message block that can be stripped |
| 4634 | 164,149 | Contains a single line that can be stripped |
| 4776 | 55,599 | |
| 4769 | 38,819 | Contains a 1 KB raw message block that can be stripped |
| 5136 | 14,697 | |
| 5140 | 14,697 | |
| 4688 | 5,069 | Contains a 2 KB raw message block that can be stripped |
| 4689 | 5,068 | |
| 4648 | 1,008 | Contains a single line that can be stripped |

The KB figures refer only to the lines appended at the end of the message block, as described above. Here is a screenshot of the distribution from the winlogbeat dashboard:

Screenshot 2020-02-28 at 22 25 05

Here is my winlogbeat 7.6.0 config.

When winlogbeat finished, I waited a few minutes and checked the Monitoring tab in Kibana to see whether any more events were being published to the winlogbeat- index. I then ran `curl -X GET "http://localhost:9200/winlogbeat-/_stats?pretty"` and stored the output in .json files.

After collecting the index stats, I ran docker-compose up -d --force-recreate and deleted the data folder from the winlogbeat folder.

  • Best_compression toggled off and message field exists
  • Best_compression toggled on and message field exists
  • Best_compression toggled on and message field is dropped
  • Best_compression toggled off and message field is dropped

My takeaway is that best_compression is really worth the CPU cost: it reduced storage by about 45% (537252835 bytes (~537 MB) vs. 985241036 bytes (~985 MB)).

Dropping the message field cuts the index size roughly in half. With LZ4 compression the reduction is about 45% (985241036 (~985 MB) vs. 542212467 (~542 MB)), and with best_compression it is about 56% (537252835 (~537 MB) vs. 236969349 (~236 MB)).

| Description | Size in bytes | Size in MB |
|---|---|---|
| Message_field: true, LZ4 compression | 985241036 | ~985 MB |
| Message_field: true, best_compression | 537252835 | ~537 MB |
| Message_field: false, LZ4 compression | 542212467 | ~542 MB |
| Message_field: false, best_compression | 236969349 | ~236 MB |
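
The percentages can be recomputed from the byte counts in the table. The sketch below only does that arithmetic; the variable and function names are made up for illustration.

```javascript
// Index sizes in bytes, copied from the table above.
var sizes = {
  lz4WithMessage: 985241036,
  bestWithMessage: 537252835,
  lz4NoMessage: 542212467,
  bestNoMessage: 236969349
};

// Percentage saved when going from `before` to `after`, rounded to one decimal.
function reduction(before, after) {
  return Math.round((1 - after / before) * 1000) / 10;
}

// best_compression alone, message kept
console.log(reduction(sizes.lz4WithMessage, sizes.bestWithMessage)); // 45.5
// dropping the message, LZ4
console.log(reduction(sizes.lz4WithMessage, sizes.lz4NoMessage));    // 45
// dropping the message, best_compression
console.log(reduction(sizes.bestWithMessage, sizes.bestNoMessage));  // 55.9
// both measures combined
console.log(reduction(sizes.lz4WithMessage, sizes.bestNoMessage));   // 75.9
```

On this data set, enabling best_compression and dropping the message field together shrink the index to roughly a quarter of its original size.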

I am still figuring out how to drop only the strings that I deem unnecessary. As soon as I have figured that out, I will run another benchmark and update this issue.

herrBez commented 4 years ago

Hi There,

A while ago I had the same problem.

APPROACH 1


I took the approach of keeping only the first sentence of the message field, so that I can still understand what is going on while reducing the storage used.

To that end I wrote this simple "script" processor that keeps the message up to the first full stop.

processors:
# Processor to keep only the first sentence.
# Note: the pattern only matches when the first sentence consists of word
# characters and whitespace; otherwise the message is left unchanged.
# Match Func Reference https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
- script:
     lang: javascript
     id: my_filter
     source: >
       function process(event) {
         var message = event.Get("message");
         if (message != null) {
             var found = message.match(/^(\w+\s+)*\w+\./)
             if (found != null) {
                  event.Put("message", found[0]);
             }
         }
       }

For example, given an event with ID 4624, it stores only:

An account was successfully logged on.
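
To see how the pattern from the processor above behaves, it can be run against the first lines of two messages quoted earlier in this thread. The `tolerant` pattern is a hypothetical variant and not something from this thread: it assumes the first sentence sits on the first line and ends at the first period followed by whitespace or the end of the message. `firstSentence` is likewise just a helper for this sketch.

```javascript
// The pattern used in the processor above.
var original = /^(\w+\s+)*\w+\./;

// Hypothetical, more tolerant variant: first line up to the first period
// that is followed by whitespace or the end of the message.
var tolerant = /^[^\r\n]*?\.(?=\s|$)/;

function firstSentence(message, pattern) {
  var found = message.match(pattern);
  // Fall back to the unchanged message when nothing matches.
  return found === null ? message : found[0];
}

var logon = "A logon was attempted using explicit credentials.\n\nSubject:\n    Security ID:        SYSTEM";
var kerberos = "Kerberos pre-authentication failed.\n\nAccount Information:\n    Account Name:       philipp";

console.log(firstSentence(logon, original));
// The hyphen in "pre-authentication" is not a word character, so the original
// pattern does not match and the message is not truncated at all.
console.log(firstSentence(kerberos, original) === kerberos);
console.log(firstSentence(kerberos, tolerant));
```

With the original pattern, events such as 4771 keep their full message, so the storage savings depend on how many first sentences contain punctuation.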

APPROACH 2

I also tried keeping the whole message while removing only the description boilerplate, ⚠️ but I am not fully convinced that it works in all cases:

# Processor to get rid of part of the description in windows event messages
# Match Func Reference https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
- script:
     lang: javascript
     id: my_filter
     source: >
       function process(event) {
         var message = event.Get("message");
         if (message != null) {
             var found = message.match(/^(\w+\s+)*\w+\.\s+(((\S+\s+)*\S+):(\s+(\S+\s+)*\S+:\s+(\S+)))+/);
             if (found != null) {
                  event.Put("message", found[0]);
             }
         }
       }
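
A line-based heuristic is another way to attempt the same thing without a single large regular expression. The sketch below is an assumption-laden alternative, not the approach above: `stripTrailingBoilerplate` is a hypothetical helper, and it assumes that field lines are indented and look like "Name: value", while the trailing explanation is not. It has not been checked against every event ID or against non-English templates.

```javascript
// Hypothetical helper; a heuristic, not a tested rule for every event ID.
function stripTrailingBoilerplate(message) {
  var lines = message.split(/\r?\n/);
  var last = -1;
  for (var i = 0; i < lines.length; i++) {
    // Indented "Name: value" line; indented "- ..." explanation bullets are ignored.
    if (/^\s+[^\s:-][^:]*:/.test(lines[i])) {
      last = i;
    }
  }
  // No field lines found: leave the message unchanged.
  if (last === -1) {
    return message;
  }
  // Lines are re-joined with "\n", so "\r\n" line endings are normalized.
  return lines.slice(0, last + 1).join("\n");
}

var sample = [
  "A logon was attempted using explicit credentials.",
  "",
  "Subject:",
  "    Security ID:        SYSTEM",
  "",
  "Network Information:",
  "    Network Address:    10.10.30.33",
  "    Port:           44906",
  "",
  "This event is generated when a process attempts to log on an account."
].join("\n");

console.log(stripTrailingBoilerplate(sample));
```

Inside a script processor this would be called from `process(event)` with the value of `event.Get("message")`, and the result written back with `event.Put`.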

Additional Considerations

Please keep in mind that, although rare, Windows events may be rendered by a system configured with a language other than English. Such systems may use different templates to generate the rendered message, so patterns written against the English templates may not match.

EDIT:

The first approach can be further simplified (and accelerated?) with the following processor:

  - dissect:
       tokenizer: "%{message}."
       field: "message"
       target_prefix: ""

botelastic[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.