JPCERTCC / LogonTracer

Investigate malicious Windows logon by visualizing and analyzing Windows event log

Possible bug in XML parsing found (including fix) #107

Closed NexusFuzzy closed 3 years ago

NexusFuzzy commented 3 years ago

First of all, thanks a lot for this awesome tool and all the effort you put into it. I recently had to ingest a large EVTX file and, as you suggested, it is faster to use an XML file instead. I used the following PowerShell script to create the XML file:

$source = "C:\temp\security.xml"
$source_temp = "C:\temp\security_temp.xml"
$destination = "C:\temp\security.zip"
If(Test-path $destination) {Remove-item $destination}
If(Test-path $source) {Remove-item $source}
If(Test-path $source_temp) {Remove-item $source_temp}
Write-Host "Exporting logs"
wevtutil qe Security > $source_temp
Write-Host "Converting to ASCII for logontracer"
get-content $source_temp | out-file -encoding ASCII $source
Write-Host "Creating archive"
Compress-Archive -Path $source -DestinationPath $destination
Write-Host "Cleaning up"
Remove-item $source
Remove-item $source_temp

After manually adding the line

<?xml version="1.0" encoding="UTF-8"?>

LogonTracer finally accepted it as an XML file and started to parse it, but failed with the following error:

ValueError: time data '2021-03-1721:45:05' does not match format '%Y-%m-%dT%H:%M:%S'

As you might notice, LogonTracer extracted the timestamp correctly, but the regular expression in convert_logtime removed the "T" without replacing it with a whitespace character:

tzless = re.sub('[^0-9-:\s]', '', logtime.split(".")[0]).strip()

This turns 2021-03-17T21:45:05.893954400Z into 2021-03-1721:45:05 (see https://regex101.com/r/9on8Ec/1 for the example), so the timestamp cannot be parsed by the code that follows in convert_logtime:

    try:
        return datetime.datetime.strptime(tzless, "%Y-%m-%d %H:%M:%S") + datetime.timedelta(hours=tzone)
    except:
        return datetime.datetime.strptime(tzless, "%Y-%m-%dT%H:%M:%S") + datetime.timedelta(hours=tzone)
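
For anyone who wants to reproduce the failure outside LogonTracer, here is a minimal sketch. The helper names strip_original and parses_with_any_format are made up for this example and are not part of LogonTracer. Only the regular expression and the two format strings are taken from the code quoted above.

```python
import datetime
import re

# The two formats that convert_logtime tries, in order.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def strip_original(logtime):
    """Hypothetical helper: apply the original substitution, which deletes
    every character that is not a digit, '-', ':' or whitespace."""
    return re.sub(r"[^0-9-:\s]", "", logtime.split(".")[0]).strip()


def parses_with_any_format(text):
    """Hypothetical helper: return True if any of the two formats accepts text."""
    for fmt in FORMATS:
        try:
            datetime.datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


sample = "2021-03-17T21:45:05.893954400Z"
tzless = strip_original(sample)
print(repr(tzless))                    # '2021-03-1721:45:05'
print(parses_with_any_format(tzless))  # False
```

Because the "T" is deleted rather than replaced, the result has neither a space nor a "T" between date and time, so both strptime calls raise ValueError.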

FIX:

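def convert_logtime(logtime, tzone):
    # Replace every character that is not a digit, "-", ":" or whitespace with a space,
    # so the "T" separator becomes a space instead of disappearing.
    tzless = re.sub(r'[^0-9-:\s]', ' ', logtime.split(".")[0]).strip()
    try:
        return datetime.datetime.strptime(tzless, "%Y-%m-%d %H:%M:%S") + datetime.timedelta(hours=tzone)
    except ValueError:
        return datetime.datetime.strptime(tzless, "%Y-%m-%dT%H:%M:%S") + datetime.timedelta(hours=tzone)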

Characters matched by the regular expression are now replaced with a whitespace character, which results in '2021-03-17 21:45:05'. This can be parsed without problems, and LogonTracer is able to ingest the events.
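
A quick way to check the fix in isolation is the standalone sketch below. It is a simplified copy of the function, not the LogonTracer source: it keeps only the space-separated format (the only one the substituted string can still match once "T" has been replaced by a space), and the tzone values 0 and 9 are arbitrary examples.

```python
import datetime
import re


def convert_logtime(logtime, tzone):
    # Simplified sketch of the fixed function: characters outside digits,
    # '-', ':' and whitespace are replaced by a space, then the string is
    # parsed and shifted by tzone hours.
    tzless = re.sub(r"[^0-9-:\s]", " ", logtime.split(".")[0]).strip()
    parsed = datetime.datetime.strptime(tzless, "%Y-%m-%d %H:%M:%S")
    return parsed + datetime.timedelta(hours=tzone)


# Timestamp in the form produced by the wevtutil XML export (from the report).
print(convert_logtime("2021-03-17T21:45:05.893954400Z", 0))  # 2021-03-17 21:45:05

# Same instant shifted by an example offset of +9 hours.
print(convert_logtime("2021-03-17T21:45:05.893954400Z", 9))  # 2021-03-18 06:45:05

# A timestamp that already uses a space separator still parses.
print(convert_logtime("2021-03-17 21:45:05", 0))             # 2021-03-17 21:45:05
```

A trailing "Z" without fractional seconds (for example 2021-03-17T21:45:05Z) is also handled, because the "Z" becomes a space that strip() then removes.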

Hope that helps!

shu-tom commented 3 years ago

Thank you for using LogonTracer. I've fixed it based on your report.