Closed lboening closed 1 month ago
PS D:\code\Powershell> $dt =get-date -Year 2024 -Month 8 -Day 4 -Hour 12 -Minute 12 -Second 12 -Millisecond 999
PS D:\code\Powershell> import-module -name psparquet
PS D:\code\Powershell> Export-Parquet -FilePath c:\alldata\dt.parquet -InputObject $dt
Operation failed: Object reference not set to an instance of an object.
WARNING: InputObjects contains unsupported values. Transform the data prior to running Export-Parquet.
I get this too: WARNING: InputObjects contains unsupported values. Transform the data prior to running Export-Parquet.
Hi Luke,
First off, thank you for trying out my module and thank you for reaching out.
The reason TimeCreatedUtc
from Get-WinEvent
is a string in your initial question is the export to csv you do. When you export typed data to csv most of the types get stripped off.
This example shows, how types are converted when exporting data to csv:
$data = 1..100 | foreach {
[pscustomobject]@{
Date = (Get-Date).AddHours($_)
Int32 = $_
TypedInt = [int]$_
IntWithNull = (($_ % 3 -eq 0) ? $null : $_)
Text = "Iteration $_"
}
}
$data | gm
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
Date NoteProperty datetime Date=21/09/2024 09.22.02
Int32 NoteProperty System.Int32 Int32=1
IntWithNull NoteProperty System.Int32 IntWithNull=1
Text NoteProperty string Text=Iteration 1
TypedInt NoteProperty int TypedInt=1
$data | Export-CSV C:\Temp\data.csv
PS> Import-CSV C:\Temp\data.csv | Get-Member
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
Date NoteProperty string Date=21/09/2024 09.22.02
Int32 NoteProperty string Int32=1
IntWithNull NoteProperty string IntWithNull=1
Text NoteProperty string Text=Iteration 1
TypedInt NoteProperty string TypedInt=1
The same thing happens with the WinEvent object
$WinEvents = Get-WinEvent -LogName 'Windows PowerShell' -MaxEvents 5
PS> $WinEvents | Get-Member
TypeName: System.Diagnostics.Eventing.Reader.EventLogRecord
Name MemberType Definition
---- ---------- ----------
Dispose Method void Dispose(), void IDisposable.Dispose()
Equals Method bool Equals(System.Object obj)
FormatDescription Method string FormatDescription(), string FormatDescription(System.Collections.Generic.IEnu…
GetHashCode Method int GetHashCode()
GetPropertyValues Method System.Collections.Generic.IList[System.Object] GetPropertyValues(System.Diagnostics…
GetType Method type GetType()
ToString Method string ToString()
ToXml Method string ToXml()
Message NoteProperty string Message=Engine state is changed from None to Available. …
ActivityId Property System.Nullable[guid] ActivityId {get;}
Bookmark Property System.Diagnostics.Eventing.Reader.EventBookmark Bookmark {get;}
ContainerLog Property string ContainerLog {get;}
Id Property int Id {get;}
Keywords Property System.Nullable[long] Keywords {get;}
KeywordsDisplayNames Property System.Collections.Generic.IEnumerable[string] KeywordsDisplayNames {get;}
Level Property System.Nullable[byte] Level {get;}
LevelDisplayName Property string LevelDisplayName {get;}
LogName Property string LogName {get;}
MachineName Property string MachineName {get;}
MatchedQueryIds Property System.Collections.Generic.IEnumerable[int] MatchedQueryIds {get;}
Opcode Property System.Nullable[short] Opcode {get;}
OpcodeDisplayName Property string OpcodeDisplayName {get;}
ProcessId Property System.Nullable[int] ProcessId {get;}
Properties Property System.Collections.Generic.IList[System.Diagnostics.Eventing.Reader.EventProperty] P…
ProviderId Property System.Nullable[guid] ProviderId {get;}
ProviderName Property string ProviderName {get;}
Qualifiers Property System.Nullable[int] Qualifiers {get;}
RecordId Property System.Nullable[long] RecordId {get;}
RelatedActivityId Property System.Nullable[guid] RelatedActivityId {get;}
Task Property System.Nullable[int] Task {get;}
TaskDisplayName Property string TaskDisplayName {get;}
ThreadId Property System.Nullable[int] ThreadId {get;}
TimeCreated Property System.Nullable[datetime] TimeCreated {get;}
UserId Property System.Security.Principal.SecurityIdentifier UserId {get;}
Version Property System.Nullable[byte] Version {get;}
The System.Diagnostics.Eventing.Reader.EventLogRecord
contains all sorts of funky types, that will not survive the conversion to System.String
:
PS> $WinEvents | ConvertTo-Csv | Get-Member
TypeName: System.String
Name MemberType Definition
---- ---------- ----------
Clone Method System.Object Clone(), System.Object ICloneable.Clone()
CompareTo Method int CompareTo(System.Object value), int CompareTo(string strB), int ICompar…
Contains Method bool Contains(string value), bool Contains(string value, System.StringCompa…
CopyTo Method void CopyTo(int sourceIndex, char[] destination, int destinationIndex, int …
EndsWith Method bool EndsWith(string value), bool EndsWith(string value, System.StringCompa…
EnumerateRunes Method System.Text.StringRuneEnumerator EnumerateRunes()
Equals Method bool Equals(System.Object obj), bool Equals(string value), bool Equals(stri…
GetEnumerator Method System.CharEnumerator GetEnumerator(), System.Collections.IEnumerator IEnum…
GetHashCode Method int GetHashCode(), int GetHashCode(System.StringComparison comparisonType)
GetPinnableReference Method System.Char&, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, Pub…
GetType Method type GetType()
GetTypeCode Method System.TypeCode GetTypeCode(), System.TypeCode IConvertible.GetTypeCode()
IndexOf Method int IndexOf(char value), int IndexOf(char value, int startIndex), int Index…
IndexOfAny Method int IndexOfAny(char[] anyOf), int IndexOfAny(char[] anyOf, int startIndex),…
Insert Method string Insert(int startIndex, string value)
IsNormalized Method bool IsNormalized(), bool IsNormalized(System.Text.NormalizationForm normal…
LastIndexOf Method int LastIndexOf(char value), int LastIndexOf(char value, int startIndex), i…
LastIndexOfAny Method int LastIndexOfAny(char[] anyOf), int LastIndexOfAny(char[] anyOf, int star…
Normalize Method string Normalize(), string Normalize(System.Text.NormalizationForm normaliz…
PadLeft Method string PadLeft(int totalWidth), string PadLeft(int totalWidth, char padding…
PadRight Method string PadRight(int totalWidth), string PadRight(int totalWidth, char paddi…
Remove Method string Remove(int startIndex, int count), string Remove(int startIndex)
Replace Method string Replace(string oldValue, string newValue, bool ignoreCase, culturein…
ReplaceLineEndings Method string ReplaceLineEndings(), string ReplaceLineEndings(string replacementTe…
Split Method string[] Split(char separator, System.StringSplitOptions options = System.S…
StartsWith Method bool StartsWith(string value), bool StartsWith(string value, System.StringC…
Substring Method string Substring(int startIndex), string Substring(int startIndex, int leng…
ToBoolean Method bool IConvertible.ToBoolean(System.IFormatProvider provider)
ToByte Method byte IConvertible.ToByte(System.IFormatProvider provider)
ToChar Method char IConvertible.ToChar(System.IFormatProvider provider)
ToCharArray Method char[] ToCharArray(), char[] ToCharArray(int startIndex, int length)
ToDateTime Method datetime IConvertible.ToDateTime(System.IFormatProvider provider)
ToDecimal Method decimal IConvertible.ToDecimal(System.IFormatProvider provider)
ToDouble Method double IConvertible.ToDouble(System.IFormatProvider provider)
ToInt16 Method short IConvertible.ToInt16(System.IFormatProvider provider)
ToInt32 Method int IConvertible.ToInt32(System.IFormatProvider provider)
ToInt64 Method long IConvertible.ToInt64(System.IFormatProvider provider)
ToLower Method string ToLower(), string ToLower(cultureinfo culture)
ToLowerInvariant Method string ToLowerInvariant()
ToSByte Method sbyte IConvertible.ToSByte(System.IFormatProvider provider)
ToSingle Method float IConvertible.ToSingle(System.IFormatProvider provider)
ToString Method string ToString(), string ToString(System.IFormatProvider provider), string…
ToType Method System.Object IConvertible.ToType(type conversionType, System.IFormatProvid…
ToUInt16 Method ushort IConvertible.ToUInt16(System.IFormatProvider provider)
ToUInt32 Method uint IConvertible.ToUInt32(System.IFormatProvider provider)
ToUInt64 Method ulong IConvertible.ToUInt64(System.IFormatProvider provider)
ToUpper Method string ToUpper(), string ToUpper(cultureinfo culture)
ToUpperInvariant Method string ToUpperInvariant()
Trim Method string Trim(), string Trim(char trimChar), string Trim(Params char[] trimCh…
TrimEnd Method string TrimEnd(), string TrimEnd(char trimChar), string TrimEnd(Params char…
TrimStart Method string TrimStart(), string TrimStart(char trimChar), string TrimStart(Param…
TryCopyTo Method bool TryCopyTo(System.Span[char] destination)
Chars ParameterizedProperty char Chars(int index) {get;}
Length Property int Length {get;}
When the CSV data is converted back to PSObjects (Import-Csv), PowerShell tries to match the nearest probable data type for each field, as can be seen here:
PS> ConvertFrom-Csv $CSVObjects | Get-Member
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
ActivityId NoteProperty string ActivityId=
Bookmark NoteProperty string Bookmark=System.Diagnostics.Eventing.Reader.EventBookmark
ContainerLog NoteProperty string ContainerLog=Windows PowerShell
Id NoteProperty string Id=400
Keywords NoteProperty string Keywords=36028797018963968
KeywordsDisplayNames NoteProperty string KeywordsDisplayNames=System.Collections.ObjectModel.ReadOnlyCollection`1[Syst…
Level NoteProperty string Level=4
LevelDisplayName NoteProperty string LevelDisplayName=Information
LogName NoteProperty string LogName=Windows PowerShell
MachineName NoteProperty string MachineName=Ax-Laptop6
MatchedQueryIds NoteProperty string MatchedQueryIds=System.UInt32[]
Message NoteProperty string Message=Engine state is changed from None to Available. …
Opcode NoteProperty string Opcode=0
OpcodeDisplayName NoteProperty string OpcodeDisplayName=Info
ProcessId NoteProperty string ProcessId=20324
Properties NoteProperty string Properties=System.Collections.Generic.List`1[System.Diagnostics.Eventing.Read…
ProviderId NoteProperty string ProviderId=
ProviderName NoteProperty string ProviderName=PowerShell
Qualifiers NoteProperty string Qualifiers=0
RecordId NoteProperty string RecordId=3437
RelatedActivityId NoteProperty string RelatedActivityId=
Task NoteProperty string Task=4
TaskDisplayName NoteProperty string TaskDisplayName=Engine Lifecycle
ThreadId NoteProperty string ThreadId=0
TimeCreated NoteProperty string TimeCreated=21/09/2024 07.38.33
UserId NoteProperty string UserId=
Version NoteProperty string Version=0
It is all these strings that gets exported to Parquet in your initail scenario.
The solution is to curate the data in another way before exporting them to parquet. Export-Parquet also has to do some type conversion, before data can be stored. I am using best effort on the most common data types, so if you stick to simple types, you should be good.
In the help documentation for Export-Parquet
you find this example:
$File = "C:\Temp\Test.parquet"
$data = 1..100 | foreach {
[pscustomobject]@{
Date = (Get-Date).AddHours($_)
Int32 = $_
TypedInt = [int]$_
IntWithNull = (($_ % 3 -eq 0) ? $null : $_)
Text = "Iteration $_"
}
}
$data | Export-Parquet -FilePath $File -Force
When you import data from this example, the types will be the same as the exported types.
Getting your System.Diagnostics.Eventing.Reader.EventLogRecord
converted to a PSCustomObject that has the correct types for proper storage is a Data Engineering task. You have chosen to store the System.Diagnostics.Eventing.Reader.EventLogRecord
in parquet and now you need to turn that object into something that Parquet can handle.
First, you need to evaluate where in the data flow you want to intervene. Exporting the events to csv degredates the data tremendously, so you might want to choose a different format for step number 1. Export-CliXml
is capable of storing much more exact information about the System.Diagnostics.Eventing.Reader.EventLogRecord
, but it also takes a lot more diskspace.
If you are allowed to install PowerShell (not Windows PowerShell) on the machine that runs the Get-WinEvent
command, you can create the parquet files directly with very little effort.
First you need to get rid of the most irrelevant information from the System.Diagnostics.Eventing.Reader.EventLogRecord
object. That can be done by using Select-Object
like this
# Get all properties
$SelectorString = $WinEvents | Get-Member -MemberType Property, NoteProperty | Select-Object -ExpandProperty Name | Join-String -Separator ','
# The selectorstring variable contains all the properties of the System.Diagnostics.Eventing.Reader.EventLogRecord object
PS> $SelectorString
Message,ActivityId,Bookmark,ContainerLog,Id,Keywords,KeywordsDisplayNames,Level,LevelDisplayName,LogName,MachineName,MatchedQueryIds,Opcode,OpcodeDisplayName,ProcessId,Properties,ProviderId,ProviderName,Qualifiers,RecordId,RelatedActivityId,Task,TaskDisplayName,ThreadId,TimeCreated,UserId,Version
If you do not need them all for your further work, you can just leave them out. The properties you are left with, are the properties you need to manitplate to fit the parquet format.
A short exampele shows the work, that needs to be done:
$SelectedParameters = $WinEvents | Select-Object Id,TimeCreated,Bookmark,Message
Now there is 4 data types to work with:
PS> $SelectedParameters | Get-Member
TypeName: Selected.System.Diagnostics.Eventing.Reader.EventLogRecord
Name MemberType Definition
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
Bookmark NoteProperty EventBookmark Bookmark=System.Diagnostics.Eventing.Reader.EventBookmark
Id NoteProperty int Id=400
Message NoteProperty string Message=Engine state is changed from None to Available. …
TimeCreated NoteProperty datetime TimeCreated=21/09/2024 07.38.33
We can try to export this to parquet straight away, but that will give us an error:
$SelectedParameters | Export-Parquet C:\Temp\SelectedParameters.parquet
Operation failed: type System.Object is not supported
WARNING: InputObjects contains unsupported values. Transform the data prior to running Export-Parquet.
And there is the task we need to attend to. Bookmark is an object and needs to be converted to something else before storing it in parquet.
The Data Engineer would first evaluate the contents of the Bookmark
property. In this case, it looks like this:
PS> $SelectedParameters.Bookmark.BookmarkXml
<BookmarkList Direction='backward'>
<Bookmark Channel='Windows PowerShell' RecordId='3437' IsCurrent='true'/>
</BookmarkList>
After having looked at a few more of the Bookmark
parameter contents, the Data Engineer would decide, whether to extract some of the data in the object, keep the object type as a string or leave it out entirely.
Let's leave it out here, since the objective is to get the datetime data in parquet
$SelectedParameters = $WinEvents | Select-Object Id,TimeCreated,Message
$SelectedParameters | Export-Parquet C:\Temp\EventData.parquet
PS> Import-Parquet C:\Temp\EventData.parquet | Get-Member
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
Id NoteProperty int Id=400
Message NoteProperty string Message=Engine state is changed from None to Available. …
TimeCreated NoteProperty datetime TimeCreated=21/09/2024 05.38.33
Now the data can be exported to parquet and re-importing the data gives us the right type
Data Engineering is a valid and sometimes well paid occupation. Remember to tell your employer ;-)
@lboening thanks to your question, I found a bug in the module. That has been fixed now. If you update the module, you can do something like this:
$SelectedParameters = $WinEvents | Select-Object Id,TimeCreated,Message,@{n='Bookmark';e={$_.Bookmark | Convertto-json -Compress}}
$SelectedParameters | Export-Parquet C:\Temp\EventData.parquet
It is still required to transform complexe objects to simple objects, but with this fix you can convert almost any object to a json string. This hopefully makes your Data Engineer work easier.
Are you supposed to bump this:
ModuleVersion = '0.2.14'
I have been using Microsoft.PowerShell.PsResourceGet
PS C:\> find-psresource -Repository PsGallery -Type Module -name PsParquet
Name Version Prerelease Repository Description
---- ------- ---------- ---------- -----------
PSParquet 0.2.14 PSGallery Import and export objects to parquet format
Yes, that is the new version: https://www.powershellgallery.com/packages/PSParquet/0.2.14
@lboening this runs directly to parquet:
Get-WinEvent -LogName 'Windows PowerShell' | Select -ExcludeProperty Bookmark, UserId, ActivityId,Properties,RelatedActivityId, ProviderId, KeywordsDisplayNames,MatchedQueryIds *,@{n="Bookmark"; e={$_.Bookmark.ToString()}},@{n="UserId"; e={$_.UserId | ConvertTo-Json}} ,@{n="ActivityId"; e={$_.ActivityId | ConvertTo-Json}},@{n="Properties"; e={$_.Properties | ConvertTo-Json}},@{n="RelatedActivityId"; e={$_.RelatedActivityId | ConvertTo-Json}},@{n="ProviderId"; e={$_.ProviderId | ConvertTo-Json}},@{n="KeywordsDisplayNames"; e={$_.KeywordsDisplayNames.ToString()}},@{n="MatchedQueryIds"; e={$_.MatchedQueryIds.ToString()}} | Export-Parquet c:\temp\test.parquet -Force
PS C:\> Get-WinEvent -LogName 'Windows PowerShell' | Select -ExcludeProperty Bookmark, UserId, ActivityId,Properties,RelatedActivityId, ProviderId, KeywordsDisplayNames,MatchedQueryIds *,@{n="Bookmark"; e={$_.Bookmark.ToString()}},@{n="UserId"; e={$_.UserId | ConvertTo-Json}} ,@{n="ActivityId"; e={$_.ActivityId | ConvertTo-Json}},@{n="Properties"; e={$_.Properties | ConvertTo-Json}},@{n="RelatedActivityId"; e={$_.RelatedActivityId | ConvertTo-Json}},@{n="ProviderId"; e={$_.ProviderId | ConvertTo-Json}},@{n="KeywordsDisplayNames"; e={$_.KeywordsDisplayNames.ToString()}},@{n="MatchedQueryIds"; e={$_.MatchedQueryIds.ToString()}} | Export-Parquet c:\temp\test.parquet -Force
Operation failed: The input string 'System.Diagnostics.Eventing.Reader.EventBookmark' was not in a correct format.
WARNING: InputObjects contains unsupported values. Transform the data prior to running Export-Parquet.
PS C:\>
It ran better when I ran PowerShell as admin.
An error for the BookMark. Forgive but I have seen the "bookmark" and I do not appreciate it. Is this a valid link to an explanation: https://learn.microsoft.com/en-us/windows/win32/wes/bookmarking-events
Bookmark is in XML format. Turning it to a string with ToString() should leave you with the type name. In effect, you will only save the type name to parquet (like when you evport to csv). If you do not need the Bookmark data, you can leave it out entirely. If you do need it, you should write code to extract the relevant information from it.
PS C:\> Get-WinEvent -LogName 'Windows PowerShell' | Select -ExcludeProperty Bookmark, UserId, ActivityId,Properties,RelatedActivityId, ProviderId, KeywordsDisplayNames,MatchedQueryIds *,@{n="Bookmark"; e={$_.Bookmark.ToString()}},@{n="UserId"; e={$_.UserId | ConvertTo-Json}} ,@{n="ActivityId"; e={$_.ActivityId | ConvertTo-Json}},@{n="Properties"; e={$_.Properties | ConvertTo-Json}},@{n="RelatedActivityId"; e={$_.RelatedActivityId | ConvertTo-Json}},@{n="ProviderId"; e={$_.ProviderId | ConvertTo-Json}},@{n="KeywordsDisplayNames"; e={$_.KeywordsDisplayNames.ToString()}},@{n="MatchedQueryIds"; e={$_.MatchedQueryIds.ToString()}} | Export-Parquet c:\temp\test.parquet -Force Operation failed: The input string 'System.Diagnostics.Eventing.Reader.EventBookmark' was not in a correct format. WARNING: InputObjects contains unsupported values. Transform the data prior to running Export-Parquet. PS C:\>
It ran better when I ran PowerShell as admin.
I think the new runspace you got when runnig as admin did it.
Closing this. Thanks for trying the module.
Describe the bug It's not a bug, but a question how to mark a column as a particular type
To Reproduce
Expected behavior PSParquet marks the type of the column as a datetime object.
Screenshots None.
PowerShell info
Desktop (please complete the following information):
Additional context None