aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

Help deciphering exception #528

Closed dingalla1 closed 3 days ago

dingalla1 commented 1 week ago

Issue description

Is there any additional detail that can be provided for this exception?  

System.ArgumentNullException: Value cannot be null. (Parameter 'key')
   at System.ThrowHelper.ThrowArgumentNullException(System.ExceptionArgument argument)
   at System.Collections.Generic.Dictionary`2.TryInsert(System.Collections.Generic.TKey key, System.Collections.Generic.TValue value, System.Collections.Generic.InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.set_Item(System.Collections.Generic.TKey key, System.Collections.Generic.TValue value)
   at Parquet.Encodings.ParquetDictionaryEncoder.TryExtractDictionary(System.Type elementType, System.Array data, System.Int32 offset, System.Int32 count, System.Array& dictionaryArray, System.Int32[]& rentedIndexes, System.Double threshold)
   at Parquet.File.PackedColumn.Pack(System.Boolean useDictionaryEncoding, System.Double dictionaryThreshold)
   at Parquet.File.<WriteColumnAsync>d__12.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task task,

aloneguid commented 1 week ago

It would help to provide details of what exactly you are trying to do, otherwise you won't like the answer 😉

EamonHetherton commented 5 days ago

The error suggests that the "data" being passed to the ParquetDictionaryEncoder contains null values. A HashSet allows you to add a null, but a Dictionary<string,int> does not accept null as a key, so the exception is thrown when ParquetDictionaryEncoder calculates the value-to-index mapping. It probably shouldn't even be getting this far with nulls in the first place, though. I'll need to know the shape and contents of the objects you are trying to serialise to investigate this further.
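The HashSet/Dictionary difference described above can be reproduced without Parquet.Net at all. The following is a minimal sketch, not the library's actual code: `NullKeyDemo` and `BuildIndexMap` are hypothetical names that only mimic the "collect distinct values, then map each value to an index" pattern.

```csharp
using System;
using System.Collections.Generic;

public static class NullKeyDemo
{
    // Hypothetical stand-in for the encoder's two steps (an assumption,
    // not Parquet.Net source): gather distinct values, then index them.
    public static string BuildIndexMap(string[] data)
    {
        // Step 1: HashSet<string> accepts a null element without complaint.
        var distinct = new HashSet<string>(data);

        // Step 2: Dictionary<string, int> rejects a null key.
        var indexes = new Dictionary<string, int>();
        try
        {
            int next = 0;
            foreach (string value in distinct)
            {
                indexes[value] = next++; // throws ArgumentNullException when value is null
            }
            return "ok";
        }
        catch (ArgumentNullException ex)
        {
            // ParamName is "key", matching "(Parameter 'key')" in the trace.
            return ex.ParamName ?? string.Empty;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildIndexMap(new[] { "a", "b" }));
        Console.WriteLine(BuildIndexMap(new[] { "a", null }));
    }
}
```

The first call succeeds; the second fails in the dictionary step rather than the set step, which is why the stack trace points at `Dictionary.set_Item` and not at the place where the null entered the data.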

dingalla1 commented 3 days ago

This was exactly it. The bug was in my code: I was passing an array with null values to a column whose schema DataField is non-nullable.
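For anyone hitting the same trace, one way to surface this mistake earlier is to check the array before handing it to the writer. This is only a sketch under assumptions: `ColumnGuard`, `FirstNullIndex` and `EnsureNoNulls` are hypothetical helpers, not part of Parquet.Net, and the check assumes a one-dimensional array.

```csharp
using System;

public static class ColumnGuard
{
    // Hypothetical helper: index of the first null element, or -1 if none.
    // Assumes a one-dimensional array.
    public static int FirstNullIndex(Array data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (data.GetValue(i) is null)
            {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: fail early with a message that names the column,
    // instead of a bare "Value cannot be null. (Parameter 'key')".
    public static void EnsureNoNulls(Array data, string columnName)
    {
        int index = FirstNullIndex(data);
        if (index >= 0)
        {
            throw new ArgumentException(
                $"Column '{columnName}' is declared non-nullable but element {index} is null.",
                nameof(data));
        }
    }
}
```

Calling something like `ColumnGuard.EnsureNoNulls(values, "name")` just before writing a non-nullable column would report the column and the offending element. The alternative, if nulls are legitimate, is to declare the field as nullable in the schema.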

I'll close this.