aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License
600 stars 151 forks source link

[BUG]: SerializeAsync throws when serializing nullable decimal #465

Closed stefer closed 8 months ago

stefer commented 8 months ago

Library Version

4.23.0

OS

Windows/Linux

OS Architecture

64 bit

How to reproduce?

The following code throws exception when version 4.23.0 or larger is used.

Exception:

Expression of type 'System.Nullable`1[System.Decimal]' cannot be used for parameter of type 'System.Decimal' of method 'Void Add(System.Decimal)' (Parameter 'arg0')

Stacktrace:

   at System.Dynamic.Utils.ExpressionUtils.ValidateOneArgument(MethodBase method, ExpressionType nodeKind, Expression arguments, ParameterInfo pi, String methodParamName, String argumentParamName, Int32 index)
   at System.Linq.Expressions.Expression.Call(Expression instance, MethodInfo method, Expression arg0)
   at System.Linq.Expressions.Expression.Call(Expression instance, MethodInfo method, IEnumerable`1 arguments)
   at Parquet.Serialization.Dremel.FieldStriperCompiler`1.WriteValue(ParameterExpression valueVar, Int32 dl, Expression currentRlVar, ParameterExpression isLeaf, Boolean isAtomic)
   at Parquet.Serialization.Dremel.FieldStriperCompiler`1.WhileBody(Expression element, Boolean isAtomic, Int32 dl, ParameterExpression currentRlVar, ParameterExpression seenFieldsVar, Field field, Int32 rlDepth, Type elementType, List`1 path)
   at Parquet.Serialization.Dremel.FieldStriperCompiler`1.DissectRecord(Expression rootVar, Field parentField, Field[] levelFields, List`1 path, Type rootType, Int32 rlDepth, ParameterExpression currentRlVar)
   at Parquet.Serialization.Dremel.FieldStriperCompiler`1.Compile()
   at Parquet.Serialization.Dremel.Striper`1.CreateStriper(DataField df)
   at System.Linq.Enumerable.SelectArrayIterator`2.Fill(ReadOnlySpan`1 source, Span`1 destination, Func`2 func)
   at System.Linq.Enumerable.SelectArrayIterator`2.ToList()
   at Parquet.Serialization.Dremel.Striper`1..ctor(ParquetSchema schema)
   at Parquet.Serialization.ParquetSerializer.<>c__6`1.<SerializeAsync>b__6_0(Type _)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Parquet.Serialization.ParquetSerializer.SerializeAsync[T](IEnumerable`1 objectInstances, Stream destination, ParquetSerializerOptions options, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in :line 9
   at Program.<Main>(String[] args)

Failing test

#r "nuget: Parquet.Net,4.23.0"

using Parquet.Serialization;

var stream = new MemoryStream();

var values = new [] {new Test {Value = 23}};

var result = await ParquetSerializer.SerializeAsync(values, stream).ConfigureAwait(false);

class Test {
   public decimal? Value {get; set;}
}
stefer commented 8 months ago

In 4.22.0, the Datafield.IsNullable was set to True for decimal? properties and those fields where exported as null in the files if the value was null.

There seem to be missing test cases for nullable decimals.

#r "nuget: NUnit, 4.0.1"
#r "nuget: Parquet.Net,4.22.0"

using Parquet.Serialization;
using Parquet.Serialization.Attributes;
using NUnit.Framework;

var stream = new MemoryStream();

var values = new [] {new Test {Value = 23, Value2 = 43}};

var result = await ParquetSerializer.SerializeAsync(values, stream).ConfigureAwait(false);

Assert.That(result.DataFields[0].IsNullable, Is.True);
Assert.That(result.DataFields[1].IsNullable, Is.False);

class Test {
   public decimal? Value {get; set;}
   public decimal Value2 {get; set;}
}
aloneguid commented 8 months ago

Thanks for raising this, seems to be regression after adding support for class fields. Just fixing and adding test coverage.

aloneguid commented 8 months ago

Fixed in the latest hotfix, feel free to try ;)

DavidMoyaInnoCV commented 8 months ago

@aloneguid The same thing happens with the DateTime type in version 4.23.3. I think you should check all types :D

aloneguid commented 8 months ago

@aloneguid The same thing happens with the DateTime type in version 4.23.3. I think you should check all types :D

Thanks, that seems to be happening to all types that can have custom attributes. I have no idea how it worked before ;)

VadymKuzin commented 8 months ago

Possibly related to my issue: https://github.com/aloneguid/parquet-dotnet/issues/467

stefer commented 8 months ago

Hi, I can confirm that nullable decimal now works, but I also have problems with DateTime.

stefer commented 8 months ago

I added a preview PR here : https://github.com/aloneguid/parquet-dotnet/pull/468 Maybe it could be an inspiration to a solution.

aloneguid commented 8 months ago

other types are fixed in the latest release too :)

DavidMoyaInnoCV commented 8 months ago

Thanks @aloneguid! I will try it soon :)

stefer commented 8 months ago

Thanks @aloneguid - it now works as expected.

cliedeman commented 6 months ago

Just encountered this issue with Timespan

Expression of type 'System.Nullable`1[System.TimeSpan]' cannot be used for parameter of type 'System.TimeSpan' of method 'Void Add(System.TimeSpan)' (Parameter 'arg0')
   at System.Dynamic.Utils.ExpressionUtils.ValidateOneArgument(MethodBase method, ExpressionType nodeKind, Expression arguments, ParameterInfo pi, String methodParamName, String argumentParamName, Int32 index)
   at System.Linq.Expressions.Expression.Call(Expression instance, MethodInfo method, Expression arg0)
   at System.Linq.Expressions.Expression.Call(Expression instance, MethodInfo method, IEnumerable`1 arguments)