aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

[BUG]: Enum Serialization/Deserialization update #488

Status: Closed (seoktaekim closed this issue 1 month ago)

seoktaekim commented 3 months ago

Library Version

4.23.4

OS

OS

OS Architecture

ARM 64

How to reproduce?

  1. Serialize a class that has an enum property.
  2. No error is raised, but the enum value is not restored when the Parquet file is deserialized.

genius feature btw

Failing test

No response

Pragmateek commented 1 month ago

Hello,

same issue with 4.24.0-pre.2:

using Parquet.Serialization;

var inputA = new A { E = E.One };

Console.WriteLine($"Input: {inputA.E}");

using var ms = new MemoryStream();
await ParquetSerializer.SerializeAsync([inputA], ms);

ms.Seek(0, SeekOrigin.Begin);

var outputA = (await ParquetSerializer.DeserializeAsync<A>(ms)).Single();

Console.WriteLine($"Output: {outputA.E}");

enum E
{
    Zero,
    One,
}

class A
{
    public E E { get; set; }
}

Result:

Input: One
Output: Zero

Is there a workaround?

Thanks,

Mickael

aloneguid commented 1 month ago

Unfortunately, enums are not yet supported. As a workaround, you can mark the enum property as ignored and add a get/set property that exposes its value as one of the supported types. This is a popular ask though, so I'll have to implement it in the next release.
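A minimal sketch of that workaround, reusing the A and E types from the repro above. It assumes the class serializer skips properties marked with System.Text.Json's [JsonIgnore] in the version you use; the surrogate property name EName is hypothetical and would become the column name.

```csharp
using System;
using System.Text.Json.Serialization;

var a = new A { E = E.One };
Console.WriteLine(a.EName); // value the surrogate column would carry
a.EName = "Zero";           // what a deserializer would do when reading the column
Console.WriteLine(a.E);     // the enum follows the surrogate

enum E
{
    Zero,
    One,
}

class A
{
    // Assumption: the serializer honours [JsonIgnore] and leaves this
    // property out of the Parquet schema.
    [JsonIgnore]
    public E E { get; set; }

    // Hypothetical surrogate of a supported type (string) that forwards
    // to the ignored enum property in both directions.
    public string EName
    {
        get => E.ToString();
        set => E = Enum.Parse<E>(value);
    }
}
```

Swapping the surrogate to an int property (casting to and from E) would store the numeric value instead of the name.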

Pragmateek commented 1 month ago

OK, too bad. For now I've used DTOs that map enums to strings. Enum support would indeed be great, as enums are used often. Maybe you could offer a choice between mapping the integer or the string value of each enum entry.
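The DTO approach mentioned here could look roughly like the following. This is an illustrative sketch, not the code actually used in this thread: ADto and AMapper are hypothetical names, and A and E are the types from the repro above.

```csharp
using System;

var dto = AMapper.ToDto(new A { E = E.One });
Console.WriteLine(dto.E);                    // enum name as a string
Console.WriteLine(AMapper.FromDto(dto).E);   // mapped back to the enum

enum E { Zero, One }

class A
{
    public E E { get; set; }
}

// Hypothetical DTO: same shape as A, but the enum travels as its name.
class ADto
{
    public string E { get; set; } = "";
}

// Hypothetical mapper converting between the domain type and the DTO.
static class AMapper
{
    public static ADto ToDto(A a) => new ADto { E = a.E.ToString() };

    public static A FromDto(ADto dto) => new A { E = Enum.Parse<E>(dto.E) };
}
```

The DTOs, rather than the original objects, are then passed to ParquetSerializer.SerializeAsync and read back with ParquetSerializer.DeserializeAsync<ADto>.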

aloneguid commented 1 month ago

I have initial support for this already, which serialises enums to their underlying type (int or else).
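For readers unfamiliar with the term, the snippet below shows what "underlying type" means in plain C#. It only uses the standard library and says nothing about how the library implements the feature internally.

```csharp
using System;

// An enum without an explicit base type is backed by int.
Type underlying = Enum.GetUnderlyingType(typeof(E));
Console.WriteLine(underlying);   // System.Int32

int stored = (int)E.One;         // numeric value that would be written
E restored = (E)stored;          // cast back when reading
Console.WriteLine(stored);       // 1
Console.WriteLine(restored);     // One

enum E
{
    Zero,
    One,
}
```

This is why a reader of the file that knows nothing about the C# enum sees only numbers in the column, not the entry names.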

aloneguid commented 1 month ago

You can try the -pre.5 version for support of this.

Pragmateek commented 1 month ago

> I have initial support for this already, which serialises enums to their underlying type (int or else).

Indeed with -pre.4 the integer value was serialized but under field value__.

Pragmateek commented 1 month ago

> You can try the -pre.5 version for support of this.

The roundtrip is working with -pre.5. Thanks again. :)

Pragmateek commented 3 weeks ago

While trying to plug the Parquet files into AWS Athena and diving into the schema, I see that the enum's integer values are serialized. Would it be possible to serialize the names instead? Maybe by annotating the field? Thanks

Pragmateek commented 2 weeks ago

In the meantime, is there a way to hook into the serialization process to customize the way each field is serialized? Thanks