aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License
542 stars 141 forks source link

[BUG]: ParquetSerializer doesn't support different JsonPropertyName and ClrPropertyName on struct fields #410

Closed Mrinal-Thomas-Epic closed 7 months ago

Mrinal-Thomas-Epic commented 9 months ago

Library Version

4.16.4

OS

Windows

OS Architecture

32 bit

How to reproduce?

FieldStriperCompiler uses field.ClrPropName ?? field.Name when determining the property name to use when constructing the exepression.

When MakeField in the GetParquetSchema type extension constructs a StructField, it just passes in the column name (so the CLR property name is lost).

Suggested fix: Make StructField constructor take in an optional parameter ClrPropertyName (and set the ClrPropName property to that parameter). In GetParquetSchema, pass in the appropriate ClrPropertyName.

Failing test

using Parquet.Serialization;
using System.IO;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

    public class Building
    {
        [JsonPropertyName("first_floor")]
        public Floor FirstFloor { get; set; }

        public Building(Floor firstFloor)
        {
            FirstFloor = firstFloor;
        }
    }

    public class Floor
    {
        [JsonPropertyName("num_room")]
        public int NumRooms { get; set; }

        [JsonPropertyName("room_names")]
        public string[] RoomNames { get; set; }

        public Floor(int numRooms, params string[] roomNames)
        {
            NumRooms = numRooms;
            RoomNames = roomNames;
        }
    }

    public class Program
    {
        public static async Task Main(string[] args)
        {
            Building[] buildings = new Building[]
            {
                new Building(new Floor(2, "kitchen", "bedroom")),
                new Building(new Floor(3, "kitchen", "bedroom", "bathroom")),
                new Building(new Floor(3, "office", "bedroom", "bathroom")),
            };

            Stream ms = new MemoryStream();
            await ParquetSerializer.SerializeAsync<Building>(buildings, ms);
        }
    }
Mrinal-Thomas-Epic commented 9 months ago

It also seems like ListField does not set ClrPropertyName, which would cause the same issue. I didn't look into the fix much, but it might be more complicated for that field.

aloneguid commented 7 months ago

PR merged