Closed nicolaslattuada closed 6 months ago
Hi, I have a parquet file with the following schema:
```
ObjTwo {
    Id          guid
    Url         string
    Title       string
    Properties  Map       "Key values map of annotations"
}

ObjOne {
    Id        guid
    Name      string
    Score     double
    Metadata  Map         "Key values map of annotations"
}

Job {
    Name     string
    Owner    string
    OtherId  guid
    Engine   string       "Optional Default=undefined"
    EndTime  datetime
    ObjTwos  List         "List of <ObjTwo>'s"
    ObjOnes  List         "List of <ObjOne>'s"
}
```
Let's pretend ObjTwos and ObjOnes each have 20M records, and I cannot fit those lists in memory in my running environment.
I want to split it into 3 parquet files:
I am interested in the best performance and a low memory footprint.
Is it possible to do this using parquet-dotnet without having to load all the data into memory? Thanks :)
I think you're looking for "row groups"?
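Row groups let you process a parquet file in bounded chunks: each row group is read, processed, and written independently, so peak memory is proportional to one row group rather than the full 20M-record lists. Here is a minimal sketch using Parquet.Net's row-group API. The file names are placeholders, and the method names (`ParquetReader.CreateAsync`, `OpenRowGroupReader`, `CreateRowGroup`, etc.) are from the 4.x API surface, so verify them against the Parquet.Net version you are using:

```csharp
// Sketch only: copies a parquet file one row group at a time, so memory
// stays bounded by the row-group size, never the whole file.
// "job.parquet" / "out.parquet" are placeholder file names.
using Parquet;
using Parquet.Data;
using Parquet.Schema;

using var input = File.OpenRead("job.parquet");
using var reader = await ParquetReader.CreateAsync(input);

// The data fields to copy; for your 3-file split you would instead build
// a ParquetSchema per output file containing only the fields it needs.
DataField[] fields = reader.Schema.GetDataFields();

using var output = File.Create("out.parquet");
using var writer = await ParquetWriter.CreateAsync(reader.Schema, output);

for (int g = 0; g < reader.RowGroupCount; g++)
{
    using ParquetRowGroupReader groupReader = reader.OpenRowGroupReader(g);
    using ParquetRowGroupWriter groupWriter = writer.CreateRowGroup();

    foreach (DataField field in fields)
    {
        // Reads one column of one row group only, then writes it out.
        DataColumn column = await groupReader.ReadColumnAsync(field);
        await groupWriter.WriteColumnAsync(column);
    }
}
```

To split into separate files, run the same loop once per output file with a schema restricted to that file's fields; each pass still only ever holds one row group's worth of one column in memory.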