Cinchoo / ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
MIT License

Nested array #285

Status: Open · turbomaicol opened this issue 1 year ago

turbomaicol commented 1 year ago

I'm trying to convert JSON to Parquet.

Sample JSON:

    {
      "Stype":"BaseDecorator",
      "Decorators":[
        {"Stype":"FiscalInformationDecorator","FiscalInformation":{"Stype":"FiscalInformation","UUID":"02d0c973-727e-449e-bb4e-45dddbd7dbeb", etc...}},
        {"Stype":"DocumentInformationDecorator","DocumentInformation":{"Stype":"DocumentInformation","DocumentModelID":"7ec7b1d4-f94f-42b5-ba36-77701cdf1db4", etc...}},
        {"Stype":"IssuingInformationDecorator","IssuingInformation":{"Stype":"IssuingInformation","RFC":"PRR890126QC2", etc...}}
      ],
      "InstanceID":"78091f6e-e458-4a23-abfe-fe286b24b59a",
      "company":"d6038f2d-787c-427b-8eaf-4d9eea44a24a"
    }

Decorators is an array.

Using:

    // Requires: using ChoETL; using Newtonsoft.Json.Linq;
    // deserialized_jsons and stream are defined elsewhere in the caller.
    var stringJson = JArray.FromObject(deserialized_jsons).ToString();
    using (var r = ChoJSONReader.LoadText(stringJson).ErrorMode(ChoErrorMode.IgnoreAndContinue))
    {
        using (var w = new ChoParquetWriter(stream, new ChoParquetRecordConfiguration { CompressionMethod = Parquet.CompressionMethod.Snappy })
            .ThrowAndStopOnMissingField(false)
            .ErrorMode(ChoErrorMode.IgnoreAndContinue))
        {
            // Write every JSON record to the Parquet stream
            w.Write(r);
        }
    }

Can the schema be represented as:

    stype        string
    decorators   array<struct<Stype:string,FiscalInformation:struct<Stype:string,UUID:string,CFDIUse:string, etc...
    InstanceID   string
    company      string

instead of:

    stype                                  string
    decorators_0_stype                     string
    decorators_0_fiscalinformation_stype   string
    decorators_0_fiscalinformation_uuid    string
    etc...

I don't want a column for each property of each nested array, all of them separated by numbers. I want one column that contains all the elements of the nested array.
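To see why the output ends up with columns like decorators_0_fiscalinformation_uuid, here is a minimal sketch of flattening nested JSON into indexed, underscore-joined column names using System.Text.Json. It is an illustration only, not ChoETL's actual implementation: the FlattenDemo class and Flatten method are hypothetical names, and lowercasing the property names is an assumption made to match the column names listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical illustration, NOT ChoETL's implementation: flattens nested JSON
// into one column per leaf value, with array indexes as path segments.
public static class FlattenDemo
{
    public static void Flatten(JsonElement element, string prefix, List<(string Column, string Value)> columns)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                // Each property appends its (lowercased) name to the path.
                foreach (JsonProperty p in element.EnumerateObject())
                    Flatten(p.Value, Join(prefix, p.Name.ToLowerInvariant()), columns);
                break;
            case JsonValueKind.Array:
                // Each array item appends its index: decorators_0, decorators_1, ...
                int i = 0;
                foreach (JsonElement item in element.EnumerateArray())
                    Flatten(item, Join(prefix, (i++).ToString()), columns);
                break;
            default:
                // Leaf value: becomes a single flat column.
                columns.Add((prefix, element.ToString()));
                break;
        }
    }

    private static string Join(string prefix, string name) =>
        prefix.Length == 0 ? name : prefix + "_" + name;

    public static void Main()
    {
        // Shortened version of the sample JSON from the issue.
        const string json = @"{
            ""Stype"": ""BaseDecorator"",
            ""Decorators"": [
                { ""Stype"": ""FiscalInformationDecorator"",
                  ""FiscalInformation"": { ""Stype"": ""FiscalInformation"",
                                           ""UUID"": ""02d0c973-727e-449e-bb4e-45dddbd7dbeb"" } }
            ],
            ""InstanceID"": ""78091f6e-e458-4a23-abfe-fe286b24b59a""
        }";

        var columns = new List<(string Column, string Value)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            Flatten(doc.RootElement, "", columns);
        }

        foreach (var (column, value) in columns)
            Console.WriteLine($"{column} = {value}");
    }
}
```

Running it prints names such as decorators_0_stype and decorators_0_fiscalinformation_uuid. Every array element gets its own set of columns, so the column count grows with the longest Decorators array, whereas the desired layout keeps Decorators as a single array<struct<...>> column.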

Is there a way to make the column an array for querying purposes (e.g. when querying the Parquet file with Amazon Athena)?

If I generate the Parquet file with AWS Glue, it gives me the column as an array.

Cinchoo commented 1 year ago

I'm afraid it can't do that. Can you spell out the expected Parquet file layout with possible values? I'll take a look and provide you input. Thanks.