Trying to convert JSON to Parquet
Sample JSON:

    {
      "Stype": "BaseDecorator",
      "Decorators": [
        { "Stype": "FiscalInformationDecorator", "FiscalInformation": { "Stype": "FiscalInformation", "UUID": "02d0c973-727e-449e-bb4e-45dddbd7dbeb", etc... } },
        { "Stype": "DocumentInformationDecorator", "DocumentInformation": { "Stype": "DocumentInformation", "DocumentModelID": "7ec7b1d4-f94f-42b5-ba36-77701cdf1db4", etc... } },
        { "Stype": "IssuingInformationDecorator", "IssuingInformation": { "Stype": "IssuingInformation", "RFC": "PRR890126QC2", etc... } }
      ],
      "InstanceID": "78091f6e-e458-4a23-abfe-fe286b24b59a",
      "company": "d6038f2d-787c-427b-8eaf-4d9eea44a24a"
    }

Decorators is an array.
Using:

    var stringJson = JArray.FromObject(deserialized_jsons).ToString();
    using (var r = ChoJSONReader.LoadText(stringJson).ErrorMode(ChoErrorMode.IgnoreAndContinue))
    {
        using (var w = new ChoParquetWriter(stream, new ChoParquetRecordConfiguration { CompressionMethod = Parquet.CompressionMethod.Snappy })
            .ThrowAndStopOnMissingField(false)
            .ErrorMode(ChoErrorMode.IgnoreAndContinue))
        {
            w.Write(r);
        }
    }
Can I have it represented as:

    stype string
    decorators array<struct<Stype:string,FiscalInformation:struct<Stype:string,UUID:string,CFDIUse:string, etc...
    InstanceID string
    company string

instead of:

    type string
    decorators_0_stype string
    decorators_0_fiscalinformation_stype string
    decorators_0_fiscalinformation_uuid string, etc...
I don't want a separate column for every property of every element of the nested array, with the element index embedded in the column name. I want a single column that contains all the elements of the nested array.
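To make the flattened layout concrete, here is a minimal sketch of how index-numbered column names like the ones above can arise from nested JSON. It is only an illustration built on Newtonsoft.Json: `FlattenDemo` and `Flatten` are hypothetical names, the lowercasing and the `_` separator are assumptions chosen to match the column names I am seeing, and this is not ChoETL's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class FlattenDemo
{
    // Joins a parent key and a child segment with '_' (assumed separator).
    private static string Join(string prefix, string segment) =>
        prefix.Length == 0 ? segment : prefix + "_" + segment;

    // Hypothetical helper: walks the token tree and yields one
    // (column name, value) pair per leaf, in document order.
    public static IEnumerable<KeyValuePair<string, string>> Flatten(JToken token, string prefix = "")
    {
        switch (token)
        {
            case JObject obj:
                // Object properties extend the key with the property name.
                foreach (var prop in obj.Properties())
                    foreach (var kv in Flatten(prop.Value, Join(prefix, prop.Name)))
                        yield return kv;
                break;
            case JArray arr:
                // Array elements extend the key with their index.
                for (var i = 0; i < arr.Count; i++)
                    foreach (var kv in Flatten(arr[i], Join(prefix, i.ToString())))
                        yield return kv;
                break;
            default:
                // Leaf value: emit the lowercased key (assumed casing).
                yield return new KeyValuePair<string, string>(
                    prefix.ToLowerInvariant(), token.ToString());
                break;
        }
    }

    public static void Main()
    {
        // Shortened version of the sample record from this question.
        var json = @"{
            ""Stype"": ""BaseDecorator"",
            ""Decorators"": [
                { ""Stype"": ""FiscalInformationDecorator"",
                  ""FiscalInformation"": { ""Stype"": ""FiscalInformation"" } }
            ],
            ""company"": ""d6038f2d-787c-427b-8eaf-4d9eea44a24a""
        }";

        foreach (var kv in Flatten(JObject.Parse(json)))
            Console.WriteLine(kv.Key);
    }
}
```

For the shortened record this prints `stype`, `decorators_0_stype`, `decorators_0_fiscalinformation_stype` and `company`, one per line. Every additional array element adds its own set of columns, which is exactly the layout I am trying to avoid.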
Is there a way to write the column as an array so that it can be searched, e.g. when querying the Parquet file with Amazon Athena?
If I generate the Parquet file with AWS Glue, the column comes out as an array.