Cinchoo / ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
MIT License

Support for Altering Header Structures within a JSON #106

Open JoelJuntunen opened 4 years ago

JoelJuntunen commented 4 years ago

Hi!

I am working with a JSON file whose records do not all share the same set of fields (example below). Is there a configuration option to include headers that do not appear in the first record? I need my tool to work with any JSON file, so I do not want to define the structure in code.

Input:

    [
      { "column_a": 1, "column_b": 2, "column_c": 3 },
      { "column_a": 11, "column_x": "not present in first item", "column_c": 33 }
    ]

Output:

    column_a;column_b;column_c
    1;2;3;
    11;?;33

Desired output:

    column_a;column_b;column_x;column_c
    1;2;?;3;
    11;?;not present in first item;33

Cinchoo commented 4 years ago

Use WithMaxScanRows on the CSV writer.

JoelJuntunen commented 4 years ago

using (var w = new ChoCSVWriter(tempFilePath)
    .WithFirstLineHeader()
    .WithMaxScanRows(2))
{
    w.Write(r);
}

If this is what you mean, it produces the same output as before.

Cinchoo commented 4 years ago

Here you go, a working sample:

string json = @"[
{
""column_a"": 1,
""column_b"": 2,
""column_c"": 3
},
{
""column_a"": 11,
""column_x"": ""not present in first item"",
""column_c"": 33
}
]";

StringBuilder csv = new StringBuilder();
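using (var r = ChoJSONReader.LoadText(json)
    .UseJsonSerialization()
    )
{
    // Scan the first two records before writing the header, so that fields
    // first appearing in the second record (column_x) are included.
    using (var w = new ChoCSVWriter(csv).WithFirstLineHeader()
        .ThrowAndStopOnMissingField(false) // do not fail on records that lack a field
        .WithMaxScanRows(2))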
    {
        w.Write(r);
    }
}

Console.WriteLine(csv.ToString());
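
For readers wondering what scanning several rows buys you, here is a minimal sketch of the general idea in plain C#, with no ChoETL dependency. It is not ChoETL's implementation: `HeaderScanSketch`, `DiscoverHeaders`, and the tuple-based record shape are hypothetical names chosen for illustration, and the column order it produces (first-seen order) is an assumption that may differ from what the library emits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HeaderScanSketch
{
    // Hypothetical helper: collect field names from the first maxScanRows
    // records, keeping the order in which each name is first seen.
    static List<string> DiscoverHeaders(
        IEnumerable<(string Name, object Value)[]> records, int maxScanRows)
    {
        var headers = new List<string>();
        var seen = new HashSet<string>();
        foreach (var record in records.Take(maxScanRows))
            foreach (var field in record)
                if (seen.Add(field.Name))
                    headers.Add(field.Name);
        return headers;
    }

    static void Main()
    {
        // The two records from the issue, as ordered name/value pairs.
        var records = new List<(string Name, object Value)[]>
        {
            new (string Name, object Value)[]
                { ("column_a", 1), ("column_b", 2), ("column_c", 3) },
            new (string Name, object Value)[]
                { ("column_a", 11), ("column_x", "not present in first item"), ("column_c", 33) },
        };

        // With maxScanRows: 1 only the first record's fields would be found.
        var headers = DiscoverHeaders(records, maxScanRows: 2);
        Console.WriteLine(string.Join(";", headers));

        foreach (var record in records)
        {
            // A field missing from a record becomes an empty cell.
            var cells = headers.Select(h =>
            {
                int i = Array.FindIndex(record, f => f.Name == h);
                return i >= 0 ? Convert.ToString(record[i].Value) : "";
            });
            Console.WriteLine(string.Join(";", cells));
        }
    }
}
```

With a scan depth of 1 the header would contain only `column_a`, `column_b`, and `column_c`, which mirrors the original complaint. With a depth of 2, `column_x` is discovered as well, and records that lack a column simply get an empty cell.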

JoelJuntunen commented 4 years ago

This works, thank you! Is this documented somewhere? I couldn't find it when researching the issue.

Cinchoo commented 4 years ago

https://www.codeproject.com/Articles/5268371/Cinchoo-ETL-JSON-Reader

JoelJuntunen commented 4 years ago

The article doesn't seem to mention .WithMaxScanRows(), which was necessary for this structure (where not all fields appear in the first record).