Cinchoo / ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
MIT License

Only parses part of JSON to CSV #161

Closed: mikaelollhage closed this issue 2 years ago

mikaelollhage commented 2 years ago

Hi,

I have a situation where not all of my JSON gets converted to CSV.

Code

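        using System.Text;
        using ChoETL;

        // jsonString holds the JSON array shown below
        var csvData = new StringBuilder();
        // Each object in the top-level JSON array becomes one record
        using (var jsonReader = ChoJSONReader.LoadText(jsonString))
        {
            using (var csvWriter = new ChoCSVWriter(csvData)
                .WithFirstLineHeader()
                .WithDelimiter(",")
                .QuoteAllFields()
                // Flatten nested objects and arrays into composite column names
                .Configure(c => c.UseNestedKeyFormat = true)
                )
            {
                csvWriter.Write(jsonReader);
            }
        }
        var finalCsvString = csvData.ToString();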

I have an example JSON below containing two objects in an array. Only the first object gets parsed. In my real data there are many more objects after these. As long as the first object is in the JSON, it is the only object that gets parsed; if I remove it, all the other objects are suddenly parsed flawlessly. So my guess is that it must contain some kind of escape?

It is unclear to me whether the reader or the writer is failing.

[
  {
    "UnitId": "id1",
    "ProductNumber": "pn1",
    "SiteCode": "1",
    "SerialNumber": null,
    "Firmware": null,
    "FilePaths": [
      null
    ],
    "Orders": [
      {
        "OrderId": "oid1",
        "State": "completed",
        "OrderType": "order",
        "OrderSubtype": "porder",
        "ProductNumber": "pn1",
        "ProductLevelIn": "1",
        "ProductLevelOut": "2",
        "TestNameOut": "T1",
        "WorkSize": 1,
        "CreatedDatetime": "2021-01-28T09:52:45",
        "CompletedDatetime": "2021-01-28T10:57:29",
        "Firmware": "fw1",
        "TestRuns": [
          {
            "Passed": 0,
            "StationId": "st1",
            "OperatorId": "oi1",
            "StartedDatetime": "2021-01-28T10:23:06",
            "TestSystemVersion": "tsv1",
            "TestSeqVersion": "tsqv1",
            "TestStubVersion": "tsbv1",
            "ErrorCode": "ec1",
            "ErrorDescription": "ed1",
            "AdditionalErrorInfo": "",
            "Created": "2021-01-28T10:23:06",
            "TestEquipment": null
          }
        ]
      }
    ],
    "Subunits": [
      null
    ]
  },
  {
    "UnitId": "id1",
    "ProductNumber": "pn2",
    "SiteCode": "2",
    "SerialNumber": null,
    "Firmware": null,
    "FilePaths": [
      "fp2"
    ],
    "Orders": [
      {
        "OrderId": "oid2",
        "State": "completed",
        "OrderType": "order",
        "OrderSubtype": "rorder",
        "ProductNumber": "pn2",
        "ProductLevelIn": "1",
        "ProductLevelOut": "1",
        "TestNameOut": "T2",
        "WorkSize": 2,
        "CreatedDatetime": "2021-03-30T12:16:38",
        "CompletedDatetime": "2021-03-30T15:06:00",
        "Firmware": "",
        "TestRuns": [
          {
            "Passed": 0,
            "StationId": "st2",
            "OperatorId": "oi2",
            "StartedDatetime": "2021-03-30T13:26:26",
            "TestSystemVersion": "tsv1",
            "TestSeqVersion": "tsqv1",
            "TestStubVersion": "tsbv1",
            "ErrorCode": "ec2",
            "ErrorDescription": "ed2",
            "AdditionalErrorInfo": "aei2",
            "Created": "2021-03-30T08:26:26",
            "TestEquipment": [
              {
                "TeqId": "ti1",
                "TeqPartNumber": "tpn1",
                "TeqRevision": 1
              }
            ]
          }
        ]
      }
    ],
    "Subunits": [
      null
    ]
  }
]

Thank you kindly!

BR, MO

Cinchoo commented 2 years ago

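Use .ThrowAndStopOnMissingField(false) on the CSV writer, because the two objects have different structures: the writer builds its header from the first record, and by default it stops when a later record does not match those fields.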

Here is working sample: https://dotnetfiddle.net/5Ppv1t

mikaelollhage commented 2 years ago

Hi!

Thanks for getting back to me so quickly! Very impressive! I have tried your suggestion. It does now give me all rows, but it loses the properties that are not present in the first object: in this example, the three properties of the TestEquipment objects, which are also missing from the header.

Is there a way around this too?

Thanks again!

BR, MO

Cinchoo commented 2 years ago

Use .WithMaxScanRows(2) on the CSV writer so that it scans the first two records when building the header and picks up those nodes.

Updated working sample: https://dotnetfiddle.net/j8lmFA
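To see why scanning a second row matters, here is a minimal, dependency-free sketch of the idea behind a scan-rows setting: build the header as the union of flattened column names from the first N records. It is an illustration, not ChoETL's implementation. The '/' separator, the helper names FlattenKeys and BuildHeader, and the trimmed two-record JSON are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Trimmed copy of the two records from the issue: only the second has TestEquipment items.
var json = (
    "[{'UnitId':'id1','Orders':[{'TestRuns':[{'Passed':0,'TestEquipment':null}]}]}," +
    " {'UnitId':'id1','Orders':[{'TestRuns':[{'Passed':0,'TestEquipment':" +
    "[{'TeqId':'ti1','TeqPartNumber':'tpn1','TeqRevision':1}]}]}]}]"
).Replace('\'', '"');

using var doc = JsonDocument.Parse(json);
var records = new List<List<string>>();
foreach (var item in doc.RootElement.EnumerateArray())
{
    var keys = new List<string>();
    FlattenKeys(item, "", keys);
    records.Add(keys);
}

var header1 = BuildHeader(records, 1); // header from the first record only
var header2 = BuildHeader(records, 2); // analogous to scanning two rows

// Columns that exist only in the second record are lost when scanning one row.
var missed = header2.Except(header1).ToList();
Console.WriteLine(string.Join(Environment.NewLine, missed));

// Collect the flattened column name of every leaf value, joining property names
// and array indexes with '/' (separator assumed for illustration).
static void FlattenKeys(JsonElement e, string path, List<string> keys)
{
    switch (e.ValueKind)
    {
        case JsonValueKind.Object:
            foreach (var p in e.EnumerateObject())
                FlattenKeys(p.Value, path.Length == 0 ? p.Name : path + "/" + p.Name, keys);
            break;
        case JsonValueKind.Array:
            int i = 0;
            foreach (var item in e.EnumerateArray())
                FlattenKeys(item, path + "/" + i++, keys);
            break;
        default:
            keys.Add(path); // leaf: string, number, bool or null
            break;
    }
}

// Union of flattened keys from the first maxScanRows records, in first-seen order.
static List<string> BuildHeader(List<List<string>> rows, int maxScanRows)
{
    var header = new List<string>();
    foreach (var rowKeys in rows.Take(maxScanRows))
        foreach (var k in rowKeys)
            if (!header.Contains(k)) header.Add(k);
    return header;
}
```

On .NET 6 or later this prints the three Orders/0/TestRuns/0/TestEquipment/0/... columns (TeqId, TeqPartNumber, TeqRevision), which are exactly the fields that were dropped when only the first record was scanned. Passing records.Count as the limit corresponds to the full scan discussed later in the thread.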

mikaelollhage commented 2 years ago

Thanks again! This NuGet package truly is amazing compared to the others out there.

Since this is a generic method with objects coming from a database, I can never know which rows have which properties filled. So I guess I need to scan my whole list to be certain that I get all properties? Or is there a way to populate/write the header based on the model?

Cinchoo commented 2 years ago

Well, if you know the model of the message, you can create classes from it and use them for parsing. This eliminates the scanning process.

Otherwise, you have no choice but to scan the entire list to capture all the data.
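A minimal sketch of the model-based alternative, assuming the header only needs to mirror the model's shape: walk the type with reflection and emit one flattened column per leaf property, so no data rows are scanned. To keep it to a single file, an anonymous-type prototype stands in for real model classes; with your own classes you would pass typeof(YourModel) instead. The AddColumns helper, the '/' separator and the single '/0' slot per collection are assumptions for illustration, not ChoETL behavior.

```csharp
using System;
using System.Collections.Generic;

// Anonymous-type prototype standing in for the model classes;
// the values only fix the property types.
var prototype = new
{
    UnitId = "",
    ProductNumber = "",
    FilePaths = new[] { "" },
    Orders = new[]
    {
        new
        {
            OrderId = "",
            TestRuns = new[]
            {
                new
                {
                    Passed = 0,
                    TestEquipment = new[] { new { TeqId = "", TeqPartNumber = "", TeqRevision = 0 } }
                }
            }
        }
    }
};

var columns = new List<string>();
AddColumns(prototype.GetType(), "", columns);
Console.WriteLine(string.Join(",", columns));

// Emit one flattened column name per leaf property reachable from type t.
static void AddColumns(Type t, string name, List<string> cols)
{
    if (t == typeof(string) || t.IsValueType)
        cols.Add(name);                                    // leaf: one column
    else if (t.IsArray)
        AddColumns(t.GetElementType(), name + "/0", cols); // assume one element per collection
    else
        foreach (var p in t.GetProperties())               // nested object: recurse into properties
            AddColumns(p.PropertyType, name.Length == 0 ? p.Name : name + "/" + p.Name, cols);
}
```

The single '/0' slot is the weak point of this sketch: if a unit can have several orders or test runs, the type alone does not say how many indexed columns to reserve, so for generic data with variable-length collections a full scan may still be needed.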

mikaelollhage commented 2 years ago

Thank you, my friend. In this case it is a generic method that serves several completely different models, so I will go ahead with the full scan instead.

Thanks again for your amazing help!

BR, MO