JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/

Return number of written records in 'WriteRecords' family of methods #2168

Open julealgon opened 1 year ago

julealgon commented 1 year ago

Is your feature request related to a problem? Please describe.

We have a data export feature on our system that has a logging mechanism to show the end user how many records were processed/saved.

We fetch the data using a lazy IEnumerable method that fetches records from a database, then writes them using CsvWriter's WriteRecords<T> method.

However, the same implementation requires that we populate a separate field with the total number of records processed.

The process as a whole looks something like this:

public override async Task WriteDataAsync(Stream output)
{
    var reportData = await this.database.GetReportXDataAsync();

    using (var textWriter = new StreamWriter(output))
    using (var csvWriter = new CsvWriter(textWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
    {
        await csvWriter.WriteRecordsAsync(reportData);
        this.totalRecords = ??;
    }
}

Since our method to retrieve the data leverages delayed execution, we avoid materializing it here and just pass it along to the writer, which iterates on it and writes all the records.

That's all good, but we don't have a clean way of obtaining the total number of records written. We could force materialization of the collection to avoid enumerating the IEnumerable twice, but that wastes memory unnecessarily.

Describe the solution you'd like

We'd like for the WriteRecords/WriteRecordsAsync family of methods to return the number of records actually written back to the caller instead of returning nothing.

This would allow callers to rely on this return value for notifying how many entries there were without being forced to enumerate the enumerable one more time to get the count, or materialize the collection upfront.

This would change our implementation to:

    this.totalRecords = await csvWriter.WriteRecordsAsync(reportData);

Describe alternatives you've considered

We initially thought about just forcing the materialization of the list and dealing with the consequences, something like:

public override async Task WriteDataAsync(Stream output)
{
    // Materialize the lazy sequence so that Count is available afterwards.
    var reportData = (await this.database.GetReportXDataAsync()).ToList();

    using (var textWriter = new StreamWriter(output))
    using (var csvWriter = new CsvWriter(textWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
    {
        await csvWriter.WriteRecordsAsync(reportData);
        this.totalRecords = reportData.Count;
    }
}

But because of how wasteful this approach is, we instead created an extension method on CsvHelper's WritingContext:

public static int GetWrittenRecordCount(this WritingContext writingContext)
{
    // The Header row is not to be considered a record, but can be omitted.
    var headerDeduction = writingContext.HasHeaderBeenWritten ? 1 : 0;

    // 'Row' points to the beginning of the current line before writing, so we deduct 1 to get only the number of records.
    return writingContext.Row - headerDeduction - 1;
}

Which we then call like this:

    await csvWriter.WriteRecordsAsync(reportData);
    this.totalRecords = csvWriter.Context.GetWrittenRecordCount();

However, we are not very happy with this solution, since it relies on two assumptions: that the Row value will always point to the line after the last written record, and that this will still hold after a Flush call. It feels brittle and hard to explain.

Another possible solution we considered was to not use WriteRecords at all and write the header and records manually:

using (var csvWriter = new CsvWriter(textWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
{
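    // WriteHeader and WriteRecord do not end the current row; NextRecord does.
    csvWriter.WriteHeader(typeof(OurReportType));
    csvWriter.NextRecord();
    var writtenRecords = 0;
    foreach (var record in reportData)
    {
        csvWriter.WriteRecord(record);
        csvWriter.NextRecord();
        writtenRecords++;
    }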

    this.totalRecords = writtenRecords;
}

But again, this is a lot more verbose and manual: we'd like to keep this piece of code as simple as possible to read.
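
For comparison, one more single-pass option is a pass-through wrapper that counts items as the writer pulls them from the sequence. The sketch below is only an illustration: CountingEnumerable<T> is a hypothetical helper that is not part of CsvHelper, and it assumes that WriteRecordsAsync enumerates the sequence exactly once.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper (not part of CsvHelper): passes items through unchanged
// and counts how many the consumer has pulled so far, without buffering them.
public sealed class CountingEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> source;

    public CountingEnumerable(IEnumerable<T> source)
    {
        this.source = source ?? throw new ArgumentNullException(nameof(source));
    }

    // Items handed to the consumer so far, accumulated across all enumerations.
    public int Count { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var item in this.source)
        {
            // Counted when the item is handed out, before the consumer processes it.
            this.Count++;
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
}
```

Usage would then look something like this:

    var counted = new CountingEnumerable<OurReportType>(reportData);
    await csvWriter.WriteRecordsAsync(counted);
    this.totalRecords = counted.Count;

Note that this counts records pulled by the writer rather than records confirmed as written, so it is still a workaround and not a substitute for a return value from the library.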

We'd rather there was a native, robust option from the library itself that we could leverage directly.