Is your feature request related to a problem? Please describe.
We have a data export feature on our system that has a logging mechanism to show the end user how many records were processed/saved.
We fetch the data using a lazy IEnumerable method that fetches records from a database, then writes them using CsvWriter's WriteRecords<T> method.
However, the same implementation requires that we populate a separate field with the total number of records processed.
The process as a whole looks something like this:
public override async Task WriteDataAsync(Stream output)
{
    var reportData = await this.database.GetReportXDataAsync();
    using (var textWriter = new StreamWriter(output))
    using (var csvWriter = new CsvWriter(textWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
    {
        await csvWriter.WriteRecordsAsync(reportData);
        this.totalRecords = ??; // placeholder: no clean way to obtain the count here
    }
}
Since our method to retrieve the data leverages delayed execution, we avoid materializing it here and just pass it along to the writer, which iterates on it and writes all the records.
That works well, but we have no clean way of obtaining the total number of records written. We could force materialization of the collection to avoid enumerating the IEnumerable twice, but that wastes memory unnecessarily.
Describe the solution you'd like
We'd like the WriteRecords/WriteRecordsAsync family of methods to return the number of records actually written, instead of returning nothing.
This would allow callers to rely on this return value for notifying how many entries there were without being forced to enumerate the enumerable one more time to get the count, or materialize the collection upfront.
Describe alternatives you've considered
We initially thought about just forcing the materialization of the list and dealing with the consequences, something like:
public override async Task WriteDataAsync(Stream output)
{
    var reportData = await this.database.GetReportXDataAsync().ToListAsync();
    using (var textWriter = new StreamWriter(output))
    using (var csvWriter = new CsvWriter(textWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
    {
        await csvWriter.WriteRecordsAsync(reportData);
        this.totalRecords = reportData.Count;
    }
}
But because of how wasteful this approach is, we instead created an extension method on CsvHelper's WritingContext:
public static int GetWrittenRecordCount(this WritingContext writingContext)
{
    // The header row is not a record, and it may not have been written at all.
    var headerDeduction = writingContext.HasHeaderBeenWritten ? 1 : 0;
    // 'Row' points to the beginning of the current line before writing, so we deduct 1 to get only the number of records.
    return writingContext.Row - headerDeduction - 1;
}
However, we are not very happy with this solution since it relies on some assumptions such as that the Row value will always point to the line after the last written element, and that this will always work after a Flush call. It feels brittle and hard to explain.
Another possible solution we considered was to not use WriteRecords at all and write the header and records manually:
using (var csvWriter = new CsvWriter(textWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
{
    csvWriter.WriteHeader(typeof(OurReportType));
    csvWriter.NextRecord(); // end the header row
    var writtenRecords = 0;
    foreach (var record in reportData)
    {
        csvWriter.WriteRecord(record);
        csvWriter.NextRecord(); // end the current row
        writtenRecords++;
    }
    this.totalRecords = writtenRecords;
}
But again this is a lot more verbose and manual: we'd like to keep this piece of code as simple as possible to parse.
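One more stopgap that avoids both materialization and double enumeration is to wrap the lazy sequence in an iterator that counts items as the writer pulls them. The following is only a minimal sketch: RecordCounter and Track are hypothetical names of ours, not part of CsvHelper, and the count reflects records handed to the writer, which matches the records written only if the writer enumerates the sequence once and completes without throwing.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of CsvHelper): counts items as a lazy
// sequence is enumerated, without buffering them.
public sealed class RecordCounter
{
    // Number of items yielded so far across all enumerations of Track(...).
    public int Count { get; private set; }

    // Returns a lazy pass-through sequence; nothing is counted until
    // the consumer actually enumerates it.
    public IEnumerable<T> Track<T>(IEnumerable<T> source)
    {
        foreach (var item in source)
        {
            Count++;
            yield return item;
        }
    }
}
```

With this in place, the export would call csvWriter.WriteRecordsAsync(counter.Track(reportData)) and read counter.Count afterwards. It still leaves the counting to the caller, and enumerating the tracked sequence a second time would inflate the count.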
We'd rather there was a native, robust option from the library itself that we could leverage directly.