airbreather / Cursively

A CSV reader for .NET. Fast, RFC 4180 compliant, and fault tolerant. UTF-8 only.
MIT License

Cursively

A fast, RFC 4180-conforming CSV reading library for .NET. Written in C#.

License CI (AppVeyor) NuGet MyGet (pre-release)

Documentation

Documentation is currently published via GitHub Pages.

Usage

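Create a subclass of CsvReaderVisitorBase (or of one of its built-in subclasses) that implements your own logic for processing each field and record in the order they appear. You can then feed data to it in any of the ways described in the sections that follow the example visitor.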

Example Visitor

using System;
using System.IO;
using System.Text;
using Cursively;

public sealed class MyVisitor : CsvReaderVisitorBase
{
    // A stateful decoder, so a multi-byte UTF-8 sequence that is split
    // across two chunks is still decoded correctly.
    private readonly Decoder _utf8Decoder = Encoding.UTF8.GetDecoder();

    private readonly char[] _buffer;

    private int _bufferConsumed;

    public MyVisitor(int maxFieldLength) =>
        _buffer = new char[maxFieldLength];

    // Called when a field continues past the end of the current chunk.
    public override void VisitPartialFieldContents(ReadOnlySpan<byte> chunk) =>
        VisitFieldContents(chunk, flush: false);

    // Called with the last (or only) piece of a field.
    public override void VisitEndOfField(ReadOnlySpan<byte> chunk) =>
        VisitFieldContents(chunk, flush: true);

    public override void VisitEndOfRecord() =>
        Console.WriteLine("End of fields for this record.");

    private void VisitFieldContents(ReadOnlySpan<byte> chunk, bool flush)
    {
        // Check that the decoded chars fit before writing them to the buffer.
        int charCount = _utf8Decoder.GetCharCount(chunk, flush);
        if (charCount + _bufferConsumed <= _buffer.Length)
        {
            _utf8Decoder.GetChars(chunk, new Span<char>(_buffer, _bufferConsumed, charCount), flush);
            _bufferConsumed += charCount;
        }
        else
        {
            throw new InvalidDataException($"Field is longer than {_buffer.Length} characters.");
        }

        if (flush)
        {
            // The field is complete: print it and reset for the next field.
            Console.Write("Field: ");
            Console.WriteLine(_buffer, 0, _bufferConsumed);
            _bufferConsumed = 0;
        }
    }
}
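
The example above needs a maxFieldLength up front. As a rough sketch of how the same three overrides can be used for a different job, the hypothetical visitor below (MaxFieldLengthVisitor is an illustrative name, not part of Cursively) measures the longest field in bytes by summing chunk lengths until the field ends. Because a UTF-8 byte count is never smaller than the corresponding UTF-16 char count, the result could serve as a safe upper bound for maxFieldLength.

```csharp
using System;
using Cursively;

// Hypothetical visitor for illustration only; not part of Cursively.
public sealed class MaxFieldLengthVisitor : CsvReaderVisitorBase
{
    private int _currentFieldBytes;

    // Length, in bytes, of the longest field seen so far.
    public int MaxFieldBytes { get; private set; }

    // A field may arrive in several pieces, so lengths are accumulated.
    public override void VisitPartialFieldContents(ReadOnlySpan<byte> chunk) =>
        _currentFieldBytes += chunk.Length;

    public override void VisitEndOfField(ReadOnlySpan<byte> chunk)
    {
        _currentFieldBytes += chunk.Length;
        MaxFieldBytes = Math.Max(MaxFieldBytes, _currentFieldBytes);
        _currentFieldBytes = 0; // reset for the next field
    }

    // Record boundaries do not matter for this measurement.
    public override void VisitEndOfRecord() { }
}
```

A first pass with this visitor followed by a second pass with MyVisitor is one possible design; it trades an extra read of the file for never hitting the InvalidDataException.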

Fastest

All of the other methods of processing the data are built on top of this, so it gives you the most control:

  1. Create a new instance of your visitor.
  2. Create a new instance of CsvTokenizer.
  3. Call CsvTokenizer.ProcessNextChunk for each chunk of the file.
  4. Call CsvTokenizer.ProcessEndOfStream after the last chunk of the file.

Example:

using System;
using System.IO;
using Cursively;

public static void ProcessCsvFile(string csvFilePath)
{
    var myVisitor = new MyVisitor(maxFieldLength: 1000);
    var tokenizer = new CsvTokenizer();
    using (var file = File.OpenRead(csvFilePath))
    {
        Console.WriteLine($"Started reading '{csvFilePath}'.");
        Span<byte> fileReadBuffer = new byte[4096];
        while (true)
        {
            // Read returns 0 only at the end of the file.
            int count = file.Read(fileReadBuffer);
            if (count == 0)
            {
                break;
            }

            // Pass along only the bytes that were actually read.
            var chunk = fileReadBuffer.Slice(0, count);
            tokenizer.ProcessNextChunk(chunk, myVisitor);
        }

        // Lets the tokenizer finish a final field or record that has no trailing line break.
        tokenizer.ProcessEndOfStream(myVisitor);
    }

    Console.WriteLine($"Finished reading '{csvFilePath}'.");
}
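
Because the tokenizer is fed one chunk at a time, a field can straddle a chunk boundary. The following is a minimal sketch of that situation, using only the CsvTokenizer calls shown above. CollectingVisitor and ChunkedFeed are hypothetical names introduced for illustration. The visitor buffers the raw bytes of each field and decodes them only when the field ends, so the records it collects should not depend on the chunk size.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Cursively;

// Hypothetical helper, not part of Cursively: collects records as lists of strings.
public sealed class CollectingVisitor : CsvReaderVisitorBase
{
    private readonly List<byte> _fieldBytes = new List<byte>();
    private List<string> _currentRecord = new List<string>();

    public List<List<string>> Records { get; } = new List<List<string>>();

    public override void VisitPartialFieldContents(ReadOnlySpan<byte> chunk) =>
        _fieldBytes.AddRange(chunk.ToArray());

    public override void VisitEndOfField(ReadOnlySpan<byte> chunk)
    {
        _fieldBytes.AddRange(chunk.ToArray());
        // Decode only once the whole field is buffered, so a multi-byte
        // UTF-8 sequence split across chunks is still decoded as one unit.
        _currentRecord.Add(Encoding.UTF8.GetString(_fieldBytes.ToArray()));
        _fieldBytes.Clear();
    }

    public override void VisitEndOfRecord()
    {
        Records.Add(_currentRecord);
        _currentRecord = new List<string>();
    }
}

public static class ChunkedFeed
{
    // Feeds 'data' to a fresh tokenizer in slices of at most 'chunkSize' bytes.
    public static List<List<string>> Parse(byte[] data, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        var visitor = new CollectingVisitor();
        var tokenizer = new CsvTokenizer();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            tokenizer.ProcessNextChunk(new ReadOnlySpan<byte>(data, offset, count), visitor);
        }

        tokenizer.ProcessEndOfStream(visitor);
        return visitor.Records;
    }
}
```

Calling ChunkedFeed.Parse on the same bytes with a chunk size of 1 and with a chunk size larger than the input is a quick way to check that a visitor handles VisitPartialFieldContents correctly. Copying each chunk into a list is simple rather than fast; it is meant for experiments, not for the high-throughput path.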

Simpler

  1. Create a new instance of your visitor.
  2. Use one of the CsvSyncInput or CsvAsyncInput methods to create an input object that describes where the data comes from, then call its Process or ProcessAsync method to feed that data to your visitor.

Examples:

using System;
using System.IO;
using System.Threading.Tasks;
using Cursively;

// Reads the file through a memory-mapped view.
public static void ProcessCsvFile(string csvFilePath)
{
    Console.WriteLine($"Started reading '{csvFilePath}'.");
    CsvSyncInput.ForMemoryMappedFile(csvFilePath)
                .Process(new MyVisitor(maxFieldLength: 1000));
    Console.WriteLine($"Finished reading '{csvFilePath}'.");
}

// Reads from any Stream, synchronously.
public static void ProcessCsvStream(Stream csvStream)
{
    Console.WriteLine("Started reading CSV file.");
    CsvSyncInput.ForStream(csvStream)
                .Process(new MyVisitor(maxFieldLength: 1000));
    Console.WriteLine("Finished reading CSV file.");
}

// Reads from any Stream, asynchronously.
public static async Task ProcessCsvStreamAsync(Stream csvStream)
{
    Console.WriteLine("Started reading CSV file.");
    await CsvAsyncInput.ForStream(csvStream)
                       .ProcessAsync(new MyVisitor(maxFieldLength: 1000));
    Console.WriteLine("Finished reading CSV file.");
}
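
The stream-based methods above are not tied to files. As a sketch under that assumption, the code below wraps CSV text in a MemoryStream and runs it through both CsvSyncInput.ForStream and CsvAsyncInput.ForStream, which can be handy in unit tests. StreamInputDemo and its nested RecordCounter are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Cursively;

public static class StreamInputDemo
{
    // Hypothetical visitor for illustration: counts records and ignores field contents.
    private sealed class RecordCounter : CsvReaderVisitorBase
    {
        public int Records { get; private set; }

        public override void VisitPartialFieldContents(ReadOnlySpan<byte> chunk) { }

        public override void VisitEndOfField(ReadOnlySpan<byte> chunk) { }

        public override void VisitEndOfRecord() => Records++;
    }

    public static int CountRecords(string csvText)
    {
        var counter = new RecordCounter();
        // Cursively reads UTF-8 bytes, so the text is encoded before wrapping it.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText)))
        {
            CsvSyncInput.ForStream(stream).Process(counter);
        }

        return counter.Records;
    }

    public static async Task<int> CountRecordsAsync(string csvText)
    {
        var counter = new RecordCounter();
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText)))
        {
            await CsvAsyncInput.ForStream(stream).ProcessAsync(counter);
        }

        return counter.Records;
    }
}
```

Both methods use a fresh visitor per call, since the visitor carries state (here, the running count) from one callback to the next.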