ExcelDataReader.GetValues() uses FieldCount rather than RowFieldCount

MarkPflug / Sylvan.Data.Excel

The fastest .NET library for reading Excel data files.

MIT License

243 stars 30 forks source link

// Here's an extract of my code based on the documentation using (var reader = ExcelDataReader.Create(filename)) { do { while (reader.Read()) { if (reader.RowNumber < 9) continue; // values shown by the debugger: // reader.FieldCount == 3 // reader.RowFieldCout == 7 object[] arrayOfValues = new object[reader.RowFieldCount]; reader.GetValues(arrayOfValues); // arrayOfValues has 7 fields however, GetValues() only fills the first 3 fields with data // This is because GetValues() uses FieldCount rather than RowFieldCount (see below) } } while (reader.NextResult()); } // Extract from Sylvan.Data.Excel.ExcelDataReader public abstract partial class ExcelDataReader : DbDataReader, IDisposable, IDbColumnSchemaGenerator { public sealed override int GetValues(object[] values) { ValidateAccess(); // should the next line use RowFieldCount rather than FieldCount? // considering if someone wants to only use FieldCount, they can initialise the array size using FieldCount // and then we don't need an additional method to use rowFieldCount var c = Math.Min(values.Length, this.FieldCount); for (int i = 0; i < c; i++) { values[i] = this.GetValue(i); } return c; } // Alternatively, should I submit a pull request containing a method for GetRowValues? public int GetRowValues(object[] values) { ValidateAccess(); var c = Math.Min(values.Length, this.rowFieldCount); for (int i = 0; i < c; i++) { values[i] = this.GetValue(i); } return c; } }

I think you have a variety of options here.

You can quite easily create your own GetRowValues extension method for ExcelDataReader:

static class ExcelDataReaderExtensions
{
    static int GetRowValues(this ExcelDataReader csv, object[]? values)
    {
        var c = csv.RowFieldCount;
        if (values == null) return c; // allow querying count

        // throw because there isn't enough room, or alternately you could fill the space available?
        if (values.Length < c) throw new ArgumentOutOfRangeException(); 

        for (int i = 0; i < c; i++)
        {
            values[i] = csv.GetValue(i);
        }
        return c;
    }
}

Or, the Sylvan.Data nuget package provides a GetVariableField extension method, which provides basically the same functionality. Here is a complete example using Sylvan.Data.Csv.CsvDataReader, but should work just as well with ExcelDataReader:

using Sylvan.Data.Csv;
using Sylvan.Data;

var data =

    """
    this is a junk row
    a,b,c
    1,2,3
    1,2,3,4
    1,2,3
    1,2
    """;

var csv = CsvDataReader.Create(new StringReader(data));
csv.Read(); // skip the first junk row

// initialize using the headers on row 2
csv.Initialize();

// create a data reader that reads jagged rows via RowFieldCount
// any *extra* columns will be treated as type "string"
var jagged = csv.AsVariableField(r => r.RowFieldCount, typeof(string));

object[] values = new object[100];
while (jagged.Read())
{
    var count = jagged.GetValues(values);
    Console.WriteLine("Count: " + count);
    for (int i = 0; i < count; i++)
    {
        Console.WriteLine(values[i]);
    }
    Console.WriteLine("---");
}

Which outputs:

Count: 3
1
2
3
---
Count: 4
1
2
3
4
---
Count: 3
1
2
3
---
Count: 2
1
2
---

In this case, you still don't know how many columns are in a row without inspecting RowFieldCount, so the values object array might not be big enough.

Either way, I don't a modification to the library is needed to handle this scenario.

MarkPflug / Sylvan.Data.Excel

ExcelDataReader.GetValues() uses FieldCount rather than RowFieldCount #174