Closed JaseRandall closed 5 months ago
I think you have a variety of options here.
You can quite easily create your own GetRowValues extension method for ExcelDataReader:
static class ExcelDataReaderExtensions
{
static int GetRowValues(this ExcelDataReader csv, object[]? values)
{
var c = csv.RowFieldCount;
if (values == null) return c; // allow querying count
// throw because there isn't enough room, or alternately you could fill the space available?
if (values.Length < c) throw new ArgumentOutOfRangeException();
for (int i = 0; i < c; i++)
{
values[i] = csv.GetValue(i);
}
return c;
}
}
Or, the Sylvan.Data
nuget package provides a GetVariableField
extension method, which provides basically the same functionality. Here is a complete example using Sylvan.Data.Csv.CsvDataReader, but should work just as well with ExcelDataReader:
using Sylvan.Data.Csv;
using Sylvan.Data;
var data =
"""
this is a junk row
a,b,c
1,2,3
1,2,3,4
1,2,3
1,2
""";
var csv = CsvDataReader.Create(new StringReader(data));
csv.Read(); // skip the first junk row
// initialize using the headers on row 2
csv.Initialize();
// create a data reader that reads jagged rows via RowFieldCount
// any *extra* columns will be treated as type "string"
var jagged = csv.AsVariableField(r => r.RowFieldCount, typeof(string));
object[] values = new object[100];
while (jagged.Read())
{
var count = jagged.GetValues(values);
Console.WriteLine("Count: " + count);
for (int i = 0; i < count; i++)
{
Console.WriteLine(values[i]);
}
Console.WriteLine("---");
}
Which outputs:
Count: 3
1
2
3
---
Count: 4
1
2
3
4
---
Count: 3
1
2
3
---
Count: 2
1
2
---
In this case, you still don't know how many columns are in a row without inspecting RowFieldCount, so the values
object array might not be big enough.
Either way, I don't a modification to the library is needed to handle this scenario.
I have a .xlsx file that is a jagged array with column headings appearing on a row that is not row 1 (could be anywhere from row 5 to 50).
I see from Issue #114 that I can use reader.Initialize() to set the current row as the row with the column headers. Also, #157 has "Instead of: ed.FieldCount, use ed.RowFieldCount instead." which I use to create allocate the array fed into GetValues().
However, while looping to find the header row, I'm not able to view all the values on each row because GetValues() uses FieldCount rather than RowFieldCount.
It's easier to see using code and code comments:
So, I guess my question is, should ExcelDataReader.GetValues use FieldRowCount to determine how many times to loop filling data? Or should there be a specific GetRowValues method?