MarkPflug / Sylvan.Data.Excel

The fastest .NET library for reading Excel data files.
MIT License
243 stars 30 forks source link

Values of some columns will be lost in special cases. #160

Closed lileyzhao closed 6 months ago

lileyzhao commented 7 months ago

Here is the sample code, using the file: table1.xlsx

{
    using ExcelDataReader edr = ExcelDataReader.Create("d://table1.xlsx", new ExcelDataReaderOptions { Schema = ExcelSchema.NoHeaders });
    edr.NextResult();
    edr.NextResult(); // to sheet3
    edr.Read();

    for (int i = 0; i < edr.FieldCount; i++)
    {
        var value = edr.GetString(i);
        Console.Write($"[{i}]{value}\t");
    }
}

Console.WriteLine($"\n\n{new string('-', 80)}");

{
    using ExcelDataReader edr = ExcelDataReader.Create("d://table1.xlsx");
    edr.NextResult();
    edr.NextResult(); // to sheet3
    // edr.Read(); // no read() like do-while

    for (int i = 0; i < edr.FieldCount; i++)
    {
        var value = edr.GetString(i);
        Console.Write($"[{i}]{value}\t");
    }
}

Now, Please compare the output with the columns in Sheet3 of the file.

// NoHeaders
[0]type  [1]orderno  [2]status  [3]time  [4]province  [5]num  [6]all_num  [7]     [8]     [9]     [10]

--------------------------------------------------------------------------------
// Default, But not read()
[0]type  [1]orderno  [2]status  [3]time  [4]province  [5]num  [6]all_num  [7]weight  [8]volume  [9]size  [10]like

The issue is that when reading the first row of Sheet3 in NoHeaders mode, null values are retrieved starting from column H. Through my experiments, I found that this is influenced by Sheet2, as Sheet2 only goes up to column G, thereby affecting Sheet3 to read only up to column G. If the content of Sheet2 is deleted or Sheet2 is removed entirely, the values from column H onwards can be correctly read.

I just identified the issue but I don't know how this strange problem occurred.

Actually, I am in the process of writing an Excel strongly-typed reading library based on Sylvan.Data.Excel because the strong typing support of Sylvan.Data.Excel does not meet my requirements. While developing this library, I encountered this issue because I need complete control over columns and data processing, necessitating the use of the NoHeaders mode for reading. However, I found that in the above situation, using NoHeaders mode results in losing values after column H.

Considering the previous issues, you now understand why I attempted to use the do-while approach for reading, as I discovered that do-while can indeed read values beyond column H.

lileyzhao commented 7 months ago

Now I have given up on using the do-while approach forcefully in the strongly-typed reading library I am developing to address this issue, as the do-while approach is not standard. However, I should provide feedback on this issue to you.

MarkPflug commented 7 months ago

I've identified the issue and implemented a fix in a pre-release package: https://www.nuget.org/packages/Sylvan.Data.Excel/0.4.20-b0001 Let me know if this resolves the issue for you. I'm going to also try to address #159 before pushing a final 0.4.20 build.

lileyzhao commented 7 months ago

The version 0.4.20-b0001 indeed resolved my issue, so I no longer have problem #149. When will the official release be available on NuGet?

MarkPflug commented 6 months ago

This is fixed in 0.4.20, just pushed to nuget.