EPPlusSoftware / EPPlus

EPPlus-Excel spreadsheets for .NET
https://epplussoftware.com
Other
1.82k stars 277 forks source link

EPPlus ExcelRangeBase ToDataTable() crashes if worksheet has any duplicate column names. #1677

Open jsnyder324 opened 1 week ago

jsnyder324 commented 1 week ago

EPPlus usage

Personal use

Environment

Windows

Epplus version

7.4.2

Spreadsheet application

Excel

Description

I'm evaluating this package for potential commercial use but I cannot use it with this following issue: ExcelRangeBase::ToDataTable(..) will crash if the range has any duplicate column names. Documentation says that this same issue was fixed in v5.0.3 and that EPPlus will append unique numbers on the end of duplicate column names but that is not happening. Also the ColumnNameParsingStrategy /NameParsingStrategy options have no option to rename or ignore or get around this issue.

error message: "Duplicate column name :".....

    //
    // Summary:
    //     Returns the range as a System.Data.DataTable with the option supplied.
    //
    // Parameters:
    //   configHandler:
    //     Configures the settings used to convert the range.
    //
    // Returns:
    //     A System.Data.DataTable representing the range.
    public DataTable ToDataTable(Action<ToDataTableOptions> configHandler);

Options Used: FirstRowIsColumnNames = true; AlwaysAllowNull = true;

swmal commented 6 days ago

If your column names in the worksheet are not unique you can set the FirstRowIsColumnNames to false on the ToDataTableOptions instance and then set up your DataTable (i.e. define the columns) yourself before you call ToDataTable.

Use this example, but set FirstRowIsColumnNames to false instead: https://github.com/EPPlusSoftware/EPPlus/wiki/ToDataTable#use-an-existing-datatable

I will also have a look at this function and see if we can improve the default behaviour.

swmal commented 6 days ago

In the next version we will add a new AllowDuplicateColumnNames parameter to ToDataTableOptions. If you set this to true EPPlus will name duplicate column names in the worksheet to "{name}1", "{name}2", etc.

Here is how to solve this with the current version of EPPlus. What you need to do is to add a mapping for each duplicate column name and give it a unique name.

image

[TestMethod]
public void ToDataTableDuplicateColumnNames()
{
    var package = OpenTemplatePackage("ToDataTableDuplicateColumnNames.xlsx");
    var sheet = package.Workbook.Worksheets[0];
    var dt = sheet.Cells["A1:C3"].ToDataTable(x =>
    {
        x.FirstRowIsColumnNames = true;
        x.Mappings.Add(1, "Name_1");
        x.Mappings.Add(2, "Name_2");
    });
    // rows and column count
    Assert.AreEqual(2, dt.Rows.Count);
    Assert.AreEqual(3, dt.Columns.Count);
    // column names
    Assert.AreEqual("Id", dt.Columns[0].ColumnName);
    Assert.AreEqual("Name_1", dt.Columns[1].ColumnName);
    Assert.AreEqual("Name_2", dt.Columns[2].ColumnName);
    // row values
    Assert.AreEqual(1d, dt.Rows[0][0]);
    Assert.AreEqual("Name 1", dt.Rows[0][1]);
    Assert.AreEqual("Name 2", dt.Rows[0][2]);
}