fsprojects / ExcelProvider

This library is for the .NET platform implementing a Excel type provider.
http://fsprojects.github.io/ExcelProvider/
The Unlicense
141 stars 51 forks source link

No detection of column changes between compile-time and runtime #111

Open AverageHomosapien opened 2 months ago

AverageHomosapien commented 2 months ago

User Story

As a user, I want to compile a project against one excel file at compile time and then run it against another excel file at runtime. The second excel file potentially has a different ordering or number of columns.

Description

After compiling a project against one Excel file, when I run the project against a second file, the direct reference of a property by name pulls data from a column by index, while the .GetValue("Property") pulls data from the correctly named column. At runtime, the direct property reference should point towards the referenced column, rather than the column index of itself in the original workbook.

Repro steps

  1. Create a program that prints row.Column A, row.Column B and row.Column D with the following excel file columns:
Column A Column B Column D
A Value B Value D Value
  1. At runtime, point the following excel sheet at the program instead:
Column A Column B Column C Column D
A Value B Value C Value D Value
  1. The values printed will be "A Value", "B Value" and "C Value", even though the original project references column D, rather than column C - it seems to reference the original columns by position rather than by name
  2. (Note: if you ran the original program with row.GetValue("Column D"), rather than row.Column D, it would return "D Value"

Expected behavior

For the example above, it would be expected that the program prints out "A Value", "B Value" and "D Value", as we're still referencing the Column D from the original program

Actual behavior

For the example above, it actually prints out "A Value", "B Value" and "C Value"

Known workarounds

The known workaround is to use the .GetValue method instead of referencing the column directly, but then it defeats the purpose of the intellisense against each of the columns

Related information

AverageHomosapien commented 1 month ago

I have uploaded here rather than created an example branch as I do not have permission.

I have tested again with 3.0 and it is still an issue. Adding the following Excel book and running the test confirm the behaviour still exists: DataTypesReordered.xlsx

[<Test>]
let ``Recognises re-ordered columns at runtime``() =
    let getFirst (Choice1Of2 value) = value

    let compileTimeSheet = DataTypes().Data |> Seq.toArray |> Seq.head
    let compileTimeString = compileTimeSheet.String

    // This test is testing runtime behavior, so we need to use the FSI session to run the code
    let dataTypesReordered = __SOURCE_DIRECTORY__.Replace(@"\", "/") + "/DataTypesReordered.xlsx"
    let result, _ = 
        FsiTestContext.fsiSession.EvalInteractionNonThrowing $"""
        open FSharp.Interop.Excel
        type DataTypesReordered = ExcelFile<"{dataTypesReordered}", HasHeaders=true>
        DataTypesReordered()
        """

    let compiledSheetAtRuntime = unbox<DataTypes> <| getFirst(result).Value.ReflectionValue
    let runtimeRow = compiledSheetAtRuntime.Data |> Seq.toArray |> Seq.head
    let runtimeString = runtimeRow.String

    runtimeString |> should equal compileTimeString

When column ordering is changed - the ExcelProvider doesn't recognise the new column ordering. The test above returns the String2 column rather than String - even though it has successfully identified in the underlying provider that the column ordering has changed.

Upon further investigation, when TryGetValue is called the DesignTime.propertyImplementation definition, it grabs the value by Column Index rather than Column Name (which doesn't account for column re-ordering). One solution for it would be to change/overload the Definition of TryGetValue or add a new column to get the value by ColumnName. The following re-write of the TryGetValue method offers a potential solution and fixes the original test

member this.TryGetValueByColName<'a>(columnName: string) =
    let value = this.GetValue columnName

    try
        match value with
        | null -> value :?> 'a
        | _ when typeof<'a> = typeof<string> -> (box (string value)) :?> 'a
        | :? double as valueDbl when typeof<'a> = typeof<DateTime> -> (box (DateTime.FromOADate valueDbl)) :?> 'a
        | :? string as valueStr when typeof<'a> = typeof<DateTime> -> (box (DateTime.Parse valueStr)) :?> 'a
        | _ -> value :?> 'a
    with
    | :? InvalidCastException
    | :? ArgumentException
    | :? ArgumentNullException
    | :? InvalidCastException ->
        failInvalidCast value (value.GetType()) typeof<'a> columnName rowIndex documentId sheetname

Please let me know your thoughts :)

AverageHomosapien commented 1 month ago

If you want a suggestion of changes in the form of a MR, I am happy to push my suggestions if I am given permission