fsprojects / ExcelProvider

This library is for the .NET platform implementing a Excel type provider.
http://fsprojects.github.io/ExcelProvider/
The Unlicense
141 stars 51 forks source link

Feature request: read more rows to infer column type #56

Open jackfoxy opened 6 years ago

jackfoxy commented 6 years ago

Description

I don't know how many rows the TP reads before attempting to infer column types, but it is not enough. Type inference is frequently incorrect because the TP did not recognize (for instance) that a blank cell appears in the column, which should render it an option type, or that a decimal number appears further down a column, and so the type should not be int.

Conduct some experiments to come up with a greater number of rows consumed to do type inference, but not so many it impacts performance too much. More rows is better.

Known workarounds

ForceString - not a nice alternative

giuliohome commented 4 years ago

Please, this is also my issue. Can we get a bit more extended feedback here= Is it really the number of rows that the TP reads? Please - at least - clarify how many are they in the current release as it is, before suspending the whole request as an enhancement. Thank you very much for understanding.

Btw, as far as I can see, if a blank cell appears in a column and the other cells are int (for example) the inferred type is obj while it should be an int option (in this example)...

I share here my workaround based on TryGetValue and this helper:

let getOptionFromExcel (optional:obj) =
    match optional with
    | :? int as a -> Some (double a)
    | :? double as d -> Some d
    | _ -> None

as well as

let getDecimalOptionFromExcel : obj -> decimal option =
    getDoubleOptionFromExcel >> Option.map decimal

Usage examples

SizeTON = e.TryGetValue 5 "Size TON" |> getDecimalOptionFromExcel
SizeCbm = e.TryGetValue 6 "Size cbm" |> getDecimalOptionFromExcel
 QtyTons = 
     e.TryGetValue 5 "Size TON"
     |> getOptionFromExcel
     |> function
     | Some n -> decimal n
     | None -> 0m
     // instead of    
     // decimal e.``Size TON``; //https://github.com/fsprojects/ExcelProvider/issues/56
jackfoxy commented 4 years ago

@giuliohome when I looked, the number of lines was easy to find in the code. IIRC it was something like 25 or 50. I haven't used this TP in well over a year, so I have no current feedback. My memory is it could be a lot better. That said, even with improvements I have no plans to use it in the future.

giuliohome commented 4 years ago

Hi @jackfoxy thanks for your reply. It' strange, I have a blank after 3 or 4 lines and the field is interpreted as a float/double... and it throws an exception on the blank/row Will double check it again, if I overlooked anything or if I introduced the bug in my part of code (actually I like my "workaround" anyway...)

jackfoxy commented 4 years ago

@giuliohome I recall other typing issues. You probably found one I experienced. I filed 3 issues, all on the same day, which addressed the most egregious problems I was having. I think it was not long after this I dismantled all the Excel processing and banned Excel file submission in our company. That's how I solved the problem.

giuliohome commented 4 years ago

@jackfoxy thanks for sharing your experience and feel free to unsubscribe from the issue.

Really, I was writing to the maintainers of this project.