dfinke / ImportExcel

PowerShell module to import/export Excel spreadsheets, without Excel
https://www.powershellgallery.com/packages/ImportExcel/
Apache License 2.0
2.47k stars 400 forks source link

Export-Excel NoNumberConversion order #374

Closed DarkLite1 closed 6 years ago

DarkLite1 commented 6 years ago

Thanks for the great work on the module, it's really fantastic to see how it gets constantly improved. Special thanks to @jhoneill for implementing the speed optimizations.

One thing I noticed that changed in one of the latest commits is the following:

{ $_ -is [System.ValueType]} {
    # Save numerics, setting format if need be.
    $TargetCell.Value = $_
    if ($setNumformat) {$targetCell.Style.Numberformat.Format = $Numberformat }
    #Write-Verbose  "Cell '$Row`:$ColumnIndex' header '$Name' add value '$_' as value"
    break
}

{(($NoNumberConversion) -and ($NoNumberConversion -contains $Name)) -or
    ($NoNumberConversion -eq '*')} {
    #Save text without it to converting to number
    $TargetCell.Value = $_
    #Write-Verbose "Cell '$Row`:$ColumnIndex' header '$Name' add value '$($TargetCell.Value)' unconverted"
    break
}

Should the order not be reserved?

We've always had problems with numbers, decimals and (thousands) separator characters. So I think a lot of people (including us) are using the NoNumberConversion parameter quite heavily to make sure the number is saved as we see it in the console.

Thanks for having a look at this and keep up the good work 👍

jhoneill commented 6 years ago

First off thanks
Here's the logic I'm using.

  1. If something has type of datetime, put it in as a date.
  2. If has numeric or Boolean, type put it into the cell as that type.
  3. If we get to this point we either have a string or an object which has to be converted to a string to go into a cell. That string might contain the representation of a number. If number conversion is blocked, insert the value "as-is"
  4. If it is a string and it begins with = make it a formula
  5. If it is a URL make it a link (actually 4 and 5 could go before 3)
  6. ONLY THEN try to convert the object to a number (this is an expensive operation and we want to avoid it if we can) .

The "is a value type" test is quicker than "is conversion blocked" and a lot of the time the there will be nothing in $noNumberConversion, so I get better performance by saying "If it is a numeric type you don't need convert it, so don't worry about whether you allowed to "

There is a subtle change in behaviour, if you specify number format AND NoConversion and the data was a numeric type it would be inserted as a number but no formatting would be applied. I think this side-effect is negative, but someone who knew of it, and had one column of numbers they wanted left as general and want the rest to be formatted as percentages could use it to their advantage. However if you know that all your numbers are numeric types, and everything which is a string needs to stay as a string -NoNumberCoversion * will speed thing up, and you can format your numbers.

I think this is a better way, but if there are cases where things don't work the way people want I'm open to persuasion :-)

DarkLite1 commented 6 years ago

Thank you for the feedback, really appreciate the explanation. I'm clearly not so skilled as you are, so please bare with me as I did some small tests before upgrading our production environment to the latest and greatest.

A couple of things I noticed:

Exporting Formulas

When formulas, values starting with = are being exported when the switch -NoNumberConversion * is used. the formula is not saved as a Formula but rather as a String.

$ExcelParams = @{
    Path    = $env:TEMP + '\Excel.xlsx'
    Show    = $true
    Verbose = $true
}
Remove-Item -Path $ExcelParams.Path -Force -EA Ignore
[PSCustOmobject][Ordered]@{
    Date      = Get-Date
    Formula1  = '=SUM(F2:G2)'
    String1   = 'My String'
    String2   = 'a'
    IPAddress = '10.10.25.5'
    Number1   = '07670'
    Number2   = '0,26'
    Number3   = '1.555,83'
    Number4   = '1.2'
    Number5   = '-31'
    PhoneNr1  = '+32 44'
    PhoneNr2  = '+32 4 4444 444'
    PhoneNr3  =  '+3244444444'
} | Export-Excel @ExcelParams -NoNumberConversion *

image

Number conversions On a Windows Server 2012R2 we have the same Get-Culture value as on a Windows 10 Pro workstation:

PS H:\> Get-Culture

LCID             Name             DisplayName                                                                                              
----             ----             -----------                                                                                              
2067             nl-BE            Dutch (Belgium)   

But when we run the following code, we get different results for the Number4 property:

$ExcelParams = @{
    Path    = $env:TEMP + '\Excel Culture nl-BE on clinet.xlsx'
    Show    = $true
    Verbose = $true
}
Remove-Item -Path $ExcelParams.Path -Force -EA Ignore
[PSCustOmobject][Ordered]@{
    Date      = Get-Date
    Formula1  = '=SUM(F2:G2)'
    String1   = 'My String'
    String2   = 'a'
    IPAddress = '10.10.25.5'
    Number1   = '07670'
    Number2   = '0,26'
    Number3   = '1.555,83'
    Number4   = '1.2'
    Number5   = '-31'
    PhoneNr1  = '+32 44'
    PhoneNr2  = '+32 4 4444 444'
    PhoneNr3  =  '+3244444444'
} | Export-Excel @ExcelParams -NoNumberConversion PhoneNr1, PhoneNr2, PhoneNr3

image

Can you tell me what I'm doing wrong here? Because I've been breaking my head for a while over this culture thing and why it's different. It clearly has to do with the thousands or space indicators. But then again, the culture is the same on both systems. although they show different results.

jhoneill commented 6 years ago

So, first off I've moved the sequence round for the next build - it now goes

  1. If something has type of datetime, put it in as a date.
  2. If has numeric or Boolean, type put it into the cell as that type, and apply number formatting (if numformat is not the default)
  3. If it is a string and it begins with = make it a formula and apply number formatting if needed (previously formulas did not have number format applied)
  4. If it is a URL make it a link (actually 4 and 5 could go before 3)
  5. If number conversion is blocked, insert the value "as-is" 6.Try to convert strings to numbers before inserting them, if it is a number, apply number formatting if needed

With Dutch-Belgian …. it should treat "." as a misplaced thousand separator and the number becomes "twelve" If the language is wrong and set to English it will come back as "1 point 2" Which isn't what is happening. I simplified the call to [Double]::tryParse() in recent code but you can test 3 versions.

  1. Explicit , specific culture
    $be = [cultureinfo]::GetCultureInfo("nl-BE")  
    $number = $null ;
    if ( [Double]::TryParse([String]"1.2", [System.Globalization.NumberStyles]::Any,$be.NumberFormat, [Ref]$number)) {$number}
    else {"Didn't parse"} 

    On my Windows 10 Machine this gives "twelve" .

2.. Explicit current culture

$number = $null ;
 if ( [Double]::TryParse([String]"1.2", [System.Globalization.NumberStyles]::Any, [System.Globalization.NumberFormatInfo]::CurrentInfo, [Ref]$number)) {$number}
 else {"Didn't parse"} 

My UK machine gives "one point two" for this and "twelve" if I test "1,2"

  1. Implicit
    $number = $null ;
    if ( [Double]::TryParse( "1,2",   [Ref]$number)) {$number}
    else {"Didn't parse"} 

    which should be the same as a 2

It looks like your windows 10 machine doesn't allow a misplaced thousand separator; Dutch-Netherlands, French-Belgium and Dutch-Belgium (as well as English UK) all allow a thousand separator in the wrong place so … it's very odd

DarkLite1 commented 6 years ago

Thank you for the feedback. I tried your code and it returns the same result on the server as well as on the client. Although, as shown above, it appears that Excel shows different values depending on where the Excel file is generated.

God I hate decimal and thousand separator issues :( The language on both systems is set to English, both systems I type on a Qwerty (which has nothing to do with this but still) and the Culture is the same.

image

jhoneill commented 6 years ago

In the 3rd one in my reply it had become 1,2 - could you check that 1.2 in all 3 gives the same result

You could try uploading the files and I can poke at them and see if I can see what is wrong.

DarkLite1 commented 6 years ago

Sure thing, here's the code:

[Write-Host "Convert '1.2'" -ForegroundColor Yellow

$be = [cultureinfo]::GetCultureInfo("nl-BE")  
$number = $null ;
if ( [Double]::TryParse([String]"1.2", [System.Globalization.NumberStyles]::Any,$be.NumberFormat, [Ref]$number)) {$number}
else {"Didn't parse"} 

$number = $null ;
if ( [Double]::TryParse([String]"1.2", [System.Globalization.NumberStyles]::Any, [System.Globalization.NumberFormatInfo]::CurrentInfo, [Ref]$number)) {$number}
else {"Didn't parse"} 

$number = $null ;
if ( [Double]::TryParse( "1.2",   [Ref]$number)) {$number}
else {"Didn't parse"} 

Write-Host "Convert '1,2'" -ForegroundColor Yellow

$be = [cultureinfo]::GetCultureInfo("nl-BE")  
$number = $null ;
if ( [Double]::TryParse([String]"1,2", [System.Globalization.NumberStyles]::Any,$be.NumberFormat, [Ref]$number)) {$number}
else {"Didn't parse"} 

$number = $null ;
if ( [Double]::TryParse([String]"1,2", [System.Globalization.NumberStyles]::Any, [System.Globalization.NumberFormatInfo]::CurrentInfo, [Ref]$number)) {$number}
else {"Didn't parse"} 

$number = $null ;
if ( [Double]::TryParse( "1,2",   [Ref]$number)) {$number}
else {"Didn't parse"} ](url)

Here's the result on the server and on the client exactly the same: image

DarkLite1 commented 6 years ago

Adding files as requested Excel Culture nl-BE on client.xlsx Excel Culture nl-BE on server.xlsx

jhoneill commented 6 years ago

Good news. I don't know WHY you're getting differences but I can make it got away :-)

Search if Export.ps1 for [Double]::TryParse You'll find a comment on the next line which says "#was [double:: delete the new double and the #was . Right now it's failing to parse 1.2 :-) I'll back out the change in the code.

DarkLite1 commented 6 years ago

It probably has something to do with the regional settings, even though they are the same on both systems. Could it be that the EPPlus library is acting different on both systems with regards to the region? It would maybe be an option to feed Export-Excel with a -Culture parameter a region to use. So it's always correct... Dunno, bit out on a limp here.

Seems to be discussed on StackOverflow too. They suggest something like this:

[cultureinfo]::currentculture = 'de-DE'; 'val: {0}' -f 1.2; "val: $(1.2)"
 val: 1,2 # -f operator: GERMAN culture applies, where ',' is the decimal mark
 val: 1.2 # string interpolation: INVARIANT culture applies, where '.' is the decimal mark.
dfinke commented 6 years ago

@jhoneill Your merge to the upstream repo, https://github.com/dfinke/ImportExcel/pull/378, seems to have solved the conflicts. I'd like to merge and test, let me know if it is good to go or you want to add to add more updates to that PR or do another.

jhoneill commented 6 years ago

@dfinke - after various steps I seem to be in sync : I downloaded this repo, and did a comparison with all the files on my hard disk and the only which are now different are the ones I know I've worked on : that's

compare-worksheet.ps1
Export-Excel.ps1
Get XYRange.ps1 New-ExcelChart.ps1 readme.md and tests\Compare-WorkSheet.tests.ps1

I've reverted the ::tryParse (at least for the time being - I want to look at the perf implications), and changed the order of processing so =strings are processed as formulas before we get to noNumberConversion. (that's this thread) . Also addressed #376 but still need to write a test for that :-) and fixed the bugs in #377 and #375

@DarkLite1 - being able to specify the culture is not too difficult, and probably the right thing to do long term. I changed the string processing in the quest for speed, but I haven't done a check on how the speed now compares with the speed I had. @dfinke We might do something with the pester scripts so we can specify a path to the module so we say "test x look Yms on this version and Zms on that version".

dfinke commented 6 years ago

@jhoneill yup, like the idea of parameterizing what versions to try

DarkLite1 commented 6 years ago

Jup, a culture parameter for the target audience system would be helpful I think. On a side note, why is Write-Verbose slowing down the execution when the -Verbose switch is not provided? Seems it should just be ignored... Same goes for WRite-Debug, no?

On topic, I tried to write some Pester tests to verify the number conversions but I failed. It's just because ti will parce different results on different systems...

@dfinke I'll wait for updating our production code until the PowerShell Gallery is updated with the latest fix (revert) done by @jhoneill .

jhoneill commented 6 years ago

You'd think if verbose preference implies do nothing it would come back very quickly , but if you try $j = 0; foreach ($i in (1..100000)) {$j++; } It takes about 1/4 sec on my PC $j = 0; foreach ($i in (1..100000)) {$j++; Write-Verbose "Value is $J"} Takes about 8 seconds. It seems Write-Verbose writes to the verbose stream, and the verbose preference decides if it should create any output (near as makes no difference I get the same speed for debug and for warning if I set the action preference to 'Silently continue'; with the warning stream I can capture it in a variable so I don't think the write-does some work even when there will be no output.

The number parser from .NET defines a bit mask of what is allowed

   AllowLeadingWhite 1
  AllowTrailingWhite 2
    AllowLeadingSign 4
   AllowTrailingSign 8
    AllowParentheses 16
   AllowDecimalPoint 32
      AllowThousands 64
       AllowExponent 128
 AllowCurrencySymbol 256
   AllowHexSpecifier 512

And these are put together in combinations

                None 0
             Integer 7
              Number 111
               Float 167
            Currency 383
                 Any 511
           HexNumber 515

We use ANY which allows Local currency symbol, leading & Trailing spaces, allows the "-" to be leading or trailing or "()" to denote 'negative', and allows the Local thousand separator and decimal point (misplaced thousand separators are tolerated) "Any" excludes hex and doesn't support percentages. We also make no attempt to parse dates, or to preserve formatting. I'm going to try to add a language parameter today

DarkLite1 commented 6 years ago

On a side note, because of the formula not being saved as a formula, should this not be updated then in Export-Excel.Tests.ps1:

        it "Set a formula   in Cell B2                                                             " {
            $ws.Cells[2,2].Formula                                      | should     be  '=SUM(F2:G2)'        } 
jhoneill commented 6 years ago

No, that is there to make sure I have fixed the problem of the formula not being saved correctly. That test will throw an error on older versions and not on my latest version.

DarkLite1 commented 6 years ago

Perfect, thx for the heads up :)