cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License
127 stars 36 forks source link

Reading into a query converts numbers with more than 5 decimal places to scientific notation #305

Closed JamoCA closed 1 year ago

JamoCA commented 1 year ago

I'm using 3.4.4. (Not sure if an update is needed for this or not.)

I imported an XLSX spreadsheet with the following decimal values (5 places) and they were all converted to scientific notation. I used queryColumnTypes and configured them to all be treated as varchar, but it didn't seem to make any difference. (I need to use varchar since some rows contain text values rather than numeric.)

Lap1 Lap2 Lap3
0.00001 0.00002 0.00008
DNF DQ NS

I used an third-party app to manually convert all values in the spreadsheet to text by prefixing numeric values with an apostrophe ('), but I don't know if that should be a required step to retain accuracy. I'd hate to forget this step or have to teach it to someone else and trust that they always remember.

Is it possible to read these decimal values with being converted to scientific notation. (NOTE: I never want to use scientific notation.)

Thanks.

cfsimplicity commented 1 year ago

Hi James. It seems to be a problem with ACF. If I run this with Lucee I get the expected result.

spreadsheet.newChainable( "xlsx" )
  .setCellValue( "0.00001", 1, 1, "numeric" )
  .formatCell( { dataformat: "0.00000" }, 1, 1 )
  .write( path, true );
result = spreadsheet.read( src=path, format="query" );
WriteDump( result );

20221110

But with any version of ACF I get the scientific notation for a number with more than 4 decimal places.

20221110-acf

I'll try and track down where and why this is happening.

cfsimplicity commented 1 year ago

It seems to be down to how ACF likes to output doubles. Lucee will also do it given enough decimal places.

I've made a change which should fix this by ensuring it uses the value as formatted in the workbook.

Can you please test it using the current develop branch?

JamoCA commented 1 year ago

It works. I haven't tested it on any files beyond the one that I'm testing, but all visual values are imported.

The values in Excel were initially percentages and I've always had difficulty with them, so it seemed better to remove the filter and use "general" or "text" formatting.

I tried a spreadsheet with values formatted as a "percent" w/hidden decimal places. Only the visible values + percentage symbol were imported. If I extended the cosmetic decimal place of the percentage visually in Excel to 5 places, then the extra decimal places were imported. (Sometimes Excel users are surprised that greater accuracy exists in the cells without realizing that they are only cosmetically hidden and full values can impact their formulas.)

I set the columnar formatting to "general" & "numeric" and all visible values were correctly read & outputted in the returned query object. NOTE: I performed these tests without any queryColumnTypes parameter configured.

Do you think this setting is "safe" for all read operations or are you considering adding this as a toggleable option? (I prefer "general" formatting and normally don't like importing formatted data other than "dates".)

cfsimplicity commented 1 year ago

@JamoCA Thanks for the feedback.

Do you think this setting is "safe" for all read operations[...]?

On reflection, perhaps not. Unfortunately the library is inconsistent in the way it returns cell values: getCellValue() returns the visible (formatted) value whereas read() returns the raw value ignoring any formatting. They should probably both return the raw value with an option to return the visible value.

I've decided to implement that for read()...

https://github.com/cfsimplicity/spreadsheet-cfml/issues/309

...and getCellValue() but keep the default visible value for getCellValue() to avoid a breaking change:

https://github.com/cfsimplicity/spreadsheet-cfml/issues/308

So to address your issue with decimals, you now need to set returnVisibleValues=true on your read() call.

Would you mind again testing that out with develop?

JamoCA commented 1 year ago

The flag works correctly.

When returnVisibleValues is true, the visibly formatted values in the cell are consistent with the query object. If flag is false (or not set; default), the percentage-formatted values are converted to decimal (and some values are represented using scientific notation).