SheetJS / sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
https://sheetjs.com/
Apache License 2.0
35.2k stars 7.99k forks source link

How to get data from xlsx as percentage ? #608

Closed mjdb3d closed 7 years ago

mjdb3d commented 7 years ago

Is it possible to get data from (xlsx) cell that formatted as percentage ? I've tried and I got numbers divided by 100.

xlsx file 12.23% readed data 0.12 Result required 12.23 or 12.23%

thanks in advance.

SheetJSDev commented 7 years ago

Can you share a file?

If you are reading from a file, the formatted text should be in the w field of the cell (the v field stores the raw numeric value)

mjdb3d commented 7 years ago

file.xlsx

#include "shim.js";
#include "jszip.js";
var JSZip = app.JSZip;
#include "xlsx.flow.js";

var jsx_path = File($.fileName).parent+'/';

var infile = File(jsx_path + 'file.xlsx');
infile.open("r");
infile.encoding = "binary";
var data = infile.read();
var workbook = XLSX.read(data, {type:"binary"});
var sheet_name = workbook.SheetNames[0];
var worksheet = workbook.Sheets[sheet_name];
var data = XLSX.utils.sheet_to_json(worksheet, {header:1});

$.writeln(data);

result: 0.01,0.22,-0,-0,0.25,-0,0.224,0.132,0.1755,0.704,,,,,,,,,,,,,,,,,,,,

SheetJSDev commented 7 years ago

@mjdb3d The extendscript support is relatively new and we haven't stress tested every feature. Thanks for reporting, we will look into this

mjdb3d commented 7 years ago

@SheetJSDev you’re welcome. I hope it will be fixed soon. Thank you.

carcinocron commented 7 years ago

This seems to be part of a larger problem (with apparently all excel readers) where the type information is available (I've read the source code), but for some unknown reason the data is abstracted away from the end-user and ignored for output. For example, for this spreadsheet in xlsx:

$40, 12%

I'm expecting one of these 2 outputs from sheet_to_json:

{[
  "$40",
  "12%",
]}
// or:
{[
  { type: "CURRENCY", value: "40" },
  { type: "PERCENT", value: "12" },
]}

The actual result is:

{[
  "40",
  "0.12",
]}

This also results in data-loss for sheet_to_csv.

Here we are checking the primary type t:

            switch(val.t){
                case 'e': continue;
                case 's': break;
                case 'b': case 'n': break;
                default: throw 'unrecognized type ' + val.t;
            }

but there is not any checking of the secondary type 's' (currency, percent, etc..)

SheetJSDev commented 7 years ago

@InstanceOfMichael can you share a file? I just threw together a sample file:

608.xlsx

The CSV and JSON does exactly what's expected. This is from http://oss.sheetjs.com/js-xlsx/ (just drag and drop into the box):

screen shot 2017-04-11 at 14 55 15

This is from node:

> var XLSX = require('xlsx');
> var wb = XLSX.readFile('608.xlsx');
> XLSX.utils.sheet_to_csv(wb.Sheets.Sheet1);
'$40 ,12%\n' <-- formatted text
> XLSX.utils.sheet_to_json(wb.Sheets.Sheet1, {header:1})
[ [ '$40 ', '12%' ] ] <-- formatted text
> XLSX.utils.sheet_to_json(wb.Sheets.Sheet1, {header:1, raw:true});
[ [ 40, 0.12 ] ] <-- raw numbers

The problem covered in this issue is extendscript not playing nice with the formatting functions. We've run into issues in the past with regular expressions, for example https://github.com/davidchambers/Base64.js/issues/31#issue-216606310 , but if you are using the web browser or node you should get the correct formatted data or raw data.

carcinocron commented 7 years ago

using { raw: false }, I am now getting this from my imaginary spreadsheet:

{[
  "40",
  "12%",
]}

Which is good enough for my purposes. Thank you!

SheetJSDev commented 7 years ago

That still looks wrong. Can you share the worksheet object (JSON.stringify(worksheet))? If the cell has formatted text (the cell object for 40 has a w key with value "$40 ") you should see the same text value in the output.

carcinocron commented 7 years ago

https://gist.github.com/InstanceOfMichael/220f5eb3b74ff3770c4d309ccf8be16f

SheetJSDev commented 7 years ago

@InstanceOfMichael thanks for sharing! I can reproduce, it's a bug in the number formatter.

At a high level, Excel doesn't distinguish between "currency" and "percent" and "date" and other cell types. There is a number cell type, and the number format dictates the interpretation.

If you want to see the number format, you can set cellNF:true in the options object. Then every cell will have a z key corresponding to the raw number format. If you look at the end of the output you'll see:

"164":"General",
"165":"[$$-409]#,##0.00;[RED]\\-[$$-409]#,##0.00",
"166":"0.00%"

Those are the formats that LibreOffice stored. Normally Excel would store a format like $#,##0.00 for the currency.

http://oss.sheetjs.com/ssf/ lets you test the formats:

using Excel's normal currency:

screen shot 2017-04-11 at 18 08 49

Using the LO currency format:

screen shot 2017-04-11 at 18 09 22
SheetJSDev commented 7 years ago

@mjdb3d @firas3d thanks for reporting! Finally figured out the issue, and it's a really really strange ExtendScript bug! Consider the following expression:

1 && 2 || 3 && 4 && 5 && 6

In JS this is interpreted as

(1 && 2) || (((3 && 4) && 5) && 6)     // --> 2

In ExtendScript Toolkit (you can test in the JavaScript console) and Photoshop/Illustrator the expression is interpreted as

((((1 && 2) || 3) && 4) && 5) && 6 // --> 6

The formatter library uses a complex expression of the same form:

o = c; while(++i < fmt.length && "0#?.,E+-%".indexOf(c=fmt.charAt(i)) > -1 || c=='\\' && fmt.charAt(i+1) == "-" && i < fmt.length - 2 && "0#".indexOf(fmt.charAt(i+2))>-1) o += c;

Since we use similar expressions throughout the codebases we have to make some more changes, but based on some simple tests I'm fairly confident that is the origin of the problem

praveen-wal commented 4 years ago

@SheetJSDev Still getting the same issue in the 0.15.6 version. Any idea what to do? @mjdb3d @InstanceOfMichael

anoop-chauhan commented 3 years ago

Hi @praveen-wal @mjdb3d @InstanceOfMichael , have you got any solution for this?

atfede commented 1 year ago

Hi, i'm still getting an error while trying to export as percentage. It's adding two extra zeros as tested on this tool:

https://oss.sheetjs.com/ssf/

For example, if you write any number as Percentage like 17 you get 1700%. Any help?

Thanks

PaulleDemon commented 1 year ago

The percentage seems to be preserved only if the raw is set to false.