SheetJS / sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs
https://sheetjs.com/
Apache License 2.0

xlsx CP932 Incorrect output #2799

Closed yanghoxom closed 2 years ago

yanghoxom commented 2 years ago

This is my file: test.xlsx. This is my code:

const wb = XLSX.read(await file.arrayBuffer(), { cellText: false, cellDates: true });

The output for Japanese text is garbled: *ƒ[ƒ‹ƒAƒhƒŒƒX and A-Fƒ‰ƒ“ƒN. Strangely, it works well if I change it to csv.

SheetJSDev commented 2 years ago

The linked file is a CSV, not an XLSX file.

If you want to use CP932, you have to specify it:

const wb = XLSX.read(await file.arrayBuffer(), { cellText: false, cellDates: true, codepage: 932 });

If you are using ESM, you need to manually load the encodings. https://docs.sheetjs.com/docs/getting-started/installation/nodejs#usage describes the configuration in more detail.
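
For readers on ESM, the manual loading step might look like the following sketch. It assumes the `xlsx` package is installed and that the file is loaded as an ES module; `readCp932` is a hypothetical wrapper name used only for illustration.

```javascript
// Assumes the `xlsx` package is installed and this file runs as an ES module.
import * as XLSX from "xlsx";
import * as cpexcel from "xlsx/dist/cpexcel.full.mjs";

// Register the codepage tables; the ESM build does not load them on its own.
XLSX.set_cptable(cpexcel);

// Hypothetical wrapper: parse bytes as CP932 with the options from this thread.
export function readCp932(arrayBuffer) {
  return XLSX.read(arrayBuffer, { cellText: false, cellDates: true, codepage: 932 });
}
```

Without the `set_cptable` call, the `codepage` option may have no effect for legacy encodings such as CP932.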

yanghoxom commented 2 years ago

@SheetJSDev thank you, but sometimes I'm not sure about the input file's encoding, so setting codepage: 932 as the default is quite risky. Is there any way to make it auto-detect, or do I need to implement that logic myself? Is cptable able to do it?

SheetJSDev commented 2 years ago

Unfortunately "auto-detecting encoding" is out of scope. Excel automatically applies the computer regional settings when reading files. That is analogous to the codepage option.
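
Since detection has to live in application code, one possible fallback is sketched below. It is not part of SheetJS: `guessCodepage` is a hypothetical helper, and it assumes the input is either valid UTF-8 or CP932, which will not hold for every file. It relies only on the standard `TextDecoder` API.

```javascript
// Hypothetical helper (not a SheetJS API): pick a codepage for raw CSV bytes.
// Assumption: the data is either valid UTF-8 or CP932 (Shift_JIS).
function guessCodepage(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed UTF-8.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return 65001; // Windows codepage number for UTF-8
  } catch (e) {
    return 932; // fall back to the assumed legacy encoding
  }
}
```

The result could then be passed as the `codepage` option, in the same way as the fixed value above. Pure ASCII input is reported as UTF-8, which is harmless, but a legacy-encoded file can occasionally happen to be valid UTF-8, so this remains a heuristic rather than true detection.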