ashtuchkin / iconv-lite

Convert character encodings in pure javascript.
MIT License
3.08k stars 282 forks

Encoding win1251 #233

Closed themrxcm closed 4 years ago

themrxcm commented 4 years ago

Hi! Is it possible to convert JSON like this from windows-1251 to utf-8?

"TEST": { "CODE": 1, "NAME": "����", }

The Atom text editor uses your package and can do it, but I couldn't find the implementation in their code. Thanks! Sorry for my bad English.

ashtuchkin commented 4 years ago

Both win1251 and utf8 are supported in iconv-lite. To convert binary data in win1251 encoding to a js string, use iconv.decode(buf, "win1251"). To convert a js string to utf8 data, either use iconv.encode(str, "utf8"), or just use the string directly (utf8 is the default encoding and node.js will convert it for you).

themrxcm commented 4 years ago

My example:

let str = iconv.decode(buff, "windows-1251");
let utf8 = iconv.encode(str, "utf8");
console.log(utf8.toString())

In console:

"TEST": { "CODE": 1, "NAME": "пїЅпїЅпїЅпїЅ", }

It should be:

"TEST": { "CODE": 1, "NAME": "Тест", } // Russian language

ashtuchkin commented 4 years ago

Also note that you likely need to decode data before parsing json. If you have already parsed it, it's too late to do decoding.
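
The ordering point above can be sketched as follows. To keep the snippet dependency-free it uses Node's built-in TextDecoder as a stand-in (this assumes a Node build with full ICU, which official builds have shipped by default since Node 13); iconv.decode(raw, "win1251") is the iconv-lite equivalent. The sample JSON and variable names are made up for illustration.

```javascript
// JSON text stored as windows-1251 bytes: {"NAME":"Тест"}
const raw = Buffer.concat([
  Buffer.from('{"NAME":"', 'ascii'),
  Buffer.from([0xd2, 0xe5, 0xf1, 0xf2]), // "Тест" in windows-1251
  Buffer.from('"}', 'ascii'),
]);

// Too late: the bytes are read as UTF-8 first, so the Cyrillic bytes are
// replaced with U+FFFD before JSON.parse ever sees them.
const lossy = JSON.parse(raw.toString('utf8'));

// Correct order: decode with the real source encoding, then parse.
const correct = JSON.parse(new TextDecoder('windows-1251').decode(raw));

console.log(lossy.NAME.includes('\uFFFD')); // true
console.log(correct.NAME);                  // Тест
```

Once the replacement characters are in the string, the original bytes are gone, which is why decoding after parsing cannot recover the text.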

ashtuchkin commented 4 years ago

In your example you're encoding utf8 twice. Once using iconv-lite, second time when you do buf.toString(). Try console.log(str)?

themrxcm commented 4 years ago

Yes, you are right! I may need to specify the encoding when calling fs.readFile(), but none of the available options suits my case ): I will keep looking for a solution. The problem appears at an earlier step; your package is working correctly.

console.log(str) prints:

"TEST": { "CODE": 1, "NAME": "пїЅпїЅпїЅпїЅ", }

Once I find a solution, I will post a comment.

Thanks!
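
On the fs.readFile() encoding options mentioned above: Node's built-in encodings do not include windows-1251, so one common approach is to read the file with no encoding argument and decode the resulting Buffer separately. A small sketch under that assumption; the temp file name is hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Node's built-in encodings do not include windows-1251.
console.log(Buffer.isEncoding('utf8'));    // true
console.log(Buffer.isEncoding('win1251')); // false

// Write four windows-1251 bytes to a temp file, then read them back.
const file = path.join(os.tmpdir(), 'win1251-sample.txt');
fs.writeFileSync(file, Buffer.from([0xd2, 0xe5, 0xf1, 0xf2]));

// With no encoding argument, readFileSync returns the raw bytes as a Buffer,
// which can then be passed to a decoder such as iconv.decode(bytes, 'win1251').
const bytes = fs.readFileSync(file);
console.log(Buffer.isBuffer(bytes), bytes.length); // true 4
fs.unlinkSync(file);
```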

themrxcm commented 4 years ago

Sorry, this was my own inattention: I was converting a corrupted file that had the wrong encoding. If you need an example, here it is:

const autoenc = require('node-autodetect-utf8-cp1251-cp866');
const fs = require('fs');
const iconv = require('iconv-lite');

convert('./test.json');

function convert(path) {
  // readFileSync without an encoding argument already returns a Buffer.
  const buff = fs.readFileSync(path);
  // Detect the source encoding; fall back to utf8 if detection returns nothing.
  const encoding = autoenc.detectEncoding(buff).encoding;
  // Decode the raw bytes to a JS string, then re-encode that string as UTF-8.
  const str = iconv.decode(buff, encoding || 'utf8');
  const utf8 = iconv.encode(str, 'utf8');
  fs.writeFileSync('./testw.json', utf8);
  return utf8;
}

Your suggestion about re-encoding helped me a lot! Thanks!
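
For readers who hit the same symptom: the "пїЅ" output seen earlier in the thread is consistent with a file that already contained U+FFFD replacement characters saved as UTF-8 and was then read as windows-1251. This is an interpretation of the output, not something the thread confirms. A sketch using Node's built-in TextDecoder (assumes a Node build with full ICU) rather than iconv-lite, so it runs without dependencies:

```javascript
// U+FFFD (the replacement character) repeated four times, saved as UTF-8.
const corrupted = Buffer.from('\uFFFD'.repeat(4), 'utf8');

// Reading those bytes as windows-1251 maps EF BF BD to "п", "ї", "Ѕ".
const misread = new TextDecoder('windows-1251').decode(corrupted);

console.log(corrupted.toString('hex')); // efbfbdefbfbdefbfbdefbfbd
console.log(misread);                   // пїЅпїЅпїЅпїЅ
```

When this pattern shows up, the original characters were lost before conversion, so no choice of decoder can restore them; an uncorrupted copy of the file is needed.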

ashtuchkin commented 4 years ago

Awesome, glad you resolved the issue!
