ashtuchkin / iconv-lite

Convert character encodings in pure javascript.
MIT License
3.04k stars 282 forks source link

Time to upgrade to modern syntax and become ESM only? #273

Open jimmywarting opened 2 years ago

jimmywarting commented 2 years ago

ppl are starting to use import instead it's more cross compatible for both browsers, Deno and Node's new experimental http import so you won't depend on npm

lots of folks want stuff to be more cross compatible between different environment. That means using import and ditching Buffer for simple Uint8Array instead.

Also how about switching to using TextDecoder instead? - it's available in Deno, browser and node globally

yosion-p commented 2 years ago

HI,bro, I think you mentioned two points:

  1. As for ESM, it seems not too difficult. I will submit a PR this week if I have time;
  2. As for TextDecoder, I don't think its compatibility is very good, it need the full ICU data. Maybe we can introduce TextDecoder on the existing code? @ashtuchkin
jimmywarting commented 2 years ago

just though of something else. What if instead of using node:streams you used (async) iterators instead

// returns a new iterable (generator) const iterableTransformer = iconv.decodeIterableSync(iterable) for (let str of iterableTransformer) result += str

str = [...iconv.decodeIterableSync(iterable)].join('')

// returns a new async iterable (async generator) const iterableTransformer = iconv.decodeIterable(fs.createReadStream('./file.txt')) for await (let str of iterableTransformer) result += str

stream.Readable.from(iterableTransformer).pipeTo(dest)


think this would be best for cross compatibility
yosion-p commented 2 years ago

hi,i upgraded it ro es6 @ashtuchkin

ashtuchkin commented 2 years ago

Hey first of all thanks for the feature requests. Let me address them one by one.

  1. ES Modules - correct me if I'm wrong, but this is not required for cross-compatibility between Node, Browser and Deno, is it? You could use regular CommonJS in browser for quite some time using Browserify or similar libraries. Deno also seems to support it. Maybe it's not as ergonomic as ES Modules, sure, but it seems more like a tradeoff we need to consider, with supporting older Node versions on the other end (specifically versions less than v15.3.0, v12.22.0, which are pretty recent afaik). I'll definitely keep that in mind, but I'm not sure I'm ready to do this leap at this point.

  2. Buffer -> UInt8Array. This one I agree. It was also requested by vscode team. I'm working on it in next branch, but it's a bit slow. It'll get there at some point.

  3. TextDecoder. This one is a bit tricky. We can probably check if it exists in the environment and use it for performance, but that would require some refactoring to avoid cluttering the codebase. I'll keep that in mind as an perf optimization. We definitely can't fully switch to it because 1) it doesn't support encoding (and TextEncoder is kinda useless) and 2) the set of encodings it supports is much smaller than iconv-lite.

  4. Node streams -> Async iterators. This is interesting, I haven't seen it done that way. It will be easier to do with the concept of backends that I introduced in the next branch, so that Browser and Deno don't require streams at all. I'll keep that in mind too.

jimmywarting commented 2 years ago

You could use regular CommonJS in browser for quite some time using Browserify or similar libraries ... Deno also seems to support it. Maybe it's not as ergonomic as ES Modules

yea, but that requires compilers and you can't use a CDN, it needs lots of extra works for tiny projects, ESM is the feature whether or not you like it... it is the feature of how to load everything in Deno, browser and with node's now new HTTP loader

ES Modules can still be loaded from cjs if you need it to be. But the tradeof is that it needs to be loaded using async with import('iconv-lite').then(...) this can work from commonjs...

yosion-p commented 2 years ago

I read some documents, I think I understand the reason for using Buffer in iconv-lite,that Buffer instances can be converted to and from normal JavaScript strings. But Buffer seems to work only in NodeJS,and the Uint8Array's Browser compatibility is much better. Although they can be interchangeable, like this:

// From Buffer to ArrayBuffer:
var buf = Buffer.from("hello world!")
// var toUint = buf.buffer
// Slice (copy) its segment of the underlying ArrayBuffer
let ab = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

// From ArrayBuffer to Buffer:
var buffer = Buffer.from( new Uint8Array(ab) );

But the UInt8Array is enough. I think the UInt8Array should be used like this: get the unicode code for each word, and then fill the Uin8Array in according to the UTF-8 encoding.... uh...🤣I created a demo,I'm not sure we can do that, Maybe you can take a look at it.

ashtuchkin commented 2 years ago

Node's Buffer is currently a subclass of UInt8Array, so we can use Buffer in all places that accept UInt8Array (liskov substitution principle). Even more, iconv-lite also does the reverse - it accepts UInt8Array in the same places where it accepts Buffer, e.g. in the decode() function. The problem is .encode() function. It currently returns Buffer, which has a richer API than just UInt8Array and any code that depends on this will break if we start returning UInt8Array instead. That is a breaking change of our core API, which would require a major version bump. That will also force us to stop supporting older Node versions where Buffer was not a subclass of UInt8Array (before v3.0 afaik).

On Fri, Sep 10, 2021 at 11:09 PM yosion-p @.***> wrote:

I read some documents, I think I understand the reason for using Buffer in iconv-lite,that Buffer instances can be converted to and from normal JavaScript strings. But Buffer seems to work only in NodeJS,and the Uint8Array's Browser compatibility is much better. Although they can be interchangeable, like this:

// From Buffer to ArrayBuffer: var buf = Buffer.from("hello world!") // var toUint = buf.buffer // Slice (copy) its segment of the underlying ArrayBuffer let ab = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

// From ArrayBuffer to Buffer: var buffer = Buffer.from( new Uint8Array(ab) );

But the UInt8Array is enough. I think the UInt8Array should be used like this: get the unicode code for each word, and then fill the Uin8Array in according to the UTF-8 encoding.... uh...🤣I created a demo https://github.com/yosion-p/unint8array-demo,I'm not sure we can do that, Maybe you can take a look at it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/273#issuecomment-917327620, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHKL6UOCKKFNUCB3DRDUBLB7TANCNFSM5B3TCWNA .

yosion-p commented 2 years ago

before v3.0 afaik

Yes, According to official docs:

| v3.0  | [ The Buffers class now inherits from Uint8Array.](https://nodejs.org/docs/latest-v15.x/api/buffer.html#buffer_buffers_and_typedarrays) -- | --

So, I don't think we need to be compatible with anything prior to Node3.0 in the future. You know Encode () and decode() looks not much code, but the logic referenced inside is a bit complicated.

ashtuchkin commented 2 years ago

We are currently compatible with v0.10+, but if we change to using UInt8Array then we'll have to up it.

On Sat, Sep 11, 2021, 05:44 yosion-p @.***> wrote:

before v3.0 afaik

Yes, According to official docs: v3.0 The Buffers class now inherits from Uint8Array. https://nodejs.org/docs/latest-v15.x/api/buffer.html#buffer_buffers_and_typedarrays

So, are we no longer compatible with node3.0 pre-release?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/273#issuecomment-917377132, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHOBBANY6EHTZF3RQ2LUBMQGRANCNFSM5B3TCWNA .