Open lemire opened 1 year ago
We should research the use cases for adding non-UTF-8 input support before advancing/working on this.
It is trivial to add a front-end transcoder to support any Unicode encoding. But yeah... not much demand so far.
I wonder if there is any usage from non-browser environments for this. @jasnell any demand from Cloudflare Workers and/or Node.js regarding this?
This thread is quite old, but given the labels I feel it wouldn't hurt to give my two cents. I've been privately hacking together a toy-project-grade JS engine and am looking at the ecosystem of relevant high-performance libraries.
I think that in the context of JS engines it would be interesting to have a fast path for UTF-16 (and eventually Latin-1) that doesn't "just" hide UTF-8 transcoding. Working with the engine's native encoding means no conversion work and no need to allocate working copies in UTF-8, which sounds desirable to me.
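To make the idea concrete, here is a rough sketch of what "no transcoding" could look like for one small piece of URL parsing. It assumes the parsing routine is templated on the code unit type, so the same logic runs directly over `char` (UTF-8 or Latin-1) or `char16_t` (UTF-16) input. The name `scheme_length` is hypothetical and is not an existing API of this library.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (illustration only): returns the number of code units
// in the URL scheme (the part before ':'), or npos if there is no valid scheme.
// Scheme characters are all ASCII, so the same comparisons work on UTF-8,
// Latin-1, and UTF-16 code units without any conversion or extra allocation.
template <typename CharT>
std::size_t scheme_length(std::basic_string_view<CharT> input) {
  const std::size_t npos = std::basic_string_view<CharT>::npos;
  auto is_alpha = [](CharT c) {
    return (c >= CharT('a') && c <= CharT('z')) ||
           (c >= CharT('A') && c <= CharT('Z'));
  };
  auto is_scheme_char = [&](CharT c) {
    return is_alpha(c) || (c >= CharT('0') && c <= CharT('9')) ||
           c == CharT('+') || c == CharT('-') || c == CharT('.');
  };
  // A scheme must start with an ASCII letter.
  if (input.empty() || !is_alpha(input[0])) return npos;
  for (std::size_t i = 1; i < input.size(); ++i) {
    if (input[i] == CharT(':')) return i;          // end of scheme
    if (!is_scheme_char(input[i])) return npos;    // invalid scheme character
  }
  return npos;  // no ':' found
}
```

A caller holding engine-native UTF-16 would pass a `std::u16string_view` and never touch UTF-8. The obvious cost is that every templated routine gets instantiated once per code unit type, which increases binary size.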
As for providing front ends with transcoding logic, I am generally cautious about it: libraries each tend to use a different implementation, so binaries end up holding multiple implementations of the same function (without domain-specific shortcuts or optimizations), which isn't ideal. As long as the documentation makes clear that they're just QoL shortcuts, and there's a path that involves no transcoding, I'd be fine with that.
The entire code base assumes UTF-8. To support UTF-16, we simply need to transcode (easy!).
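For reference, a front-end transcoder of the kind described here could be sketched roughly as follows. This is a minimal scalar version written for illustration, assuming UTF-16 input in native byte order; the function name `utf16_to_utf8` is hypothetical, and a production front end would more likely delegate to a dedicated, vectorized transcoding library.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical front-end helper (illustration only): converts native-endian
// UTF-16 to UTF-8. Returns std::nullopt if the input has an unpaired surrogate.
std::optional<std::string> utf16_to_utf8(std::u16string_view in) {
  std::string out;
  out.reserve(in.size() * 3);  // worst case: 3 bytes per UTF-16 code unit
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one next
      if (i + 1 >= in.size()) return std::nullopt;
      std::uint32_t lo = in[i + 1];
      if (lo < 0xDC00 || lo > 0xDFFF) return std::nullopt;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
      ++i;  // consume the low surrogate
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return std::nullopt;  // lone low surrogate
    }
    if (cp < 0x80) {  // 1 byte (ASCII)
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {  // 2 bytes
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {  // 3 bytes
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {  // 4 bytes (supplementary planes)
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

The resulting `std::string` could then be handed to the existing UTF-8 parser unchanged. Note that this approach allocates a UTF-8 working copy for every call, which is exactly the cost the earlier comment about a native UTF-16 path is concerned with.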