ada-url / ada

WHATWG-compliant and fast URL parser written in modern C++, part of Node.js, Redpanda, Kong, Telegram and Cloudflare Workers.
https://ada-url.com
Apache License 2.0

Add support for non-UTF-8 inputs #77

Open lemire opened 1 year ago

lemire commented 1 year ago

The entire code base assumes UTF-8. To support UTF-16, we simply need to transcode (easy!).

anonrig commented 1 year ago

We should research the use cases for adding non-UTF-8 input support before advancing/working on this.

lemire commented 1 year ago

It is trivial to add a front-end transcoder to support any Unicode encoding. But yeah... not much demand so far.
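
For readers following along, a front-end transcoder of the kind described here could look roughly like the sketch below. The name `utf16_to_utf8` is hypothetical and is not part of ada's API. The sketch assumes the caller then passes the resulting UTF-8 string to the existing UTF-8 parser, and that rejecting unpaired surrogates is acceptable (a caller could instead substitute U+FFFD).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not ada's API): transcode UTF-16 to UTF-8.
// Returns std::nullopt if the input contains an unpaired surrogate.
std::optional<std::string> utf16_to_utf8(std::u16string_view in) {
  std::string out;
  out.reserve(in.size() * 3);  // upper bound: 3 UTF-8 bytes per UTF-16 code unit
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      if (i + 1 >= in.size()) return std::nullopt;
      std::uint32_t low = in[i + 1];
      if (low < 0xDC00 || low > 0xDFFF) return std::nullopt;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
      ++i;  // the low surrogate has been consumed
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return std::nullopt;  // low surrogate without a preceding high one
    }
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

A production front-end would more likely call a vectorized transcoding library than a scalar loop like this one. The loop is only meant to show that the extra step is a pure pre-processing pass in front of the unchanged UTF-8 code base.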

anonrig commented 1 year ago

I wonder if there is any usage from non-browser environments for this. @jasnell, is there any demand from Cloudflare Workers and/or Node.js regarding this?

cyyynthia commented 2 weeks ago

This thread is quite old, but given the labels I feel it wouldn't hurt to give my two cents. I've been privately hacking together a toy-project-grade JS engine and am looking at the ecosystem of relevant high-performance libraries.

I think that in the context of JS engines it would be interesting to have a fast path for UTF-16 (and eventually Latin-1) that doesn't "just" hide UTF-8 transcoding. Working with the engine's native encoding means no conversion work and no need to allocate working copies in UTF-8, which sounds desirable to me.

As for providing front-ends with transcoding logic, I am generally cautious about it: libraries tend to each use a different implementation, so binaries end up holding multiple implementations of the same function (without domain-specific shortcuts or optimizations), which isn't ideal. As long as the documentation makes clear that they're just QoL shortcuts and there's a path that involves no transcoding, I'd be fine with that.
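
To make the "no transcoding" path in the comment above more concrete, here is a minimal sketch of what code written against the engine's native code units might look like. It only scans for a leading URL scheme, and `scheme_length` is a hypothetical name, not something ada provides. It relies on the fact that the characters significant to URL syntax are ASCII, and ASCII characters have the same code unit values in Latin-1, UTF-8, and UTF-16, so the same template can run over `char` or `char16_t` input without making a UTF-8 copy.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch (not ada's API): length of the leading scheme,
// excluding the ':' delimiter, or 0 if the input has no valid scheme.
// Works on any code unit type because only ASCII values are inspected.
template <typename CharT>
std::size_t scheme_length(std::basic_string_view<CharT> input) {
  auto is_alpha = [](CharT c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  };
  auto is_digit = [](CharT c) { return c >= '0' && c <= '9'; };

  // A scheme must start with an ASCII letter.
  if (input.empty() || !is_alpha(input[0])) return 0;
  for (std::size_t i = 1; i < input.size(); ++i) {
    CharT c = input[i];
    if (c == ':') return i;  // end of scheme found
    // Remaining scheme characters: letters, digits, '+', '-', '.'.
    if (!(is_alpha(c) || is_digit(c) || c == '+' || c == '-' || c == '.')) {
      return 0;
    }
  }
  return 0;  // no ':' delimiter, so no scheme
}
```

For example, `scheme_length(std::u16string_view(u"https://example.com"))` inspects the UTF-16 buffer in place. A full parser involves far more than this (whitespace trimming, percent-encoding, host processing), so this is an illustration of the approach rather than an estimate of the work required.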