UTF-16 instead of UTF-8 in URL Descriptor to follow USB conventions (and tooling)

WICG / webusb

Connecting hardware to the web.

https://wicg.github.io/webusb/


UTF-16 instead of UTF-8 in URL Descriptor to follow USB conventions (and tooling) #101

Closed by riggs 6 years ago

riggs commented 6 years ago

As much as it pains me to suggest UTF-16 instead of UTF-8, most USB libraries have tooling for UTF-16 because that's how String descriptors are encoded. Libraries are unlikely to add UTF-8 support just for WebUSB.

Also, converting between Unicode encodings is far less of a burden in JavaScript than in C. (And 99.9+% of the time no conversion will be necessary, because JS strings are sequences of UTF-16 code units internally.)

reillyeon commented 6 years ago

The choice to use UTF-8 was made to reduce the amount of device program memory necessary to store URLs, which are typically ASCII strings and therefore encode particularly well in UTF-8.
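To put rough numbers on that trade-off, here is a minimal sketch in C that counts how many payload bytes a string needs in each encoding. The function names and the example URL are hypothetical, and the counting assumes the input is valid UTF-8; descriptor header bytes are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Payload size of a NUL-terminated UTF-8 string, excluding the terminator. */
size_t utf8_payload_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Payload size of the same text re-encoded as UTF-16.
 * Assumes `s` is valid UTF-8. Each code point needs one 16-bit code unit,
 * except code points above U+FFFF (4-byte UTF-8 sequences), which need two. */
size_t utf16_payload_bytes(const char *s)
{
    assert(s != NULL);
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        if ((*p & 0xC0) == 0x80) {
            continue;           /* continuation byte: already counted */
        }
        units += 1;             /* lead byte starts a new code point */
        if (*p >= 0xF0) {
            units += 1;         /* 4-byte sequence: surrogate pair */
        }
    }
    return units * 2;           /* two bytes per UTF-16 code unit */
}
```

For an ASCII URL such as `example.com/webusb` (18 characters), `utf8_payload_bytes` returns 18 and `utf16_payload_bytes` returns 36, so the UTF-16 form takes twice the program memory.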

larsgk commented 6 years ago

@riggs - could you maybe mention some of those libs/tools? - I'm working on a post on how to get WebUSB into existing hardware (tips'n'tricks) for the industry.

riggs commented 6 years ago

It honestly turned out to be far less of an issue than I thought it would be. I had forgotten C has UTF-8 literals (my background is Python & JS), so the conversion and size calculations (sizeof(URL) - 1) are trivial.
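As a rough illustration of the `sizeof(URL) - 1` approach, the sketch below fills a buffer with a WebUSB URL descriptor (bLength, bDescriptorType, bScheme, then the UTF-8 URL). It is not the code from the PR mentioned in this thread: `EXAMPLE_URL`, `build_url_descriptor`, and the constant names are hypothetical, and the type and scheme values are taken from the WebUSB draft as I understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example URL, written without the scheme prefix because the
 * descriptor carries the scheme in its own byte. */
#define EXAMPLE_URL u8"example.com/webusb"

#define WEBUSB_URL_TYPE     0x03u /* bDescriptorType for a URL descriptor */
#define WEBUSB_SCHEME_HTTPS 0x01u /* bScheme: 0 = http://, 1 = https:// */
#define URL_HEADER_LEN      3u    /* bLength + bDescriptorType + bScheme */

/* Writes a URL descriptor into `out` and returns its total length,
 * or 0 if it does not fit in `cap` bytes or in the one-byte bLength field.
 * `url` is taken as void* so a u8 literal can be passed whether the
 * compiler types it as char[] (C11) or char8_t[] (C23). */
size_t build_url_descriptor(uint8_t *out, size_t cap, uint8_t scheme,
                            const void *url, size_t url_len)
{
    assert(out != NULL && url != NULL);
    size_t total = URL_HEADER_LEN + url_len;
    if (total > 255 || total > cap) {
        return 0;
    }
    out[0] = (uint8_t)total;      /* bLength */
    out[1] = WEBUSB_URL_TYPE;     /* bDescriptorType */
    out[2] = scheme;              /* bScheme */
    memcpy(out + URL_HEADER_LEN, url, url_len); /* no NUL terminator */
    return total;
}
```

Calling `build_url_descriptor(buf, sizeof buf, WEBUSB_SCHEME_HTTPS, EXAMPLE_URL, sizeof(EXAMPLE_URL) - 1)` with a 64-byte `buf` yields a 21-byte descriptor. The `- 1` drops the literal's trailing NUL, which the descriptor does not store.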

I converted an existing LUFA-based codebase, and I have a PR open here with my approach & changes.