AssemblyScript / assemblyscript

A TypeScript-like language for WebAssembly.
https://www.assemblyscript.org
Apache License 2.0
16.93k stars 663 forks

Switch AssemblyScript to UTF-8 by default? #1653

Open dcodeIO opened 3 years ago

dcodeIO commented 3 years ago

Given the number of heated discussions on this topic so far, especially in the context of Interface Types and GC, I am not getting the impression that anything of relevance is going to change, and we should start planning for the worst case.

So I have been thinking about what the implications would be of switching AssemblyScript's string encoding to W/UTF-8 by default again, and that doesn't look too bad if all one really wants is to get rid of WTF-16, is willing to break certain string APIs, wants it to be efficient after the breakage, and otherwise is not judging.
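For context on the WTF-16 versus UTF-8 distinction, here is a minimal TypeScript sketch. The helper name `hasLoneSurrogate` is made up for illustration and is not an AssemblyScript API. It only shows which strings are the problematic ones: a WTF-16 string may contain a surrogate code unit that is not part of a valid pair, and such a string has no strict UTF-8 encoding (WTF-8 is the relaxed variant that can still represent it).

```typescript
// Hypothetical helper, for illustration only.
// Returns true if `s` contains a surrogate code unit that is not part of a
// valid high+low pair, i.e. the string is WTF-16 but not well-formed UTF-16.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        return true; // high surrogate with no partner
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate without a preceding high surrogate
    }
  }
  return false;
}
```

Strings for which this returns `false` round-trip losslessly through strict UTF-8; the others are where a switch away from WTF-16 would have to either reject, replace, or use WTF-8.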

Implications:

Means we'd essentially jump the UTF-8 train to have

Note that the proposition of switching AS to UTF-8 is different from most of what has been discussed more recently, even though it has always been lingering in the background. It hasn't been a real topic so far due to the implied breakage with JS, but unlike the alternatives it can be made efficient if one is willing to break with JS. Certainly, the support-most-of-TypeScript folks may disagree, as it picks a definite side.

If anything, however, we should switch the entire default and make a hard cut because

Thoughts?

Qix- commented 3 years ago

There are solutions to the .length problem. As I mentioned before, complete compatibility with WTF-16 from a UTF-8 perspective would require performance overhead. That's unavoidable, but for some it might be preferable.

Use compat types for String and the like, and offer libraries to use always-available String8 and String16 types provided by AssemblyScript.

Make String map to String16Compat or something so that it works correctly across all encodings, and so that under UTF-8 mode .length takes a performance hit, as it will have to calculate the string length on the fly. Or cache the length under the hood if you'd like.
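A rough TypeScript sketch of that idea, under stated assumptions: `utf16LengthOfUtf8` and `Utf8String` are hypothetical names (not the `String16Compat` type mentioned above and not an existing AssemblyScript API), and the input is assumed to be well-formed UTF-8 with no validation. It computes the JS-compatible `.length` (UTF-16 code units) from UTF-8 bytes in one linear pass and caches the result.

```typescript
// Counts the UTF-16 code units that a well-formed UTF-8 byte sequence would
// occupy. Assumes valid UTF-8 input; no validation is performed.
function utf16LengthOfUtf8(bytes: Uint8Array): number {
  let length = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if ((b & 0xc0) === 0x80) continue; // continuation byte: adds nothing
    // A 4-byte lead (11110xxx) encodes a supplementary code point,
    // which takes a surrogate pair (2 units) in UTF-16.
    length += (b & 0xf8) === 0xf0 ? 2 : 1;
  }
  return length;
}

// Hypothetical wrapper: UTF-8 storage with a lazily cached UTF-16 length.
class Utf8String {
  private bytes: Uint8Array;
  private cachedLength: number = -1; // -1 means "not computed yet"

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  get length(): number {
    if (this.cachedLength < 0) {
      this.cachedLength = utf16LengthOfUtf8(this.bytes); // O(n), first call only
    }
    return this.cachedLength;
  }
}
```

The first `.length` access costs a scan over all bytes and later accesses are constant time, which is the tradeoff described above: compatibility with the WTF-16 notion of length is possible, but not for free.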

I'm sure there are a handful of similar great ideas. But it is impossible to get both worlds at once. If that is a hard requirement, then AssemblyScript is doomed. A tradeoff is necessary.

My main criticism of this discussion at the meta level, however, is that it seems the committee is unwilling to make a tradeoff here where one is clearly necessary. There is no incredible, best solution here. All of the problems of interop between the two have been cleanly laid out. It's just politics at this point, about which approach is the least obtrusive.

Beyond what I've said I don't think I can contribute much else.

protheory8 commented 3 years ago

The upcoming Interface Types should bring another abstraction that will allow to interop data between wasm modules built in different languages without having to consider their specific circumstances, including the format of strings used in those languages.

I thought AssemblyScript was not going to implement the Interface Types proposal?

farteryhr commented 1 year ago

it's THE problem all kinds of programming languages face.

my suggestions, aiming for "least war" (in other words, most functionality provided):

(if no checking is desired, replace all utf with wtf)