WebAssembly / stringref


Consider specifying WTF-8 variant when creating WTF-8 string views #38

Open wingo opened 2 years ago

wingo commented 2 years ago

Currently you can make a WTF-8 view on a string with string.as_wtf8 and read string contents with stringview_wtf8.encode $wtf8_policy, or indeed stringview_wtf8.slice (which doesn't take a policy). The intention is that you can process the WTF-8 contents of a string in a streaming way with a fixed-size buffer. However, might it make sense to pass the policy argument to string.as_wtf8 instead? Or, in the spirit of #35, perhaps the names would be string.as_utf8, string.as_wtf8, and string.as_lossy_utf8, all resulting in the stringview_wtf8 type.

I think the essential thing this allows is to move the point where any trap/assertion might take place, for the strict UTF-8 variant, to where you create the view. An encode would then never trap unless the memory access is out of range.

For an implementation that doesn't use WTF-8 internally and which eagerly transcodes strings (or substrings) to WTF-8 when creating a stringview_wtf8, having the policy up-front would allow the policy to be applied when the view is created, so stringview_wtf8.encode becomes a simple memcpy. But this might not be important. I don't know how viable this "MVP" kind of implementation will be in the long term -- perhaps breadcrumbs will be a comprehensively better solution.
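
To make the idea concrete, here is a rough Python model of an eagerly transcoding implementation with the policy applied at view creation. It is only a sketch: the names Policy, Trap, Wtf8View, as_view and encode are made up for illustration and are not part of the proposal, and a Python str stands in for the host string (assumed not to contain a high surrogate immediately followed by a low surrogate as separate code points, which would not be well-formed WTF-8).

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    # Hypothetical names mirroring the three variants discussed above.
    UTF8 = "utf8"              # strict: isolated surrogates trap
    WTF8 = "wtf8"              # isolated surrogates are passed through
    LOSSY_UTF8 = "lossy-utf8"  # isolated surrogates become U+FFFD


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


@dataclass(frozen=True)
class Wtf8View:
    data: bytes  # already transcoded according to the policy


def as_view(s: str, policy: Policy) -> Wtf8View:
    """Create the view and apply the policy once, up front."""
    if policy is Policy.UTF8:
        try:
            return Wtf8View(s.encode("utf-8"))
        except UnicodeEncodeError as err:
            # The only policy-related trap happens here, at creation.
            raise Trap("isolated surrogate in strict UTF-8 view") from err
    if policy is Policy.WTF8:
        # "surrogatepass" emits the 3-byte generalized UTF-8 form.
        return Wtf8View(s.encode("utf-8", "surrogatepass"))
    replaced = "".join(
        "\ufffd" if 0xD800 <= ord(c) <= 0xDFFF else c for c in s
    )
    return Wtf8View(replaced.encode("utf-8"))


def encode(view: Wtf8View, memory: bytearray, addr: int, pos: int, nbytes: int) -> int:
    """Copy up to nbytes from the view into memory; no policy needed."""
    chunk = view.data[pos:pos + nbytes]
    if addr < 0 or addr + len(chunk) > len(memory):
        raise Trap("memory access out of range")
    memory[addr:addr + len(chunk)] = chunk  # effectively a memcpy
    return len(chunk)
```

With this shape, as_view("a\ud800", Policy.UTF8) traps immediately, while encode on any successfully created view can only trap on an out-of-range write. The sketch ignores the question of whether pos and nbytes should be adjusted to code point boundaries.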