dcodeIO opened 3 years ago
Could you describe your goals here in more detail? Are you looking to explore whether WASI will be compatible with your goals in the long term, are you building a program or tool and looking to optimize how something runs on the Web in the short term, both, or something else?
I am primarily interested in replacing the custom set of non-standard imports AssemblyScript has, like env.trace, env.seed, env.time, etc., with WASI. The goal is to integrate well with both WASI-enabled hosts and an interchangeable JS polyfill, ideally in a way that doesn't hurt code size or efficiency. That should work fine with most APIs, but one unfortunate obstacle is console.log and friends, where using WASI Filesystem bloats both the module and the polyfill, which I'd like to avoid. For example, just logging a static string using WASI Filesystem currently triggers inclusion of full GC support in a module, which is not always what one wants. So I figured that WASI might be able to help by splitting, say, Logging out of Filesystem, which may be generally useful beyond my use case. Quite a long shot of course, and I am not overly familiar with what has already been discussed or whether something like this would fall into WASI's design space, so I thought I'd ask :)
Logging in general seems straightforward to consider, and separating out functionality like this into modules is something we're already working on. I have concerns about WTF-16 string support though.
As I mentioned elsewhere, interface types currently look like the most likely answer to how to interchange strings in WebAssembly, so that's what we're preparing for in WASI.
Using UTF-8 for now aligns with interface-types' canonical representation, so it's the closest approximation to interface types that we can get for now. And, avoiding WTF encodings means that we won't need to worry about pieces of the ecosystem coming to depend on interchanging ill-formed data, causing compatibility problems when we start migrating to interface-types strings.
Would it work for your use case if we defined a logging API that only accepted UTF-8 strings for now? I recognize it'd have some overhead for your use case, but we'd plan to address that by migrating the API to interface types as soon as they become available.
Concerning the GC requirement, for the case of passing a string literal to a logging function, would it be feasible for the compiler to recognize this case, and convert the literal into UTF-8 at compile time?
Alternatively, is there a way in AS to do an "unsafe delete"? A logging API could guarantee to not let the pointer you pass it escape, so you could create a string, pass it to the log API, and then "unsafe delete" it afterwards, so it wouldn't need a full GC.
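To make the compile-time suggestion concrete, here is a minimal TypeScript sketch of a build-time step that encodes a string literal to UTF-8 once and emits it as a WebAssembly text-format (data ...) segment. The helper name toDataSegment is hypothetical. It only illustrates that the bytes and their length can be fixed before the module runs, so passing a literal to a UTF-8 logging import would need neither a runtime encoder nor an allocation.

```typescript
// Hypothetical build-time helper: encode a string literal to UTF-8 once and
// emit it as a WebAssembly text-format data segment at a fixed offset.
function toDataSegment(offset: number, literal: string): string {
  // TextEncoder always produces UTF-8.
  const bytes = new TextEncoder().encode(literal);
  let escaped = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    // WAT string literals accept "\hh" escapes (two hex digits per byte).
    escaped += "\\" + (b < 16 ? "0" : "") + b.toString(16);
  }
  return `(data (i32.const ${offset}) "${escaped}")`;
}

// The literal's byte length is known at compile time as well, so a call site
// could pass (offset, byteLength) straight to a UTF-8 log import.
const segment = toDataSegment(1024, "the bug is here");
```

This only covers literals; strings built at runtime still need to be encoded at runtime.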
> Logging in general seems straightforward to consider, and separating out functionality like this into modules is something we're already working on.
> Alternatively, is there a way in AS to do an "unsafe delete"?
I guess there is more I can do, yeah, like essentially resorting to malloc and free for the intermediate UTF-8 garbage, but that will still trigger inclusion of the dynamic memory manager (MM), which is one large dependency of the GC. Once the MM is included, it doesn't really matter much anymore, I think.
> Would it work for your use case if we defined a logging API that only accepted UTF-8 strings for now?
Hmm, not sure. As far as I can tell, imposing UTF-8 on languages using a different native encoding is causing most of the problem.
What do you think of adding both, let's say, a logln and a logln16, with the latter scheduled for removal once an Interface Types "stream of char" or similar becomes available, or, more generally, once double re-encoding at the API level is solved? In the browser polyfill, the 16 variant could then simply forward to console.xy for the time being. I'd of course agree that an API like that isn't exactly nice, but perhaps it can be justified to avoid double re-encoding in the meantime? (I guess heavier APIs like FS, which typically don't have a WTF-16 endpoint anyway, are fine with just UTF-8 for now.)
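Here is a sketch of what a browser/Node polyfill for the two proposed imports might look like. It assumes hypothetical signatures: logln(ptr, len) takes len bytes of UTF-8, and logln16(ptr, len) takes len UTF-16 code units, both located in the module's linear memory. It shows the asymmetry under discussion. The UTF-8 variant has to transcode into a JS string, while the 16-bit variant copies code units directly and preserves lone surrogates (WTF-16).

```typescript
// Hypothetical polyfill for the two proposed imports (signatures are assumptions):
//   logln(ptr, len)   - `len` bytes of UTF-8 at byte address `ptr`
//   logln16(ptr, len) - `len` UTF-16 code units at byte address `ptr` (2-byte aligned)
interface LinearMemory {
  buffer: ArrayBuffer; // e.g. a WebAssembly.Memory
}

function makeLogImports(memory: LinearMemory, sink: (msg: string) => void) {
  const utf8 = new TextDecoder("utf-8");
  return {
    logln(ptr: number, len: number): void {
      // Re-read memory.buffer on every call: it is replaced when memory grows.
      // UTF-8 bytes must be transcoded into a JS (UTF-16) string.
      sink(utf8.decode(new Uint8Array(memory.buffer, ptr, len)));
    },
    logln16(ptr: number, len: number): void {
      // Code units map 1:1 onto JS string code units: no transcoding,
      // and lone surrogates pass through unchanged.
      const units = new Uint16Array(memory.buffer, ptr, len);
      let msg = "";
      for (let i = 0; i < units.length; i++) msg += String.fromCharCode(units[i]);
      sink(msg);
    },
  };
}
```

In an actual polyfill, sink would be one of the console methods. Note that logln16 requires ptr to be even, because a Uint16Array view needs a 2-byte-aligned offset.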
Imagine being a C# developer and not understanding that WASI supports their string encoding type.
For the purposes of a logging interface, is the cost of reading a UTF-8 string from an ArrayBuffer in JS really that different from reading a WTF-16 string from an ArrayBuffer? Either way the JS string has to be created dynamically, right? Is the additional UTF-8 to WTF-16 translation while reading from the array really that slow? (Honest question; I have not measured it.)
Either way, logging interfaces should probably assume they could be writing to a filesystem (which they likely will be in many cases), which is generally a slow operation, and which I would have thought would dominate the UTF-8 decode phase. No?
Regardless, discussions about encoding seem separable from the specific question of whether we should add a logging API.
I agree that this is separable, yeah. Regarding your questions, this isn't entirely about performance. I guess the best one could do in their non-UTF-8 language is something like the following:
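```ts
// Static 256-byte scratch buffer reserved in linear memory (becomes a data segment).
const staticBuf = memory.data(256);

export namespace console {
  export function log(msg: string): void {
    // computeUTF8Len, encodeUTF8 and callWasi are placeholder helpers.
    let size = computeUTF8Len(msg);
    if (size < 256) {
      // Small message: encode into the static buffer, no dynamic allocation.
      encodeUTF8(msg, staticBuf);
      callWasi(staticBuf, size);
    } else {
      // Large message: requires the dynamic memory manager (heap.alloc/heap.free).
      let dynBuf = heap.alloc(size);
      encodeUTF8(msg, dynBuf);
      callWasi(dynBuf, size);
      heap.free(dynBuf);
    }
  }
}
```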
which eliminates the need for dynamic allocation of strings considered small. So, if the string is small, one would get
which some may say is fine, while others may still be a bit unhappy; it depends. Note that this already pulls in some code that is only necessary because of UTF-8 everywhere, and in general is not as efficient as it could be. The pain point, however, is not that, but that there is an else branch that may never execute but still leads to the following:
A typical compiler may not be able to apply the sophisticated optimizations needed to dead-code-eliminate (DCE) the dynamic memory manager post-compilation, so every single module doing a console.log("the bug is here") ends up shipping the heavy machinery. I agree that in the current state of affairs one could justify that, but I'd also understand if people were not so happy about it.
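To give a sense of what the placeholder helpers computeUTF8Len and encodeUTF8 entail, here is a TypeScript sketch of the same two steps over a WTF-16 string. The names utf8Length and utf8Encode are hypothetical, and they write into a Uint8Array instead of raw linear memory. Lone surrogates have no UTF-8 representation, so this sketch replaces them with U+FFFD, as TextEncoder does. Other policies, such as trapping or emitting WTF-8, are possible. This is roughly the kind of encoder a non-UTF-8 language has to bundle to talk to a UTF-8-only API.

```typescript
// Is this UTF-16 code unit a high (leading) or low (trailing) surrogate?
const isHigh = (c: number) => c >= 0xd800 && c <= 0xdbff;
const isLow = (c: number) => c >= 0xdc00 && c <= 0xdfff;

// Number of UTF-8 bytes needed for a WTF-16 string (lone surrogates -> U+FFFD).
function utf8Length(s: string): number {
  let len = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) len += 1;
    else if (c < 0x800) len += 2;
    else if (isHigh(c) && i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
      len += 4; // valid surrogate pair: one supplementary code point
      i++;
    } else len += 3; // other BMP character, or lone surrogate replaced by U+FFFD
  }
  return len;
}

// Encode into `out` starting at `offset`; returns the number of bytes written.
function utf8Encode(s: string, out: Uint8Array, offset: number = 0): number {
  let p = offset;
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdfff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (isHigh(cp) && isLow(next)) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      } else {
        cp = 0xfffd; // lone surrogate
      }
    }
    if (cp < 0x80) {
      out[p++] = cp;
    } else if (cp < 0x800) {
      out[p++] = 0xc0 | (cp >> 6);
      out[p++] = 0x80 | (cp & 0x3f);
    } else if (cp < 0x10000) {
      out[p++] = 0xe0 | (cp >> 12);
      out[p++] = 0x80 | ((cp >> 6) & 0x3f);
      out[p++] = 0x80 | (cp & 0x3f);
    } else {
      out[p++] = 0xf0 | (cp >> 18);
      out[p++] = 0x80 | ((cp >> 12) & 0x3f);
      out[p++] = 0x80 | ((cp >> 6) & 0x3f);
      out[p++] = 0x80 | (cp & 0x3f);
    }
  }
  return p - offset;
}
```

Calling utf8Length first and then utf8Encode mirrors the two-pass structure of the example above: the length decides whether the static buffer suffices before any bytes are written.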
Now, even if one were to DCE the MM, there is still the looming problem of what will happen with a polyfill in the browser, which is:
Note that the latter will be the case even with the current state of Interface Types, but it has been mentioned that a "stream of char" may be able to solve this eventually. Let's see.
As I said, it still amazes me, and I am not mad or anything, just trying to raise awareness of the implications of UTF-8 everywhere that may not be on everyone's radar yet.
P.P.S. I'd be happy with a logln16 for now, and then see how things develop, but a logging API is certainly useful even if my prayers for a temporary solution remain unanswered.
I don't think anybody here is suggesting you are mad.
Regarding the first part of your example (the part about the cost of including malloc), wouldn't it make more sense to always allocate such strings on the stack using alloca (or whatever the language equivalent is)? Does AS not have a stack in linear memory?
Oh, sorry, didn't want to imply that someone suggested that. It's all fine, I appreciate your input.
And yeah, AS does not have a C-like stack. (Well, technically it now has a managed shadow stack for incremental GC, but that can't be used for this, since the GC expects it to contain only pointers.) Instead, it relies exclusively on the Wasm execution stack to avoid unnecessary stacks, but the Wasm execution stack is rather limited and cannot be used for this either.
Would it make sense to reserve a region of linear memory for stack data, like LLVM does? The convention LLVM uses is a wasm global called __stack_pointer, which grows down. I know it doesn't solve this entire problem, but it does solve the first part. I agree that including malloc for this kind of thing seems excessive.
(Doing so would also avoid things like const staticBuf = memory.data(256), which waste memory and won't play nicely with threads.)
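Here is a minimal TypeScript model of the LLVM-style convention described above. It reserves a region [base, top) of linear memory with a stack pointer that starts at top and grows down, so scratch space for a UTF-8 copy can be bump-allocated and released in LIFO order without malloc/free. The class and method names are hypothetical. The model ignores threads: each thread would need its own region and stack pointer.

```typescript
// Hypothetical LLVM-style scratch stack over a reserved region [base, top)
// of linear memory; the stack pointer starts at `top` and grows downward.
class ScratchStack {
  private readonly base: number;
  private sp: number; // plays the role of the __stack_pointer global

  constructor(base: number, top: number) {
    this.base = base;
    this.sp = top;
  }

  // Reserve `size` bytes, aligned down to `align` (a power of two), like alloca.
  alloca(size: number, align: number = 8): number {
    const next = (this.sp - size) & ~(align - 1);
    if (next < this.base) throw new RangeError("scratch stack overflow");
    this.sp = next;
    return next;
  }

  // Save/restore the pointer around a call, as a function prologue/epilogue would.
  save(): number {
    return this.sp;
  }
  restore(saved: number): void {
    this.sp = saved;
  }
}
```

A logging wrapper would save() the pointer, call alloca() with the encoded size, encode the message, call the import, and then restore() the saved pointer. Releasing is just a pointer reset, which is why no general-purpose allocator is needed.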
AS uses __data_end, __stack_pointer and __heap_base, with a stack growing downwards, similar to LLVM, for the managed shadow stack, yeah. It's all a bit unfortunate, as the GC is precise and relies on all the data within the stack being zeroes or pointers. One could technically implement yet another stack in a separate region using memory.data, which is the AS equivalent of a static array: it obtains and reserves a slice of static memory, i.e. what then becomes a (data ...) segment. Possible, I guess, but I wonder whether that could also be considered excessive for just quickly calling a WASI API. And it still leaves us with double re-encoding, hmm.
Many of the comments here seem to be about not just logging, but WASI APIs in general.
To be clear about one thing: strings are not WASI's problem. They're WebAssembly's problem. And what's more, WebAssembly is already working on a solution. If anyone doesn't like it, WASI isn't the place to change it.
I don't think logln16 sounds like something WASI should do at this time, in part because of the risk of "yes is forever", and in part because of the risk of this spreading beyond logging. If we're going to have a new string convention across WASI, we should really have a plan for how we want it to work. And it turns out that not only is this already on everyone's radar, but there's already a plan underway.
Please keep this issue focused on logging, and please be open to suggestions specific to logging APIs.
If we want to take inspiration from existing APIs, it might be worth looking at what Linux chose to do: https://man7.org/linux/man-pages/man3/syslog.3.html.
We might also want to consider whether we are designing a system for debugging (which is what the web's console.log/error is generally for) or for event logging in things like servers and daemons, which tends to have a little more structure and is used in production builds. If it's the former, we might want to include the word "debug" somewhere in the name.
The syslog interface looks good to me. I think it can map well to JS's debug, info, warn and error. Regarding simple debugging vs. logging in production, perhaps both can use the same API, and we can add a flag along the lines of LOG_CONS (which is actually something else) to indicate printing to the console? Alternatively, there could be separate log levels; not sure what's better.
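Here is a sketch of how syslog(3) priorities could map onto the JS console in a polyfill. The numeric values are the standard ones from <syslog.h>. The grouping into console methods and the hostLog import are hypothetical choices, not something settled in this thread.

```typescript
// syslog(3) priority values as defined in <syslog.h>; lower is more severe.
const Priority = {
  LOG_EMERG: 0,
  LOG_ALERT: 1,
  LOG_CRIT: 2,
  LOG_ERR: 3,
  LOG_WARNING: 4,
  LOG_NOTICE: 5,
  LOG_INFO: 6,
  LOG_DEBUG: 7,
} as const;
type Priority = (typeof Priority)[keyof typeof Priority];

type ConsoleMethod = "error" | "warn" | "info" | "debug";

// One possible grouping of the eight priorities onto four console methods.
function consoleMethodFor(p: Priority): ConsoleMethod {
  if (p <= Priority.LOG_ERR) return "error"; // EMERG, ALERT, CRIT, ERR
  if (p === Priority.LOG_WARNING) return "warn";
  if (p <= Priority.LOG_INFO) return "info"; // NOTICE, INFO
  return "debug";
}

// Host side of a hypothetical log(priority, message) import.
function hostLog(p: Priority, msg: string): void {
  console[consoleMethodFor(p)](msg);
}
```

Keeping the priority a plain integer means the same import could serve both quick debugging (LOG_DEBUG) and production event logging. A non-browser host could route messages by level instead of to a console.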
Is there a fundamental difference between logging for debugging and event logging in production, besides the log level and the consumers of the log messages? I agree that initially these seem different, but I haven't yet been able to think of a way that they're different from an application perspective.
If not, I think it makes sense to focus on figuring out what levels to have, and keep the API simple and general.
I have a use case where I'd love to get rid of a custom ABI in order to switch to WASI for portability purposes, but writing to file descriptors exclusively in UTF-8 encoding doesn't map very well to my use case. So I was wondering if WASI could spec out a logging module, independent of whether the console is a terminal, a browser console or, in the future, perhaps something that sends log data over a network if someone wants that.
I am asking because console usage is an unfortunate pain point in my use case currently (bundling encoders, frequent re-encoding into dynamic allocations, potentially GCed, double re-encoding on the web and such), while everything else (like abort, random, time, etc.) would map quite well to WASI already. Just having something that isn't UTF-8 FDs would help a ton to switch to WASI while having a good feeling about it.
I'd naively imagine something like:
I am aware that "logging" can be much more complex than what I outlined here of course. Perhaps "console" would be a better name, but "logging" could become more general. Also, Interface Types may eventually help here to reduce the number of arguments.
What do you think? Is this something worth exploring? (In general, I'd probably not have much to complain about for a while if only logging were a bit more Web-friendly.)