lukewagner opened this issue 1 day ago
Hi @lukewagner, thanks for filing this issue!
Agreed that input is a nuanced and detailed domain, and it's precisely for this reason that we decided to model it on the web, while leaving out anything that's a bit more niche and not needed by most apps. That is, we included pointer events, which abstract over mouse, touch, and pen, as well as keyboard events, but left everything else out, as pointer and keyboard should cover the vast majority of apps.
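To illustrate the idea, here's a rough sketch of the shape pointer events take when modeled on the web's PointerEvent (the names and fields below are illustrative only, not the actual WIT in this repo):

```wit
// Illustrative sketch only -- not the actual wasi-gfx definitions.
// The point of the web's PointerEvent model is that one event shape
// covers mouse, touch, and pen.
package sketch:pointer;

interface pointer-events {
  enum pointer-type {
    mouse,
    pen,
    touch
  }

  record pointer-event {
    kind: pointer-type, // which device class produced the event
    x: f64,             // position in surface-local coordinates
    y: f64,
    pressure: f32       // 0.0..1.0, mostly meaningful for pen/touch
  }
}
```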
That's not to say the other items aren't important, they absolutely are. But building a full windowing system is an incredibly complex task that requires a deep understanding of every operating system out there, as well as how each relates to security/sandboxing/fingerprinting. So we opted for a basic surface, modeled on something that already has multiple implementations on multiple operating systems (the web), while being very careful to leave room for a newer, more complete, full windowing proposal. If I had to guess, I'd say there are literally years of work in designing such a system from scratch.
We'd love to see people with domain expertise getting more involved! Do you have anyone in mind?
Re breaking out the proposal: I don't have a strong opinion on that; either way works for me. We did have a couple of conversations about this with people in the community before we moved to phase 2. Ultimately, we went with having separate packages in one proposal, but I'm open to changing that. I just don't want to be changing proposal structure every few months, so let's first make sure that there's strong consensus on the matter.
Thoughts?
Right now I think we are best served by collecting stakeholders and experts to collaborate on the various packages here, all as part of one proposal, until we see that it's no longer serving the process.
To me there are two reasons to break out a package from this proposal into its own proposal:
If someone can make a compelling argument that one or both of those have created a problem for wasi-gfx or wasi-surface, we can split packages off this proposal. I don't want to do so, however, on the presumption that one or both of those may become a problem in the future.
Scanning `wasi:surface`, I didn't realize it was based on Pointer Events (viz. because there was no mention of Pointer Events and "surface" is not a concept in the Pointer Events spec), and so I assumed it was an ad hoc interface, which would require a lot more evaluation than an interface that is 1:1 with a standard Web API. If `wasi:surface` is indeed based on Pointer Events, then it would be good to have the correspondence laid out a bit more clearly in names and doc-comments, as we've done with `wasi:http` and (I assume) `wasi:webgpu`. E.g., `wasi:surface` doesn't seem like the right name.
More generally, we didn't add a key-value store or config store to `wasi:http` (even though many proxies want such functionality, and proxy-wasm indeed bundled it); instead we have separate `wasi:keyvalue` and `wasi:config` packages/proposals that describe these modular sets of functionality so they can be used modularly. Putting a subset of input events into a monolithic graphics proposal similarly conflates two distinct domains which are naturally going to evolve separately over time. Likewise, I don't think we should vote on and publish them as one monolithic unit; we've been doing that at the WIT-package level, which then maps naturally to package versioning.
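To make the modularity argument concrete, here's a hedged sketch of how a proxy world composes these pieces today (interface paths are from memory and versions are omitted; check the actual proposals for exact names):

```wit
// Sketch: a proxy-ish world pulling in independently published,
// independently versioned packages. Paths are from memory and may
// not match the current proposals exactly.
package sketch:proxy;

world caching-proxy {
  import wasi:keyvalue/store;        // separate proposal, separate versioning
  import wasi:config/store;          // likewise
  export wasi:http/incoming-handler; // the HTTP functionality itself
}
```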
Thus far we've had one WIT package per GH repo, which suggests separate repos. I suppose that's not a hard rule, but I wonder if it messes with any of our automated processes for versioning/publication?
Also related to input:
While I've heard a number of obvious use cases for running `wasi:webgpu` outside the browser in a headless mode where there is no user input, I'd be interested to see the list of use cases which want `wasi:webgpu` and user input outside the browser, since it seems like there is a spectrum of possible use cases which suggest different interface designs. In particular, is the component trying to act like a whole GUI "app", or is it more like the logic associated with a single "widget" (or "control", in OLE parlance), or something else? If a single component instance can be associated with multiple surfaces (which the current WIT seems to suggest, unless I'm reading it wrong), then it's sort of neither, so I'd be curious what the intended mental model is and what the relationship is between the component and the containing non-browser GUI host.
@lukewagner yeah, the WIT isn't documented well, and I didn't do a great job explaining it above; my apologies for that. In fact, stabilizing and documenting the WIT is likely going to be one of my main areas of focus in the next few weeks.
Just to try and clear up what `wasi:surface` tries to be:
In the early days of this proposal, we briefly thought of defining a full windowing interface; someone even put up a PR for that. But we quickly realized that designing such an interface would require an amount of expertise and bandwidth that our small group simply didn't possess. However, we still believed that we could build something useful for now and keep the door open for broader proposals later.
So here's what we're trying to do now: define the smallest surface/canvas/window/whatever-you-call-it that is still useful. We considered having just a literal surface that gets no user input events, but that didn't seem very useful. So we decided to steal pointer and keyboard events from the web. We didn't copy them as closely as we copied WebGPU, but the semantics are - or should be - mostly the same.
With a surface, keyboard events, and pointer events, we believe that we cover most general apps. And by taking the semantics from the web, we avoid having to do our own bikeshedding.
So in short: `wasi:surface` is a surface API, with the addition of pointer and keyboard events.
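Roughly this shape, as a sketch (illustrative names only, and hand-waving event delivery, which in the real WIT needs a proper poll/async story):

```wit
// Illustrative only -- not the current wasi:surface WIT.
package sketch:surface;

interface surface {
  record pointer-event { x: f64, y: f64 }
  record key-event { code: string } // codes per https://w3c.github.io/uievents-code/

  resource surface {
    width: func() -> u32;
    height: func() -> u32;

    // Simplified polling model: drain the next queued event, if any.
    next-pointer-event: func() -> option<pointer-event>;
    next-key-event: func() -> option<key-event>;
  }
}
```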
As to what kinds of apps we're trying to cover: we wanna cover both standalone apps and widgets in existing apps. For now, we believe `wasi:surface` can cover both. We're actually hoping to start doing some experiments testing surface as a plugin/widget soon.
It might also be worth mentioning that if a host app wants to provide `wasi:webgpu` to its plugins, but `wasi:surface` is too basic for their use case, they could easily provide a custom windowing system and have that plug in with `wasi:webgpu`.
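For example (everything below is hypothetical, including how the host's handle would connect to webgpu's rendering path):

```wit
// Hypothetical sketch of a host-specific plugin world. The host
// supplies its own windowing interface instead of wasi:surface,
// while the rendering API stays the standard wasi:webgpu.
package host:plugin;

interface windowing {
  // Placeholder handle for whatever the host's windowing system
  // lets webgpu render into; the connection point is up to the host.
  resource pane {
    width: func() -> u32;
    height: func() -> u32;
  }
  open-pane: func() -> pane;
}

world plugin {
  import windowing;
  import wasi:webgpu/webgpu; // interface path assumed, not verified
  export run: func();
}
```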
Does that clear it up at all?
Having `wasi:webgpu` correspond directly to the existing WebGPU Web API makes a ton of sense. But the functionality in `wasi:surface`, which includes the keyboard and mouse input, is wayyy outside the scope of WebGPU. There are Web APIs for describing input sources, of course, and indeed `wasi:surface/surface.key` references https://w3c.github.io/uievents-code/#code-value-tables, but since input is defined by wholly separate APIs that are useful and usable independently of WebGPU, it seems like input functionality should be a separate proposal that is published, versioned, and advanced separately from `wasi:webgpu`.

Incidentally, there have been a number of iterations on input on the Web Platform (indicating that it's a subtle and nuanced domain); see Pointer Events, Touch Events, Pointer Lock, and the Keyboard API. Back when I worked on the Mozilla Games program, talking to gamedevs, some of these subtle API details mattered a lot, so I think it'd be a good idea to involve an expert in this area to help us scope out a proper `wasi:input` package and the set of Web APIs we should mirror into it. Or maybe there are other popular input APIs we should mirror that make more sense outside a browser? Inventing our own thing from scratch seems risky unless we have some significant input-API expertise of our own. In any case, there will be some non-trivial design work, so it definitely seems appropriate to decouple from `wasi:webgpu`.
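As a strawman of that decoupling (every name below is illustrative, not a concrete proposal):

```wit
// Strawman only. A standalone input package that a surface/windowing
// proposal -- or anything else -- could depend on.
package wasi:input;

interface keyboard {
  record key-event {
    // Code strings from the UI Events code tables
    // (https://w3c.github.io/uievents-code/), e.g. "KeyA", "ArrowLeft".
    code: string,
    repeat: bool
  }
}

interface pointer {
  record pointer-event {
    x: f64,
    y: f64,
    buttons: u16 // pressed-button bitmask, as in PointerEvent.buttons
  }
}
```

A surface or windowing package would then pull these types in with something like `use wasi:input/keyboard.{key-event};` instead of defining its own event types.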