Allow fragments in a kebab name to start with a number

esoterra commented 2 months ago

A kebab name in the current spec is made up of one or more "fragments" separated by a hyphen (-). These fragments can't begin with a number.
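
To make the rule above concrete, here is a minimal Rust sketch of a checker for it. It is a simplification, not the spec's grammar: it models only "one or more hyphen-separated fragments, each starting with a letter" and ignores any other constraints the spec may impose (casing, for example). The helper name `is_kebab_name` is hypothetical.

```rust
/// Simplified check of the rule described above: one or more non-empty
/// fragments separated by `-`, each consisting of ASCII letters and digits
/// and starting with a letter.
/// Assumption: any further rules in the real grammar are not modeled here.
fn is_kebab_name(name: &str) -> bool {
    name.split('-').all(|fragment| {
        let mut chars = fragment.chars();
        match chars.next() {
            // The first character must be a letter; the rest may be letters or digits.
            Some(first) => first.is_ascii_alphabetic() && chars.all(|c| c.is_ascii_alphanumeric()),
            // An empty fragment (e.g. from "a--b" or "") is rejected.
            None => false,
        }
    })
}

fn main() {
    for name in ["gpu-extent", "gpu-extent-3d", "gpu-size-32", "1d"] {
        println!("{name}: {}", is_kebab_name(name));
    }
}
```

Under this simplified rule, only the first of the four names is accepted; the other three are rejected because a fragment starts with a digit.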

However, there are examples in WebIDL that will need to be translated to WIT where either a name part or the whole name starts with a number.

GPUExtent3D -> gpu-extent-3d
GPUSize32 -> gpu-size-32
GPUTextureDimension cases "1d", "2d", and "3d" -> 1d, 2d, 3d
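
As an illustration of how an automated tool might arrive at the names above, here is a rough Rust sketch of a WebIDL-name-to-kebab-case conversion. The function `webidl_to_kebab` and its word-boundary heuristics are assumptions made for this example; they are not taken from any existing WebIDL -> WIT tool.

```rust
/// Hypothetical converter from a WebIDL-style name to kebab case.
/// The boundary heuristics below are assumptions chosen to reproduce
/// the examples in this issue; real tools may use different rules.
fn webidl_to_kebab(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if i > 0 {
            let prev = chars[i - 1];
            let next_is_lower = chars.get(i + 1).map_or(false, |n| n.is_ascii_lowercase());
            let boundary =
                // "Extent3" -> "extent-3": a digit run begins after a letter.
                (c.is_ascii_digit() && prev.is_ascii_alphabetic())
                // "eD" -> "e-d": a lowercase letter is followed by an uppercase one.
                || (c.is_ascii_uppercase() && prev.is_ascii_lowercase())
                // "GPUExtent" -> "gpu-extent": the last capital of an acronym starts a new word.
                || (c.is_ascii_uppercase() && prev.is_ascii_uppercase() && next_is_lower);
            if boundary {
                out.push('-');
            }
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

fn main() {
    for name in ["GPUExtent3D", "GPUSize32", "GPUTextureDimension", "1d"] {
        println!("{name} -> {}", webidl_to_kebab(name));
    }
}
```

Note that a letter following a digit does not start a new fragment, which is why `3D` stays together as `3d`.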

To my knowledge, the main concern with letting fragments/identifiers start with numbers is that they are translated into programming-language identifiers, which typically can't start with a number. However, many languages have techniques available in their grammar to handle these cases when translating names (e.g. mapping 1d to _1d in Rust).
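
The `1d` to `_1d` mapping mentioned above could be sketched in Rust roughly as follows. This is only an illustration under assumptions: the helper `to_rust_ident` is hypothetical, it handles only the leading-digit case, and it is not a description of what any existing bindings generator does.

```rust
/// Hypothetical helper: turn a kebab-case name into a snake_case Rust
/// identifier, prefixing `_` when the result would start with a digit.
/// Assumption: keyword clashes and other escaping concerns are out of scope.
fn to_rust_ident(kebab: &str) -> String {
    let snake = kebab.replace('-', "_");
    if snake.starts_with(|c: char| c.is_ascii_digit()) {
        // Rust identifiers cannot begin with a digit, so escape with `_`.
        format!("_{snake}")
    } else {
        snake
    }
}

fn main() {
    for name in ["1d", "2d", "gpu-extent-3d"] {
        println!("{name} -> {}", to_rust_ident(name));
    }
}
```

Only names whose very first character is a digit need the escape; a digit at the start of a later fragment ends up after an underscore and is already valid.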

WIT authors, on the other hand, have relatively few options for handling these. They could either manually apply domain/convention knowledge (e.g. writing 1d as d1), which isn't possible with automated tools like the in-progress WebIDL -> WIT efforts, or add arbitrary prefixes to the front of their names (e.g. c1d for case 1d).

I think that making bindings generators find language-specific ways to handle making identifiers valid will be better than forcing WIT authors to write names that don't begin with numbers, so we should consider letting all fragments in names start with numbers.

yoshuawuyts commented 2 months ago

Some context here: Robin and I were chatting yesterday about some of the challenges of mechanically projecting various IDLs and programming languages to and from WIT. The GPUTextureDimension example was one that @mendyberger brought up in a separate conversation as a challenge in projecting the WebGPU WebIDL definition to WIT. Though this was first found in the WebGPU projections, it is by no means unique to WebGPU or even WebIDL. This seems like something worth solving in the general case.

MendyBerger commented 2 months ago

There are lots of such cases that I ran into while trying to convert WebIDL to WIT; I hope to write them down somewhere in the next few weeks.

FWIW I'm skeptical that we can find solutions to all of these problems that would work nicely in all - or even most - languages. IIUC WIT is (and should be) a subset of the capabilities of most languages. If we try to introduce behavior that languages don't naturally support, we might end up with APIs that feel foreign and unintuitive.

yoshuawuyts commented 2 months ago

> There are lots of such cases that I ran into while trying to convert WebIDL to wit, hope to write it down somewhere in the next few weeks.

@MendyBerger Yes, that would be great - thank you!

> FWIW I'm skeptical that we can find solutions to all of these problems that would work nicely in all - or even most - languages.

@MendyBerger I mean, yes, of course you're right. Creating great language-native bindings in the general case will always require additional work by library authors. Even simple abstractions like the WASI Descriptor API, when projected into Rust, require additional massaging to feel native to the language.

From my perspective, the goal of WIT, Wasm Components, and codegen tools is to automate away as much of the process as possible, so that even without manual projections the APIs are still usable. With some additional work, they can then be made to feel truly native. In the case of Descriptor, that would mean wrapping it in an API that resembles Rust's File type.

But while we may not be able to automatically generate perfect language-native bindings for every API every time, it's issues like these that enable us to improve on the status quo. So I'm happy you've been able to find some of those limitations already, and I look forward to you filing the issues so we can start thinking about how we can do better.

MendyBerger commented 2 months ago

Specifically about wrappers: I'm a bit concerned about requiring wrappers for components; I think we should aim to make them usable as-is.

Here are some reasons:

- Allow newer languages to work out of the box.
- Allow SDKs that work nicely across languages.
- Allow an era of language-agnostic libraries.

If we require wrappers for all components we'll lose all these advantages.

I'd much rather spend the time now to build components that are usable as-is than do it the quick way now and end up with components that feel weird to use without wrappers.

Regardless, your and Robin's point still stands: if we can find solutions to specific problems that work well, we should try to do it.

(BTW I'll be mostly out until May 1st, so I might be slow to respond)

tschneidereit commented 2 months ago

I strongly agree with Mendy's take here: we've always tried to make bindings for components be "good enough" with a pretty high bar, to the point where an API should become usable in all languages without any extra work, and everything a wrapper would add is nice-to-have.

There are certain APIs where that is probably never going to really work, but I don't think those are arguments for dropping the approach. As an example, wasi-http is a hard case, because it needs to be expressive enough to be a good basis for implementing existing abstractions on top of it, such as JS's fetch API. That means it necessarily has a degree of complexity that makes it hard to use without a wrapper.

Most APIs don't have that kind of constraint though, and I think we should optimize WIT as a language and wit-bindgen and other toolchains for those APIs, not the hard cases.

yoshuawuyts commented 2 months ago

Actually, let's back up here a little. I feel like we're about to get sidetracked into a discussion about how good our bindings currently are, and whether we can hypothetically generate perfect bindings for all languages. I don't think this is the right venue for that. Instead, let me state what I believe we all agree on, in the hopes of moving this conversation forward:

1. We would like our codegen tools to automatically generate the best bindings possible (or "good enough with a pretty high bar", the way Till put it).
2. We specifically believe that prefixing enum variants with numbers would allow us to generate better bindings, improving on the status quo (the concrete subject of this issue).
3. We believe it is valuable to document and report more cases where we notice a lack of expressivity in WIT, so that we can improve the quality of our WIT documents and generated bindings.

Does anyone here disagree with this?

tschneidereit commented 2 months ago

I agree with all that, yes—including the venue thing. Thank you for that! ❤️

MendyBerger commented 2 months ago

@yoshuawuyts I definitely agree with 1 and 3. For 2, how would prefixing enum variants with numbers allow us to generate better bindings? My concern is that it will force us to generate worse bindings, not better ones.

lukewagner commented 2 months ago

FWIW, if we only allow words/acronyms other than the first to start with a number, I think it shouldn't pose any new challenges to bindings generators; it's only when the first character of the whole label is a number that (iiuc) these questions are raised. Thus, at a minimum, I think we should consider that. As for whether it makes sense to let the first word/acronym start with a number, I'm more ambivalent.
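
The narrower option described in this comment could be sketched as a relaxed validity check, shown below in Rust. As with any such sketch, it rests on assumptions: the function names are hypothetical, and only the leading-character rule is modeled, not the full label grammar.

```rust
/// Check a single fragment: ASCII letters and digits only, starting with a
/// letter, or with a digit when `allow_leading_digit` is set.
fn is_fragment(fragment: &str, allow_leading_digit: bool) -> bool {
    let mut chars = fragment.chars();
    match chars.next() {
        Some(first) => {
            (first.is_ascii_alphabetic() || (allow_leading_digit && first.is_ascii_digit()))
                && chars.all(|c| c.is_ascii_alphanumeric())
        }
        None => false,
    }
}

/// Hypothetical relaxed rule: every fragment except the first may start
/// with a digit, so the whole label still starts with a letter.
fn is_relaxed_kebab_name(name: &str) -> bool {
    name.split('-')
        .enumerate()
        .all(|(index, fragment)| is_fragment(fragment, index > 0))
}

fn main() {
    for name in ["gpu-extent-3d", "gpu-size-32", "1d"] {
        println!("{name}: {}", is_relaxed_kebab_name(name));
    }
}
```

Under this rule `gpu-extent-3d` and `gpu-size-32` become valid, while `1d` is still rejected, so a generated identifier never needs a leading-digit escape.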

WebAssembly / component-model

Allow fragments in a kebab name to start with a number #345