WebAssembly / component-model

Repository for design and specification of the Component Model

Why is `create-2d` an invalid identifier? #197

Open ghost opened 1 year ago

ghost commented 1 year ago

(Copied from https://github.com/bytecodealliance/wit-bindgen/issues/578)

I am declaring a function that creates a two-dimensional texture:

default interface client-experimental-texture {
    create-2d: func() -> string
}

This fails with:

  Caused by:
      invalid character in identifier '2'
           --> wit/client-experimental-texture.wit:2:5
            |
          2 |     create-2d: func() -> string
            |     ^', crates/wasm/build.rs:49:58

esoterra commented 1 year ago

From the component model's perspective, the question here is

should create-2d be a legal kebab-case identifier?

The grammar defined in Binary.md is shown below. It allows digits only after the first letter of each word, so no word may begin with a digit.

label               ::= w:<word>                                           => w
                      | l:<label> '-' w:<word>                             => l-w
word                ::= w:[0x61-0x7a] x*:[0x30-0x39,0x61-0x7a]*            => char(w)char(x)*
                      | W:[0x41-0x5a] X*:[0x30-0x39,0x41-0x5a]*            => char(W)char(X)*

In order to make identifiers like create-2d legal without allowing identifiers to begin with numbers, we would need to add a different production for the first word than for subsequent words.
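To make the difference concrete, here is a minimal sketch that transcribes the grammar above into a regular expression and compares it with one possible relaxed form. The relaxed pattern and the names `TAIL_WORD`, `is_current_label`, and `is_relaxed_label` are assumptions made for illustration; they are not taken from Binary.md or from any existing tool.

```python
import re

# A word per the grammar quoted above: starts with a letter, is all-lowercase
# or all-uppercase, and may contain digits after the first letter.
WORD = r"(?:[a-z][0-9a-z]*|[A-Z][0-9A-Z]*)"
CURRENT_LABEL = re.compile(rf"{WORD}(?:-{WORD})*")

# Hypothetical relaxation (an assumption, not the spec): words after the
# first may also begin with a digit. Note this also admits all-digit words.
TAIL_WORD = r"(?:[0-9a-z]+|[0-9A-Z]+)"
RELAXED_LABEL = re.compile(rf"{WORD}(?:-{TAIL_WORD})*")


def is_current_label(s: str) -> bool:
    """True if s matches the grammar as quoted in this thread."""
    return CURRENT_LABEL.fullmatch(s) is not None


def is_relaxed_label(s: str) -> bool:
    """True if s matches the hypothetical relaxed grammar."""
    return RELAXED_LABEL.fullmatch(s) is not None


if __name__ == "__main__":
    for name in ["create-2d", "create2d", "2d-create", "create-XML"]:
        print(name, is_current_label(name), is_relaxed_label(name))
```

Under this sketch, `create-2d` is rejected by the current pattern and accepted by the relaxed one, while `2d-create` is rejected by both because the first word must still start with a letter.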

tschneidereit commented 1 year ago

> In order to make identifiers like create-2d legal without allowing identifiers to begin with numbers, we would need to add a different production for the first word than for subsequent words.

A reason not to change anything here is that it'd not be clear how to generate unambiguous bindings for many languages. For example, if the WIT contains both create-2d and create2d, what should the generated names look like in a language like JS or Java, which camel-cases identifiers? The most natural answer for both would be create2d (or Create2d if it's a class name), and I really can't see any good alternatives :/
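The collision can be illustrated with a naive kebab-to-camel converter. This is a sketch only: `to_camel_case` is a hypothetical helper, and real binding generators may use different casing rules.

```python
def to_camel_case(label: str) -> str:
    """Naively convert a kebab-case label to camelCase.

    Hypothetical helper for illustration; not the algorithm of any
    particular binding generator.
    """
    words = label.split("-")
    head = words[0].lower()
    # Capitalize the first character of each later word. A leading digit
    # has no uppercase form, so the word boundary disappears.
    tail = [w[:1].upper() + w[1:].lower() for w in words[1:]]
    return head + "".join(tail)


if __name__ == "__main__":
    print(to_camel_case("create-2d"))
    print(to_camel_case("create2d"))
```

Both labels map to the same target-language name, because the hyphen before a digit leaves no trace once the casing is applied.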

lukewagner commented 1 year ago

I think it's worth considering relaxing the grammar of kebab-names to allow leading numbers in words after the first, as Kyle suggested. This would also be useful if we wanted to better support version numbers, as suggested in #134.

But Till has a good point. Separately, we've talked about requiring kebab names to be case-insensitively unique (so that, e.g., you can't have both create-XML and create-xml, as this would conflict in a casing scheme that mapped both to createXml). It might be a good idea to further tighten that requirement and require kebab names to be hyphen-insensitively unique (so that you can't have both ab and a-b), which may be independently valuable even outside of Till's example above.
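One way such a check could work is to compare names by a canonical key with hyphens removed and case folded. The sketch below assumes that interpretation; `canonical_key` and `find_conflicts` are hypothetical names, not part of the specification or any validator.

```python
def canonical_key(label: str) -> str:
    """Key that ignores both hyphens and letter case (assumed rule)."""
    return label.replace("-", "").lower()


def find_conflicts(labels):
    """Return (first_seen, later) pairs of distinct labels sharing a key."""
    seen = {}
    conflicts = []
    for label in labels:
        key = canonical_key(label)
        if key in seen and seen[key] != label:
            conflicts.append((seen[key], label))
        else:
            # Remember the first label seen for this key.
            seen.setdefault(key, label)
    return conflicts


if __name__ == "__main__":
    names = ["create-XML", "create-xml", "ab", "a-b",
             "create-2d", "create2d", "get-name"]
    for first, later in find_conflicts(names):
        print(f"{later!r} conflicts with {first!r}")
```

With this rule in place, a WIT document containing both create-2d and create2d would be rejected up front, so a binding generator never has to disambiguate them.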

tschneidereit commented 1 year ago

> It might be a good idea to further tighten that requirement and require kebab names to be hyphen-insensitively unique

Ah, I like that as a solution! And if there turns out to be a reason not to go quite that far, then applying it just to hyphens before numbers would be a good approach, too.
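The narrower variant could be sketched as a key that drops only those hyphens that directly precede a digit. Whether case folding would be combined with it is an assumption here, and `digit_hyphen_key` is a hypothetical name.

```python
import re


def digit_hyphen_key(label: str) -> str:
    """Key that ignores a hyphen only when it directly precedes a digit.

    Assumption: case-insensitive uniqueness applies as well, so the key
    is also lowercased.
    """
    return re.sub(r"-(?=[0-9])", "", label).lower()


if __name__ == "__main__":
    print(digit_hyphen_key("create-2d") == digit_hyphen_key("create2d"))
    print(digit_hyphen_key("a-b") == digit_hyphen_key("ab"))
```

Under this narrower rule, create-2d and create2d still conflict, while a-b and ab remain distinct names.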