WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/
Other
982 stars 70 forks source link

validation rule of `ref.is_null`, `ref.as_non_null` and `ref.cast` #512

Closed zapashcanon closed 7 months ago

zapashcanon commented 7 months ago

Hi,

In the validation rule of ref.null ht I understand why ht has to be valid (ok in the rule), you don't want the user to write ref.null $idthatdoesnotexist.

But it is not clear to me why the same applies to ref.is_null. It has type ref null ht -> ref ht, but here, ht comes from a value that is on the stack and I don't see how it could not be valid (or why we should care when validating the instruction).

The same question applies to ref.as_non_null and to the rt' type in ref.cast (here I understand why rt must be ok but not rt').

I feel like I'm missing something obvious but I can't find an explanation.

Thanks.

rossberg commented 7 months ago

Keep in mind that this is a typing rule that can just make up ht, it is a locally non-deterministic "guess" with nothing else constraining it. (A stack exists at runtime but not in the validation rules – the instruction types are all there is.) In particular, an ill-formed choice would later produce a subtyping premise against an ill-formed type in the sequencing rule, e.g., when typing (ref.null any) (ref.is_null).

This is not uncommon in declarative type systems that need to "guess" types in individual rules.

zapashcanon commented 7 months ago

Yes, I'm not disturbed by the rule making up ht. But I don't see how an ill-formed choice is a problem as if it is wrong, it won't match when combining the rules.

The typing judgements are of the form:


------------- (0)
Γ ⊢ ε : ε → ε
Γ ⊢ i : t∗ → t′∗      Γ ⊢ i∗ : t′∗ → t′′∗
----------------------------------------- (1)
Γ ⊢ i i∗ : t∗ → t′′∗

Then, with your example we have:


-------------------------------
Γ ⊢ ref.null any : ε → ref null any

-----------------------------------
Γ ⊢ ref.is_null : ref null ht → i32

And we can apply (1) only if ht matches any:

Γ ⊢ ref.null any : ε → ref null any     Γ ⊢ ref.is_null : ref null any → i32
-----------------------------------------------------------------------------
Γ ⊢ (ref.null any) (ref.is_null) : ε → i32
rossberg commented 7 months ago

You still want to preserve well-formedness of all objects in all rules. Otherwise you'll have a very hard time to do any inductive proof about the type system.

In particular, we also have subsumption. Allowing ill-formed types to flow out of one typing rule can allow spurious derivations that use subtyping between ill-formed types. Since the definition of subtyping assumes well-formedness of its inputs, that would not be meaningful and is likely to lead to garbage.

zapashcanon commented 7 months ago

You still want to preserve well-formedness of all objects in all rules. Otherwise you'll have a very hard time to do any inductive proof about the type system.

I completely buy this.

What I am wondering is: is there a complete program that is rejected with this rule but accepted without. I tried to find one but was not able to, my intuition is that there is none but I'm not sure. If you can think of one, I'm really interested. I'm going to close this as this is not really an issue. Thanks for the explanation. :)

rossberg commented 7 months ago

Here is an example: the rule might cook up a (closed) heaptype with an unsound supertype declaration. Consider:

(type $t1 (func))
(type $t2 (array i32))
(func
  (return)
  (ref.as_non_null)
  (array.len)
  (drop)
)

The type definitions elaborate to the following defined types for validation:

dt1 = (rec sub ε (func ε → ε)).0 dt2 = (rec sub ε (array i32)).0

The return instruction can be given type ε → ⊥. Furthermore, without the check, the ref.as_non_null rule would allow typing that with a random ht = dt2' where

dt2' = (rec sub dt1 (array i32)).0 (Ouch!)

Since this would never be checked, it would successfully be recognised as a subtype of $t1. At the same time, (ref ht) is of course recognised as a supertype of ⊥, so that the program would validate, while it doesn't right now.

This uses the bottom type, so I can't off-hand construct an example where such an unsound cast could actually be executed. But I wouldn't bet it's impossible.

rossberg commented 7 months ago

Ah sorry, bogus example, I simplified too much on the way. Let me try again...

rossberg commented 7 months ago

Here you go:

(type $t1 (func))
(type $t2 (array i32))
(func
  (local (ref $t1))
  (return)
  (ref.as_non_null)
  (local.tee 0)
  (array.len)
  (drop)
)