WebAssembly / component-model

Repository for design and specification of the Component Model
Other
898 stars 75 forks source link

Reserve handle 0 #282

Closed badeend closed 7 months ago

badeend commented 7 months ago

to be an always-invalid "NULL" handle?

Two benefits that come to mind for client languages:

lukewagner commented 7 months ago

That's a really great idea. I feel like we even discussed this at some point in the past and then I forgot to include this in CanonicalABI.md. In particular, in the Preview 3 timeframe, I'm expecting there to be some canonical built-ins that want to optionally return a handle which would benefit from being able to simply return an i32.

-1 is another option for a sentinel value worth considering. I just noticed that CanonicalABI.md is missing an integer overflow trap btw... but assuming I add that, we could just decrease the max. 0 seems like the better sentinel value for catching errors, as you pointed out. However, 0 means either burning an extra element per resource table or adding an extra subtraction on the access path. Making 0 a sentinel would technically be a breaking change (since handle indices are currently deterministically specified), but it seems like code "shouldn't" depend on the precise indices so maybe it's not in practice.

@alexcrichton (and anyone else) WDYT?

pchickey commented 7 months ago

Until very recently I made the wasmtime-wasi implementation refuse to use indices 0,1,2 for pseudo-resources, and then resources, in order to catch programs that were accidentally assuming stdio lived there. This did catch some bugs in the early days of pseudo-resources, but I'm not sure if it would since we introduced proper resources in Wasmtime. We got rid of that special case recently on the grounds that we probably shook out all of those bugs. I would be fine with this change.

badeend commented 7 months ago

-1 is another option for a sentinel value worth considering. I just noticed that CanonicalABI.md is missing an integer overflow trap btw... but assuming I add that, we could just decrease the max. 0 seems like the better sentinel value for catching errors, as you pointed out.

0 would definitely be my preferences because of the pervasiveness of 0 initialization/coercion in the programming language design universe. (In)famous example being JavaScript: Number(null) === 0.

However, 0 means either burning an extra element per resource table (...)

As an example; C# uses 0-initialization for all fields. Removing 0 as a valid handle value would enable handle types to be safely emitted as structs instead of classes, removing the need to perform a heap allocation per handle. So in this specific case it would indeed add a one-time per-table cost on the host side, but could remove a per-resource cost on the client side.

Making 0 a sentinel would technically be a breaking change (since handle indices are currently deterministically specified), but it seems like code "shouldn't" depend on the precise indices so maybe it's not in practice.

If it helps, I could submit PRs to existing component-model implementations (wasmtime, JCO, others?) to already start reserving the 0th index. 1) to minimize the practical impact of a potential transition to preview3, and 2) to gather real world experience on how much overhead we're actually talking about here.

alexcrichton commented 7 months ago

I think this would be a great change to make! I also would agree we can probably do this today without actually breaking anything in practice except for some tests in Wasmtime. I also think it'd be reasonable to reserve 0 and -1 at the same time too.

lukewagner commented 7 months ago

Thanks for the feedback all. Good idea about reserving both 0 and -1. I even wonder if, given that 1 billion handles seems like Enough For Anyone (and if not, 4 billion is probably not enough either and so what you really want is to use an i64 opted-into via new canonopt) it would be a good idea to set the max to 2**30 - 1 so that the high two bits are free (e.g., for use in conjunction with Smis). I'm happy to propose that change.

sunfishcode commented 7 months ago

Would it make sense to also reserve 1 and 2 too, to catch misuses of stdout/stderr? Or perhaps even more low values, to help catch conflations of libc fds in general with canonical abi handle indices?

alexcrichton commented 7 months ago

I think it's reasonable to chop off the upper two bits yeah. I'm not sure where to draw the line on the lower bound though. Zero seems like a good idea to rule out since lots of things like to zero-initialize things into a "null" state, but anything else to me feels like it's almost a bit too arbitrary to bake into the canonical ABI. For example folks are already going to have to deal with other surprises such as there are two handles with index N - just different types. While we could play whack-a-mole with trying to cover up gotchas (like 1/2 in terms of stdio/stderr) I'm not sure if it's worth it per se.

I don't really feel strongly either way though.

badeend commented 7 months ago

As long as we're happily bikeshedding, let me pile on another one:

I'm not sure where to draw the line on the lower bound though

How about 2147483648 😛 ? Ie. start numbering handles from s32::MIN, incrementing up to (but not including) 0.

It requires a bit-masking operation on the data access path, though.

lukewagner commented 7 months ago

I (non-strongly) feel the same as Alex, although definitely an interesting idea. While 0 as "null" is rather universal, 1 and 2 as stdout/stderr is rather specific to just file descriptors. Using negative numbers is also an interesting idea, but I worry it creates more trouble than it's worth (bitwise arithmetic, harder to read when debugging, source of bugs if guest code wants to covert to a dense array index, ...).

badeend commented 7 months ago

I feel there is a consensus on reserving at least 0, so why not start with that and see what happens from there.

lukewagner commented 7 months ago

Ok, PR up in #284.