endojs / endo

Endo is a distributed secure JavaScript sandbox, based on SES
Apache License 2.0

Implement ocapn ByteArray as a passStyleOf recognized immutable byte array. #1331

Open kriskowal opened 1 year ago

kriskowal commented 1 year ago

Our current best solution for carrying bytes is a Uint8Array, which we use pervasively for hardened streams. This is one of the reasons that we treat TypedArrays like Map for the purposes of hardening: the content is mutable but the API surface is tamper-proof. However, the content remains readable and writable by both the sender and receiver over time, so like a Map, TypedArrays are not passable. That makes them a poor foundation for remote streams in an ocap discipline.

I propose that we need an analogous solution to the CopyMap and MapStore for transmission of immutable byte strings, tentatively CopyBytes and BytesStore, which would be passable and provide a firmer foundation for streams (async iterators of byte strings). CopyBytes should mesh well with TypedArrays, particularly Uint8Array and ArrayBuffer. To that end, it may be necessary for them to be Proxies, such that uint8Array.set(copyBytes) is sensible. Constructing an immutable copy of the content of an array-like should be possible, like copyBytes(uint8Array) or makeCopyBytes([0, 1, 2, …, 127]). CopyBytes would be backed by a TypedArray but would attenuate mutability. Slicing CopyBytes should be trivial. As with glyph strings, ropes are a possible optimization in the fullness of time.

As a critical optimization, we often use ring buffers for byte streams. We could conceivably also support another type for revocable copy bytes, that would permit a stream to reclaim the underlying byte range when the consumer signals it’s ready for another subrange.
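A minimal sketch of the non-revocable CopyBytes idea above (hypothetical: the copyBytes name and this object shape are assumptions for illustration, not an actual Endo API):

```javascript
// Hypothetical sketch of a copyBytes attenuator: snapshot the input and
// expose only read access, so neither party can mutate shared content.
const copyBytes = source => {
  // Snapshot the bytes so later mutation of `source` cannot leak through.
  const snapshot = Uint8Array.from(source);
  return Object.freeze({
    byteLength: snapshot.byteLength,
    // Read access hands out fresh mutable copies, never the snapshot itself.
    slice: (start, end) => Uint8Array.from(snapshot.subarray(start, end)),
  });
};

const original = Uint8Array.of(1, 2, 3);
const fixed = copyBytes(original);
original[0] = 9; // mutating the source no longer affects the attenuated copy
// fixed.slice()[0] === 1
```

This is the "attenuate mutability" shape; a real implementation would additionally need passStyleOf recognition and a cheaper path than copying on construction.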

kriskowal commented 1 year ago

attn @erights @gibson042 @mhofman

mhofman commented 1 year ago

I'm extremely concerned by the performance overhead of such a user-land abstraction. We should pursue a standard approach that allows engines to avoid array buffer copies and imposes minimal overhead on read access.

One such proposal at stage 1 is @Jack-Works's Limited ArrayBuffer

kriskowal commented 1 year ago

Shimming a standards track solution would be good.

Ah, memories. (My abandoned hobbies include a decade old fantasy of JavaScript having byte strings.)

erights commented 1 year ago

Attn @phoddie

phoddie commented 1 year ago

If you'd like an efficient read-only ArrayBuffer, that code is already in xst as part of our Secure ECMAScript / Hardened JavaScript work. This fragment...

const bytes = Uint8Array.of(1, 2, 3);
petrify(bytes.buffer);
bytes[2] = 1;

...throws TypeError: ?: read-only buffer.

Yes, a fully standard solution would be welcome.

kriskowal commented 1 year ago

A petrified buffer would certainly be eligible for passing over a CapTP.

Shimming petrify, I think, would require us to replace the TypedArray constructors and present all typed arrays as proxies, such that we could emulate petrification of a TypedArray with an ArrayBuffer shared by other TypedArrays.

phoddie commented 1 year ago

This may be what you mean, but... I wouldn't suggest shimming petrify as it is extremely broad and general. Instead, have a focused way to make an ArrayBuffer read-only. The implementation of that, when running on XS, would use petrify instead of Proxy and the like, and so avoid oodles of runtime overhead.

erights commented 1 year ago

To shim making an ArrayBuffer itself read-only, we'd need a way to actually make an ArrayBuffer read-only in place. The only thing we know how to shim would be something that returns what seems to be a new read-only ArrayBuffer.

Same for a TypedArray.
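A sketch of what such a shim *can* do per the point above: return a new object that merely behaves like a read-only view, rather than freezing the original ArrayBuffer in place. The name readOnlyView is hypothetical:

```javascript
// A shim cannot make an existing ArrayBuffer read-only in place, but it
// can return a new Proxy over a private copy that rejects all writes.
const readOnlyView = buffer => {
  const view = new Uint8Array(buffer.slice(0)); // private copy; original stays mutable
  return new Proxy(view, {
    set() { throw new TypeError('read-only buffer'); },
    defineProperty() { throw new TypeError('read-only buffer'); },
  });
};

const ro = readOnlyView(Uint8Array.of(1, 2, 3).buffer);
// Indexed reads forward through the proxy: ro[1] === 2; writes throw.
// But note the overhead problem raised in this thread: TypedArray builtins
// like ro.slice() also throw, because a Proxy does not carry the typed
// array's internal slots. In-place read-only buffers avoid all of this.
```

The internal-slot failure is exactly why emulating this with Proxy is both slow and only partially faithful.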

kriskowal commented 1 year ago

Aye, I lay it out like this because oodles of runtime overhead might not be acceptable, even in an intermediate step toward a standard. But if petrify had standards ambitions, we could shim the intersection of petrify and TypedArrays, throwing for anything we can’t petrify. Or we could anticipate and shim a world where TypedArray can be frozen. Either of these strategies would require us to replace the TypedArray and ArrayBuffer constructors such that they return proxies and deny all programs thereafter direct access to TypedArray and ArrayBuffer instances, such that freezing one façade of an ArrayBuffer would freeze all the façades of the same ArrayBuffer.

The web platform’s solution to this problem is opaque Blob objects. If you want to see the contents, you have to copy them into a TypedArray. We could certainly harden and make blobs passable too without doing anything to TypedArrays.

phoddie commented 1 year ago

> To shim making an ArrayBuffer itself read-only, we'd need a way to actually make an ArrayBuffer read-only in place. The only thing we know how to shim would be something that returns what seems to be a new read-only ArrayBuffer.

Y'all are the experts in patching things into a web-ish runtime. My goal is to avoid an ugly optimization problem around Proxy in XS when it already supports read-only buffers efficiently. If it is a problem that petrify does too much here, the functionality can easily be repackaged in a call that is focused on the needs of this particular problem.

Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?

erights commented 1 year ago

> replace the TypedArray and ArrayBuffer constructors

Doh! I'm embarrassed that I didn't think of this. Thanks!

Ok, I just read about Blobs at https://developer.mozilla.org/en-US/docs/Web/API/Blob . Aside from their non-JS stream() method, which we can omit from our whitelist, they seem like perfectly fine forms of CopyBytes. When we do add CopyBytes to our system, we could call it CopyBlob instead, and have passStyleOf certify that a Blob is a CopyBytes when it is hardened and has no surprising properties.

This would take the pressure off of needing to shim making an ArrayBuffer or TypedArray read-only in place.

erights commented 1 year ago

> Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?

I like that!

kriskowal commented 1 year ago

> Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?

I think the expectation of Blob.prototype.arrayBuffer() is that it would produce a mutable array buffer. It could be nearly-free if ArrayBuffers have a copy-on-write optimization.
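For contrast, a sketch of today's semantics, which any copy-on-write optimization would have to preserve observably:

```javascript
// ArrayBuffer.prototype.slice behaves as an eager copy: later mutation of
// either buffer is invisible to the other. A copy-on-write engine could
// share storage until the first write, without changing this behavior.
const a = new Uint8Array([1, 2, 3]).buffer;
const b = a.slice(0);
new Uint8Array(a)[0] = 9; // mutate the original
// new Uint8Array(b)[0] === 1: the copy is unaffected
```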

phoddie commented 1 year ago

Yea, I had the same thought walking back from coffee just now. Alas.

mhofman commented 1 year ago

We could add a non-standard options bag, arrayBuffer({ readonly: true }), but that's getting kinda verbose for what would likely be the default case in a Hardened JS environment. I honestly doubt much code would break if the default was a read-only buffer, but still.

I actually think copy-on-write array buffers would be a good general solution on top of being a nice optimization. First, it would likely apply to non-Blob use cases. For example, arrayBuffer.slice() could return a copy-on-write buffer (making the original arrayBuffer copy-on-write as well, obviously). At that point we could make construction of a Blob similarly copy-free with a copy-on-write guard on the original buffer, i.e. a Blob could be a user-land concept that simply does the following (simplified to a single array-buffer input):

class Blob {
  #buffer; // private field must be declared before use

  constructor([inputBuffer]) {
    // With copy-on-write buffers, both slices below would be nearly free.
    this.#buffer = inputBuffer.slice(0);
  }

  async arrayBuffer() {
    return this.#buffer.slice(0);
  }
}

mhofman commented 1 year ago

I mentioned Copy-on-Write in the context of transfer() in the November Plenary. I remain convinced that CoW for AB would solve a lot of performance issues, at the expense I understand of implementation complexity.

erights commented 1 year ago

See https://github.com/endojs/endo/pull/1538

dckc commented 1 year ago

Interesting bit of related work: Hex template literal | DeviceScript

example:

const buf: Buffer = hex`00 ab 12 2f 00`

h/t @turadg

phoddie commented 1 year ago

FWIW - the Moddable SDK has long done something like that for UUIDs to simplify BLE code. This generates an ArrayBuffer with the bytes of the UUID:

const CHARACTERISTIC_UUID = uuid`9CF53570-DDD9-47F3-BA63-09ACEFC60415`;

dckc commented 10 months ago

motivation: I was trying to do an isolated signer as an endo app, but I hit:

Error#2: Cannot pass mutable typed arrays like Uint8Array(33) [...]

So I had to put the signer and the network client in the same vat. :-/

hdWallet.js 0130695 https://github.com/dckc/finquick/pull/64

kriskowal commented 3 months ago

My latest batch of thoughts on this problem:

We need a solution that is a good balance between:

We might have to take a page from the promise.then distinguished method trick, but we can use a symbol-named method. This would allow eval twins to assimilate CopyBytes and still keep a private WeakMap of known-to-be-unmodifiable ArrayBuffers. Consider this sketch:

const copyBytesSymbol = Symbol.for('copyBytesInto');
const fixedBytes = new WeakMap();

const copyBytesPrototype = {
  [copyBytesSymbol](target) {
    target.set(fixedBytes.get(this), 0);
  }
};

const makeCopyBytes = array => {
  if (fixedBytes.has(array)) {
    // Already a recognized CopyBytes; return it as-is.
    return array;
  }
  const copyBytes = Object.create(copyBytesPrototype);
  fixedBytes.set(copyBytes, fromByteArray(array)); // XXX
  return harden(copyBytes);
};

const copyBytesInto = (target, source) => {
  const internal = fixedBytes.get(source);
  if (internal !== undefined) {
    // Known local CopyBytes: copy its fixed content directly.
    target.set(internal, 0);
  } else if (source[copyBytesSymbol] !== undefined) {
    // Unknown (eval-twin) CopyBytes: assimilate via the distinguished method.
    assert.typeof(source[copyBytesSymbol], 'function');
    source[copyBytesSymbol](target);
  }
};

erights commented 3 months ago

What "promise.then distinguished method trick"?

kriskowal commented 3 months ago

> What "promise.then distinguished method trick"?

Using a distinguished method like "then" to recognize an object that an eval twin can assimilate into its own domain by invoking that method. One could call it the “Thenable Pattern”.
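The pattern can be illustrated with plain promises:

```javascript
// Promise.resolve assimilates any object with a `then` method, even one
// constructed by an eval twin or another realm, purely by invoking that
// one distinguished method; no shared class or prototype is required.
const foreignThenable = {
  then(resolve) {
    resolve(42); // hand the value across by invocation
  },
};

Promise.resolve(foreignThenable).then(value => {
  // value === 42: the local realm absorbed the foreign object's value
  // without needing to recognize its prototype or constructor.
});
```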

erights commented 3 months ago

The eval twin problem explained at https://github.com/endojs/endo/issues/1583 seems fatal to having passStyleOf certify as passable any object with non-builtin methods. That's why all our passable types are operated on by functions rather than having their own methods. (Attn @mhofman you know why ;) )

kriskowal commented 3 months ago

So Blob is an eligible representation because all its methods are builtins on Node.js and the web platform, and we could ask Moddable to implement Blob on XS?

// uncurryThis as conventionally defined in SES-adjacent code
const uncurryThis = fn => (receiver, ...args) => Reflect.apply(fn, receiver, args);
const blobArrayBuffer = uncurryThis(Blob.prototype.arrayBuffer);
const arrayBuffer = await blobArrayBuffer(new Blob(['hello, world']));

phoddie commented 3 months ago

We've avoided implementing Blob for a very long time. I wouldn't be sad for that to extend indefinitely. At least you aren't asking about Buffer.

That said, it should mostly be possible, with the exception of stream(). The API surface area isn't that big. We'd need to be thorough in testing to ensure the API is being faithfully emulated.

erights commented 3 months ago

Ok, I tried learning about Blob by writing a minimal toy shim and adding it to https://github.com/endojs/endo/pull/1538 at src/ses/blob-shim.js. Compare with the pass-style/src/byteArray.js in the same PR. Both encapsulate an actual ArrayBuffer to hold their data. I don’t love either. But some problems I see with Blob:

The ByteArray class at pass-style/src/byteArray.js is effectively an immutable subclass of ArrayBuffer. The ByteArray.slice instance method returns an actual ArrayBuffer, thereby getting the contents out synchronously. Because it doesn’t add semantics to ArrayBuffer, it has no distracting mime type.

The problem holding ByteArray back is the eval twin problem. If we’re willing to consider whitelisting Blob as a ses builtin, then we should discuss whether we could whitelist ByteArray (under some name) similarly, avoiding the same eval twin problem. If we do, we should also consider proposing such an immutable variant of ArrayBuffer to tc39. We cannot propose Blob to tc39 because of stream(). An immutable variant of ArrayBuffer has no such problem.

kriskowal commented 3 months ago

> The eval twin problem explained at #1583 seems fatal to having passStyleOf certify as passable any object with non-builtin methods. That's why all our passable types are operated on by functions rather than having their own methods. (Attn @mhofman you know why ;) )

To be clear, I’m not proposing that passStyleOf would certify an object as being pass-style=CopyBytes with methods that are safe to use, but suggesting that we (maybe) could use a distinguished method to convert it into a handle to the immutable content it stored at the time it was first recognized by passStyleOf, to be thereafter treated as an opaque handle to that data.

To that end, maybe we could even adopt a TypedArray, which we could at least capture by built-in methods.

erights commented 3 months ago

For a new data type with a new method, I see only these choices:

This leaves us with the adoption option. I don't see any other choice.

I don't see a way to do an ocapn ByteArray as a Passable classified as such by passStyleOf without introducing a new data type with a new method, since all the existing language builtins for binary data cannot be made immutable in place. The ByteArray at pass-style/src/byteArray.js inherits from ArrayBuffer.prototype and tries to act like an immutable variant/subclass of ArrayBuffer, by overriding only the query-only methods to work. We could do likewise for an immutable variant/subclass of Uint8Array. But in both cases, there's no way to make the methods inherited from the shared super prototype work, unless we do a ses-initialization-time repair to replace the query-only methods with ones that detect whether they're applied to one of these immutable variants.

The problem with starting with Uint8Array instead is indexed property access. The only way to support this on an immutable Uint8Array variant is to make it a proxy, which I do not want to do just to get to ocapn conformance.

So, I'd like to start by proposing the ByteArray at pass-style/src/byteArray.js as a candidate to move into ses as an honorary new whitelisted shared-global class.

erights commented 3 months ago

I changed the title for clarity. But if it is too narrow for what this issue is intended for, change it back and we can start a separate narrower issue.

mhofman commented 3 months ago

I don't think we should adopt the Blob API wholesale. As stated, it has major drawbacks: it is asynchronous-only, entrains the complex Web Streams API, and carries mime-type meta-information.

The ByteArray defined at pass-style/src/byteArray.js is roughly what I had in mind for a Blob-like object that internally holds the binary data, with a method to get a copy of said binary data as a normal ArrayBuffer (or maybe Uint8Array). Where I think that ByteArray is mistaken is in pretending to be an ArrayBuffer by inheriting from its prototype. It should just wrap the binary data, provide the slice method and byteLength property, and that's it.

My understanding of the problem in #1583 is that we can't trust an object with methods to behave as inert copy data unless we can recognize it, and that we can't recognize objects because of eval twins. I would however say we're already in a world where we need to recognize objects: for plain objects and arrays, unless we have a predicate to test for exotic proxy behavior, we shouldn't trust these either to be copy passable.

Maybe we can take the approach of promises and add a static method (or even just rely on the constructor, if we drop validation) to ByteArray that adopts another ByteArray? And if the argument is already a recognized ByteArray, it returns it?

const bytes = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8]);
const ba1 = ByteArray(bytes);
const ba1Bytes = ba1.slice();
assert(ba1Bytes !== bytes); // however, the contents are the same
const ba2 = ByteArray(ba1);
assert(ba2 === ba1);

That of course assumes like promises that there is a "genuine" / intrinsic ByteArray, but I think it's fine for passStyle installed as a sort of trusted shim to define this. We already need to switch to a more centralized passStyleOf anyway because of the problem of proxy recognition. But I don't think we automatically need to move ByteArray into the ses shim layer, passStyleOf can still lookup a global ByteArray and use it to validate objects that claim to be that type.
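A hypothetical sketch of that adoption idea: a ByteArray factory that recognizes its own "genuine" instances via a private WeakSet and returns them unchanged, otherwise snapshotting the input. The names and shape here are assumptions for illustration, not Endo's actual API:

```javascript
// Private brand: only objects this factory created are "genuine".
const genuine = new WeakSet();

const ByteArray = input => {
  if (genuine.has(input)) {
    return input; // already a recognized ByteArray: adopt as-is
  }
  const snapshot = Uint8Array.from(input); // copy, so the caller keeps no handle
  const ba = Object.freeze({
    byteLength: snapshot.byteLength,
    // slice() returns a fresh mutable copy of the fixed contents.
    slice: (start, end) => Uint8Array.from(snapshot.subarray(start, end)),
  });
  genuine.add(ba);
  return ba;
};

const bytes = Uint8Array.from([1, 2, 3]);
const ba1 = ByteArray(bytes);
const ba2 = ByteArray(ba1);
// ba2 === ba1, and ba1.slice() !== bytes though the contents match
```

As with promises, this assumes one intrinsic ByteArray; an eval twin's instances would fail the WeakSet check and be re-adopted by copying.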

dckc commented 1 month ago

Wow... this is done?! passStyleOf can recognize an immutable byte array? That suggests we can marshal byte arrays now... I've been wanting to do that since January: https://github.com/endojs/playground/pull/2#discussion_r1446971940

Help me understand how?

I don't see anything about ByteArray in docs for passStyleOf nor passStyleOf.js.

Where should I look?

erights commented 1 month ago

Sigh. Likely another github automation screwup. Not the first time. I’ll reopen.

The good news is the thing that caused it is genuine progress towards closing this. But there’s a stack of about 3 PRs needed to do so.

Thanks for noticing!

erights commented 1 month ago

The cause of the github automation screwup:

> We plan to fix https://github.com/endojs/endo/issues/1331 in a stack of PRs starting with this one