Open kriskowal opened 1 year ago
attn @erights @gibson042 @mhofman
I'm extremely concerned by the performance overhead of such a user-land abstraction. We should pursue a standard approach that allows engines to avoid array buffer copies and to keep overhead on read access minimal.
One such proposal at stage 1 is @Jack-Works's Limited ArrayBuffer
Shimming a standards track solution would be good.
Ah, memories. (My abandoned hobbies include a decade old fantasy of JavaScript having byte strings.)
Attn @phoddie
If you'd like an efficient read-only ArrayBuffer, that code is already in xst as part of our Secure ECMAScript / Hardened JavaScript work. This fragment...

const bytes = Uint8Array.of(1, 2, 3);
petrify(bytes.buffer);
bytes[2] = 1;

...throws TypeError: ?: read-only buffer.
Yes, a fully standard solution would be welcome.
A petrified buffer would certainly be eligible for passing over a CapTP.
Shimming petrify would, I think, require us to replace the TypedArray constructors and present all typed arrays as proxies, such that we could emulate petrification of a TypedArray with an ArrayBuffer shared by other TypedArrays.
This may be what you mean, but... I wouldn't suggest shimming petrify as it is extremely broad & general. Instead, have a focused way to make an ArrayBuffer read-only. The implementation of that, when running on XS, will petrify instead of using Proxy and such, and so avoid oodles of runtime overhead.
To shim making an ArrayBuffer itself read-only, we'd need a way to actually make an ArrayBuffer read-only in place. The only thing we know how to shim would be something that returns what seems to be a new read-only ArrayBuffer.
Same for a TypedArray.
Aye, I lay it out like this because oodles of runtime overhead might not be acceptable, even in an intermediate step toward a standard. But if petrify had standards ambitions, we could shim the intersection of petrify and TypedArrays, throwing for anything we can't petrify. Or we could anticipate and shim a world where TypedArray can be frozen. Either of these strategies would require us to replace the TypedArray and ArrayBuffer constructors such that they return proxies and deny all programs thereafter direct access to TypedArray and ArrayBuffer instances, such that freezing one façade of an ArrayBuffer would freeze all the façades of the same ArrayBuffer.
The web platform’s solution to this problem is opaque Blob objects. If you want to see the contents, you have to copy them into a TypedArray. We could certainly harden and make blobs passable too without doing anything to TypedArrays.
To shim making an ArrayBuffer itself read-only, we'd need a way to actually make an ArrayBuffer read-only in place. The only thing we know how to shim would be something that returns what seems to be a new read-only ArrayBuffer.
Y'all are the experts in patching things into a web-ish runtime. My goal is to avoid an ugly optimization problem around Proxy in XS when it already supports read-only buffers efficiently. If it is a problem that petrify does too much here, the functionality can easily be repackaged in a call that is focused on the needs of this particular problem.
Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?
replace the TypedArray and ArrayBuffer constructors
Doh! I'm embarrassed that I didn't think of this. Thanks!
Ok, I just read about Blobs at https://developer.mozilla.org/en-US/docs/Web/API/Blob . Aside from their non-JS stream() method, which we can omit from our whitelist, they seem like perfectly fine forms of CopyBytes. When we do add CopyBytes to our system, we could call it CopyBlob instead, and have passStyleOf certify that a Blob is a CopyBytes when it is hardened and has no surprising properties.
This would take the pressure off of needing to shim making an ArrayBuffer or TypedArray read-only in place.
Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?
I like that!
Maybe side-stepping the problem with Blob is pragmatic, though there's more data copying. Perhaps implementing Blob with a petrified ArrayBuffer would allow Blob.prototype.arrayBuffer() to be almost free?
I think the expectation of Blob.prototype.arrayBuffer() is that it would produce a mutable array buffer. It could be nearly free if ArrayBuffers have a copy-on-write optimization.
Yea, I had the same thought walking back from coffee just now. Alas.
We could add a non-standard option bag to arrayBuffer({readonly: true}), but that's getting kinda verbose for what would likely be the default case in a Hardened JS environment. I honestly doubt much code would break if the default was a readonly buffer, but still.
I actually think copy-on-write array buffers would be a good general solution on top of being a nice optimization. First, it would likely apply to non-Blob use cases. For example, doing an arrayBuffer.slice() could return a copy-on-write buffer (making the original arrayBuffer copy-on-write as well, obviously). At that point we could make construction of a Blob similarly copy-free with a copy-on-write guard on the original buffer, aka a Blob could be a user-land concept that simply does the following (simplified to single array buffer input):
class Blob {
  #buffer; // private field must be declared before use
  constructor([inputBuffer]) {
    this.#buffer = inputBuffer.slice(0);
  }
  async arrayBuffer() {
    return this.#buffer.slice(0);
  }
}
I mentioned Copy-on-Write in the context of transfer() in the November Plenary. I remain convinced that CoW for AB would solve a lot of performance issues, at the expense, I understand, of implementation complexity.
Interesting bit of related work: Hex template literal | DeviceScript
example:
const buf: Buffer = hex`00 ab 12 2f 00`
h/t @turadg
FWIW - the Moddable SDK has long done something like that for UUIDs to simplify BLE code. This generates an ArrayBuffer with the bytes of the UUID:
const CHARACTERISTIC_UUID = uuid`9CF53570-DDD9-47F3-BA63-09ACEFC60415`;
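For illustration, a hex template tag along these lines can be written in a few lines of user-land JavaScript. This sketch returns a Uint8Array; it is not the DeviceScript or Moddable implementation, just a plausible shape for one:

```javascript
// Hypothetical hex template tag: parses whitespace/dash-separated hex
// digit pairs into a Uint8Array. Substitutions are rejected for simplicity.
const hex = (strings, ...subs) => {
  if (subs.length > 0) {
    throw new TypeError('hex: substitutions not supported');
  }
  const digits = strings[0].replace(/[\s-]/g, '');
  if (digits.length % 2 !== 0 || /[^0-9a-fA-F]/.test(digits)) {
    throw new TypeError('hex: expected an even number of hex digits');
  }
  const bytes = new Uint8Array(digits.length / 2);
  for (let i = 0; i < bytes.length; i += 1) {
    bytes[i] = parseInt(digits.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
};

const buf = hex`00 ab 12 2f 00`;
// buf is Uint8Array [0, 171, 18, 47, 0]
```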
motivation: I was trying to do an isolated signer as an endo app, but I hit:
Error#2: Cannot pass mutable typed arrays like Uint8Array(33) [...]
So I had to put the signer and the network client in the same vat. :-/
hdWallet.js 0130695 https://github.com/dckc/finquick/pull/64
My latest batch of thoughts on this problem:
We need a solution that is a good balance between:
We might have to take a page from the promise.then distinguished method trick, but we can use a symbol-named method. This would allow eval twins to assimilate CopyBytes and still keep a private WeakMap of known-to-be-unmodifiable ArrayBuffers. Consider this sketch:
const copyBytesSymbol = Symbol.for('copyBytesInto');
const fixedBytes = new WeakMap(); // copyBytes handle -> immutable Uint8Array
const knownHandles = new WeakMap(); // source array -> its copyBytes handle
const copyBytesPrototype = {
  [copyBytesSymbol](target) {
    target.set(fixedBytes.get(this), 0);
  }
};
const makeCopyBytes = array => {
  let copyBytes = knownHandles.get(array);
  if (copyBytes !== undefined) {
    return copyBytes;
  }
  copyBytes = Object.create(copyBytesPrototype);
  fixedBytes.set(copyBytes, Uint8Array.from(array)); // defensive copy of the source bytes
  knownHandles.set(array, copyBytes);
  return harden(copyBytes);
};
const copyBytesInto = (target, source) => {
  const internal = fixedBytes.get(source);
  if (internal !== undefined) {
    target.set(internal, 0);
  } else if (source[copyBytesSymbol] !== undefined) {
    assert.typeof(source[copyBytesSymbol], 'function');
    source[copyBytesSymbol](target);
  }
};
What "promise.then distinguished method trick"?
What "promise.then distinguished method trick"?
Using a distinguished method like "then" to recognize an object that an eval-twin can assimilate into its own domain by invocation of that method. One could call it the "Thenable Pattern".
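A minimal illustration of the pattern, runnable in any modern engine: Promise.resolve assimilates any object with a callable then, regardless of which realm (or eval twin) created it.

```javascript
// A plain object that merely has a callable `then` method.
const fakeThenable = {
  then(resolve, reject) {
    resolve(42);
  },
};

// Promise.resolve recognizes anything with a callable `then` and
// assimilates it into this realm's own Promise, by calling that method.
Promise.resolve(fakeThenable).then(value => {
  console.log(value); // 42
});
```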
The eval twin problem explained at https://github.com/endojs/endo/issues/1583 seems fatal to having passStyleOf certify as passable any object with non-builtin methods. That's why all our passable types are operated on by functions rather than having their own methods. (Attn @mhofman you know why ;) )
So Blob is an eligible representation because all its methods are builtins on Node.js and the web platform, and we can ask Moddable to implement Blob on XS?
const blobArrayBuffer = uncurryThis(Blob.prototype.arrayBuffer);
const arrayBuffer = await blobArrayBuffer(new Blob(['hello, world']))
We've avoided implementing Blob for a very long time. I wouldn't be sad for that to extend indefinitely. At least you aren't asking about Buffer.
That said, it should mostly be possible, with the exception of stream(). The API surface area isn't that big. We'd need to be thorough in testing to ensure the API is being faithfully emulated.
Ok, I tried learning about Blob by writing a minimal toy shim and adding it to https://github.com/endojs/endo/pull/1538 at src/ses/blob-shim.js . Compare with the pass-style/src/byteArray.js in the same PR. Both encapsulate an actual ArrayBuffer to hold their data. I don't love either. But some problems I see with Blob:

- The contents can only be read through arrayBuffer() and text(), both of which return a promise for the contents in the requested form.
- Adopting Blob as part of the, so to speak, language, to avoid the eval-twin problem, would bring in, via the stream() method, ReadableStream. To avoid whitelisting ReadableStream we'd omit stream(), making this different than the "standard" Blob.
- A Blob carries a mime .type string. ocapn ByteArrays do not. Clearly not a fatal problem, but whatever we do here will surprise someone.

The ByteArray class at pass-style/src/byteArray.js is effectively an immutable subclass of ArrayBuffer. The ByteArray.slice instance method returns an actual ArrayBuffer, thereby getting the contents out synchronously. Because it doesn't add semantics to ArrayBuffer, it has no distracting mime type.
The problem holding ByteArray back is the eval twin problem. If we're willing to consider whitelisting Blob as a ses builtin, then we should discuss whether we could whitelist ByteArray (under some name) similarly, avoiding the same eval twin problem. If we do, we should also consider proposing such an immutable variant of ArrayBuffer to tc39. We cannot propose Blob to tc39 because of stream(). An immutable variant of ArrayBuffer has no such problem.
The eval twin problem explained at #1583 seems fatal to having passStyleOf certify as passable any object with non-builtin methods. That's why all our passable types are operated on by functions rather than having their own methods. (Attn @mhofman you know why ;) )
To be clear, I’m not proposing that passStyleOf would certify an object as being pass-style=CopyBytes, with methods that are safe to use, but suggesting that we (maybe) could use a distinguished method to convert it into a handle to the immutable content it stored at the time it was first recognized by passStyleOf and thereafter treated as an opaque handle to that data.
To that end, maybe we could even adopt a TypedArray, whose contents we could at least capture via built-in methods.
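As an illustration of capturing a TypedArray's contents via built-in methods only, here is a sketch (uncurryThis and adoptBytes are illustrative names): by slicing through the intrinsic prototype method, tampered own methods on the instance cannot interfere with what we copy.

```javascript
// Uncurry a prototype method so it can be applied to any receiver
// without consulting that receiver's (possibly tampered) own properties.
const uncurryThis = fn => (receiver, ...args) => Reflect.apply(fn, receiver, args);
const uint8ArraySlice = uncurryThis(Uint8Array.prototype.slice);

// Adopt a candidate by copying its contents with the intrinsic slice.
const adoptBytes = candidate => uint8ArraySlice(candidate, 0);

const suspicious = Uint8Array.of(1, 2, 3);
suspicious.slice = () => Uint8Array.of(9, 9, 9); // tampered own method
const captured = adoptBytes(suspicious);
// captured is a fresh Uint8Array with contents [1, 2, 3], not [9, 9, 9]
```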
For a new data type with a new method, I see only these choices:

- passStyleOf cannot certify it, due to the eval twin problem. This prevents us from extending marshal to unserialize an ocapn ByteArray to anything in place, which is a non-starter.

This leaves us with the adoption option. I don't see any other choice.
I don't see a way to do an ocapn ByteArray as a Passable classified as such by passStyleOf without introducing a new data type with a new method, since all the existing language builtins for binary data cannot be made immutable in place. The ByteArray at pass-style/src/byteArray.js inherits from ArrayBuffer.prototype and tries to act like an immutable variant/subclass of ArrayBuffer, by overriding only the query-only methods to work. We could do likewise for an immutable variant/subclass of Uint8Array. But in both cases, there's no way to make the methods inherited from the shared super prototype work, unless we do a ses-initialization-time repair to replace the query-only methods with ones that detect whether they're applied to one of these immutable variants.
The problem with starting with Uint8Array instead is indexed property access. The only way to support this on an immutable Uint8Array variant is to make it a proxy, which I do not want to do just to get to ocapn conformance.
So, I'd like to start by proposing the ByteArray at pass-style/src/byteArray.js as a candidate to move into ses as an honorary new whitelisted shared-global class.
I changed the title for clarity. But if it is too narrow for what this issue is intended for, change it back and we can start a separate narrower issue.
I don't think we should adopt the Blob API wholesale. As stated, it has major drawbacks, like being asynchronous only, entraining the complex Web Streams API, and carrying mime type meta information.
The ByteArray defined at pass-style/src/byteArray.js is roughly what I had in mind for a Blob-like object that internally holds the binary data, with a method to get a copy of said binary data as a normal ArrayBuffer (or maybe Uint8Array). Where I think that ByteArray is mistaken is to pretend it is an ArrayBuffer by inheriting from its prototype. It should just wrap the binary data, provide the slice method and byteLength property, and that's it.
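A minimal sketch of that wrapper shape, assuming a defensive copy on construction. This is illustrative only, not the pass-style implementation:

```javascript
// A Blob-like wrapper that just holds bytes: no ArrayBuffer inheritance,
// only `byteLength` and a `slice` returning a fresh ordinary ArrayBuffer.
class ByteArray {
  #bytes; // private, defensively copied contents
  constructor(source) {
    // Accept an ArrayBuffer or any TypedArray / array-like of byte values.
    const view =
      source instanceof ArrayBuffer ? new Uint8Array(source) : Uint8Array.from(source);
    this.#bytes = view.slice(); // defensive copy, detached from the caller
  }
  get byteLength() {
    return this.#bytes.byteLength;
  }
  slice(start = 0, end = this.#bytes.byteLength) {
    // Return an ordinary, mutable ArrayBuffer copy of the requested range.
    return this.#bytes.slice(start, end).buffer;
  }
}
```

Because the wrapper never hands out its internal view, the held bytes stay immutable from the outside; callers only ever receive fresh copies.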
My understanding of the problem in #1583 is that we can't trust an object with methods to behave as inert copy data unless we can recognize it, and that we can't recognize objects because of eval twins. I would however say we're already in a world where we need to recognize objects: for plain objects and arrays, unless we have a predicate to test for exotic proxy behavior, we shouldn't trust these either to be copy passable.
Maybe we can take the approach of promises and add a static method (or even just rely on the constructor, if we drop validation) to ByteArray that adopts another ByteArray? And if the argument is already a recognized ByteArray it returns it?
const bytes = Uint8Array.from([1,2,3,4,5,6,7,8]);
const ba1 = ByteArray(bytes);
const ba1Bytes = ba1.slice();
assert(ba1Bytes !== bytes); // However the contents are the same
const ba2 = ByteArray(ba1);
assert(ba2 === ba1);
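A hypothetical implementation of that adoption behavior, using a private WeakSet for recognition and, in this sketch, Object.freeze standing in for harden:

```javascript
// Recognition state private to the trusted ByteArray maker.
const recognizedByteArrays = new WeakSet(); // "genuine" ByteArray handles
const byteArrayContents = new WeakMap(); // handle -> immutable Uint8Array copy

const ByteArray = source => {
  if (recognizedByteArrays.has(source)) {
    return source; // already one of ours: adopt by identity
  }
  const handle = {
    slice: () => byteArrayContents.get(handle).slice(), // fresh copy every call
  };
  byteArrayContents.set(handle, Uint8Array.from(source));
  recognizedByteArrays.add(handle);
  return Object.freeze(handle); // Object.freeze standing in for harden
};
```

Only handles minted (and recognized) by this maker pass the identity check, so an eval twin's look-alike would be copied rather than trusted.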
That of course assumes, like promises, that there is a "genuine" / intrinsic ByteArray, but I think it's fine for passStyle installed as a sort of trusted shim to define this. We already need to switch to a more centralized passStyleOf anyway because of the problem of proxy recognition. But I don't think we automatically need to move ByteArray into the ses shim layer; passStyleOf can still look up a global ByteArray and use it to validate objects that claim to be that type.
Wow... this is done?! passStyleOf can recognize an immutable byte array? That suggests we can marshal byte arrays now... I've been wanting to do that since January: https://github.com/endojs/playground/pull/2#discussion_r1446971940
Help me understand how?
I don't see anything about ByteArray in docs for passStyleOf nor passStyleOf.js. Where should I look?
Sigh. Likely another github automation screwup. Not the first time. I’ll reopen.
The good news is the thing that caused it is genuine progress towards closing this. But there’s a stack of about 3 PRs needed to do so.
Thanks for noticing!
The cause of the github automation screwup:
We plan to fix https://github.com/endojs/endo/issues/1331 in a stack of PRs starting with this one
Our current best solution for carrying bytes is a Uint8Array, which we use pervasively for hardened streams. This is one of the reasons that we treat Typed Arrays like Map for the purposes of hardening. The content is mutable but the API surface is tamper-proof. However, the content is readable and writable by both the sender and receiver over time, so like a Map, TypedArrays are not passable. That makes them a poor foundation for remote streams in an ocap discipline.
I propose that we need an analogous solution to the CopyMap and MapStore for transmission of immutable byte strings, tentatively CopyBytes and BytesStore, which would be passable and provide a firmer foundation for streams (async iterators of byte strings).

CopyBytes should mesh well with TypedArrays, particularly Uint8Array and ArrayBuffer. To that end, it may be necessary for them to be Proxies, such that uint8Array.set(copyBytes) is sensible. Constructing an immutable copy of the content of an array-like should be possible, like copyBytes(uint8Array) or makeCopyBytes([0, 1, 2, …, 127]). CopyBytes would be backed by a TypedArray but would attenuate mutability. Slicing CopyBytes should be trivial. As with glyph strings, ropes are a possible optimization in the fullness of time.

As a critical optimization, we often use ring buffers for byte streams. We could conceivably also support another type for revocable copy bytes, that would permit a stream to reclaim the underlying byte range when the consumer signals it's ready for another subrange.