warner opened this issue 4 years ago
@erights let me know what you think; you're more familiar with the ibidTable than I am, as well as with the Interface considerations.
Hm, now that I think about it, I bet I can address the refcounting issue by moving the increment from `convertSlotToVal()` out into whoever calls that (`dispatch.deliver`, etc). I have to do something like that anyways for the target of a `notifyFulfillToPresence`, where we don't use `m.unserialize()` on the resolution (since it's always a Presence), but we need to increment the refcount anyways.
I think the other considerations still apply.
Note to assignees: this ticket is pretty old and probably contains some no-longer-correct assumptions; do not implement it as described without updating the priors first.
In particular, we decided against having multiple concurrent Representatives for a single vref (which would have exposed GC behavior to userspace, via `rep1 === rep2`). We found a way to track the existing ones without also having the syscalls become dependent upon GC.
What is the Problem Being Solved?
We have two incoming features that may require slight changes to the way `marshal` works. The first is @FUDCo's #1846 / #455 secondary-storage / "Virtual Objects + Representatives" feature. The second is my #1872 GC work.

The fundamental requirements are:
I think this translates into two new requirements on `marshal`:

- the capdata which `marshal` emits during serialization should never duplicate object/promise IDs in the `capdata.slots` array
- deserialization should call `slotToVal` exactly once for each entry in `capdata.slots`
The first requirement is currently honored, but only because `obj1 !== obj2` currently implies that `valToSlot(obj1) !== valToSlot(obj2)`. That will change when #1846 lands, because two different representatives for the same virtual object will be distinct objects (`obj1 !== obj2`), but both will convert into the same object ID (`valToSlot(obj1) === valToSlot(obj2)`). The second requirement is currently honored because of the ibidTable, which will suffer from the same change.
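As a rough illustration of the first requirement (the `o+44` object ID and the exact body encoding here are made up for the example, not taken from the implementation):

```js
// Hypothetical capdata for x~.foo(reprA1, reprA2), where reprA1 !== reprA2
// but both convert to the same object ID 'o+44'. The slots array must list
// that ID only once; both body nodes point at the same slots[] index.
const capdata = {
  body: JSON.stringify([
    { '@qclass': 'slot', index: 0 },
    { '@qclass': 'slot', index: 0 },
  ]),
  slots: ['o+44'],
};
```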
Description of the Design
For serialization, I think marshal's `serializeSlot` needs to have `slotMap` map from "slot" (e.g. `o+44`) to `slotIndex`, rather than mapping `val` to `slotIndex`. This map should be consulted after calling `convertValToSlot()`, so that a second representative will get mapped to the same `slotIndex` even though the first representative was a different object. The code that currently keys the lookup on `val` should instead do something like this:
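A minimal sketch, assuming `serializeSlot` has roughly this shape (the signature and surrounding bookkeeping are approximations, not the actual marshal source):

```js
function serializeSlot(val, slots, slotMap, iface = undefined) {
  // Compute the slot first, then dedupe on the slot string rather than on
  // object identity, so a second Representative of the same virtual object
  // reuses the existing slots[] entry.
  const slot = convertValToSlot(val);
  let slotIndex;
  if (slotMap.has(slot)) {
    slotIndex = slotMap.get(slot);
  } else {
    slotIndex = slots.length;
    slots.push(slot);
    slotMap.set(slot, slotIndex);
  }
  return harden({ '@qclass': 'slot', iface, index: slotIndex });
}
```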
Note that our ibidTable means that `serializeSlot` is never called twice for the same `val`, so the old `slotMap` was somewhat redundant. I remember either not having or not trusting the ibidTable when writing `serializeSlot` way back then.

For deserialization, I think we should call `convertSlotToVal()` on every entry of `capdata.slots` immediately upon entry, and stash the resulting values for later use by the `rawTree['@qclass'] === 'slot'` clause. The current implementation defers calling `convertSlotToVal()` until it encounters a `@qclass: 'slot'` node, and the `capdata.body` object graph might or might not contain exactly one of these per `capdata.slots` element. (This will depend upon the serializer, and how the ibidTable handles matched Representatives, and I think it would take more work than described above to guarantee this behavior.) The code that currently does the deferred conversion might instead do something like this:
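A minimal sketch under the same caveats (the `fullRevive` name and overall shape are approximations of the unserializer, not the actual code):

```js
function unserialize(capdata) {
  // Eagerly convert every slot exactly once, up front, so the refcount
  // increments match the kernel's per-slots-entry accounting.
  // (iface handling omitted here; see the complication discussed below.)
  const valForSlot = capdata.slots.map(slot => convertSlotToVal(slot));
  function fullRevive(rawTree) {
    if (rawTree && rawTree['@qclass'] === 'slot') {
      // Reuse the already-converted value instead of calling
      // convertSlotToVal() again for each body node.
      return valForSlot[rawTree.index];
    }
    // ... handle ibids, arrays, records, and other @qclass nodes ...
    return rawTree;
  }
  return fullRevive(JSON.parse(capdata.body));
}
```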
One significant complication is the handling of `iface`: currently these are stored in the `data.body` nodes, and are not visible from `data.slots`. This actually suggests a latent bug: if the capdata contains two references to the same slot but with different interface names, which interface name will the receiver see?

Motivation: GC refcounting
The GC work (#1872) I'm doing depends upon the kernel (which is sending object references into a vat) and the liveslots layer (which is receiving those object references within the vat) agreeing upon how many times each object ID has been seen. The vat's `decref` messages are arbitrarily delayed, since they are driven by `FinalizationRegistry` callbacks. We have a counter scheme to tolerate this delay: the kernel increments a counter for each object ID as it sends the message into the vat, and when the vat observes the import to be dropped, the vat sends the last counter value back up to the kernel. The kernel decrements its counter by the vat's delta, and only drops the object if the result is zero. This is tolerant of the kernel mentioning an object ID after the vat had dropped its last copy (such that the message will cause the vat to create a new Presence): the kernel must not drop the object in that case.

For this to work, the kernel needs a way to scan capdata for object references. The kernel should not have to parse the `capdata.body` string to find them: we've put a lot of work into making sure the kernel never needs to understand or modify `capdata.body`. Fortunately the kernel can get this information from `capdata.slots` instead, which it must enumerate anyways as it translates those slots through the c-list.

The kernel's approach must match the one used by liveslots. When liveslots deserializes a message, it will increment its counter once per call to `convertSlotToVal` (it uses this call to sense the existence of another refcount).
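A minimal sketch of that bookkeeping, just to make the agreement concrete (all names and data structures here are hypothetical, not the kernel's actual implementation):

```js
// Kernel side: bump a per-object counter for every entry in capdata.slots
// of each delivery it sends into the vat.
function kernelCountsDelivery(capdata, sendCounts) {
  for (const slot of capdata.slots) {
    sendCounts.set(slot, (sendCounts.get(slot) || 0) + 1);
  }
}

// Liveslots increments its own counter once per convertSlotToVal() call.
// When a FinalizationRegistry callback reports that the import was dropped,
// the vat sends its last counter value up in a decref; the kernel subtracts
// it and only retires the object if nothing remains.
function kernelHandlesDecref(slot, vatSeenCount, sendCounts) {
  const remaining = sendCounts.get(slot) - vatSeenCount;
  if (remaining === 0) {
    sendCounts.delete(slot); // the vat saw everything we sent: safe to drop
  } else {
    sendCounts.set(slot, remaining); // we re-mentioned it after the vat's drop
  }
}
```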
When a message contains multiple references to the same object (perhaps the sender called `x!.foo(obj1, obj2, obj1)`, or `(reprA1, reprA2)`), the capdata might have multiple `capdata.body` nodes that all point to the same `capdata.slots[index]` entry. If the deserializer calls `convertSlotToVal` once per `capdata.body` node, but the kernel increments its counter once per `capdata.slots` entry, these could diverge, causing imports to be freed too early, or never freed at all.

Motivation: no communication channels
When a message references a virtual object, the receiver's arguments will contain a "Representative" that combines the configured behavior (methods) of the object type with the state of a particular object ID. The receiver may hold multiple Representatives that all refer to the same virtual object (we cannot avoid this without unbounded storage or user-level nondeterminism). The specific policy by which the inbound message delivery code creates new Representatives needs to ensure that it doesn't reveal anything about e.g. its internal LRU cache, because that would allow one object graph to communicate with an unrelated one: the speaker sends lots of messages (to flush the shared LRU cache) and the listener measures whether other representatives it receives are the same object or are new ones.
Whatever information is revealed in one vat, for safety's sake, we should make sure that information does not leak into other vats. No matter what collection of identical or distinct Representatives (for the same virtual object) we might put into the arguments, the receiver should not be able to tell anything beyond the virtual object IDs. Sending `x~.foo(reprA1, reprA1)` (same representative) or `x~.foo(reprA1, reprA2)` (different representatives of the same virtual object) should not be distinguishable by the receiver.

With our current ibidTable, I think these would produce distinct capdata bodies and slots. The first would have an ibid backref, and just one slot. The second would have two `@qclass: slot` nodes with distinct index values, and `capdata.slots = [objA, objA]`. The receiver (`x.foo = (y, z) => { ... }`) could compare `y === z` to distinguish the two cases.
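A rough illustration of those two shapes under the current behavior (the `o+44` ID and the exact ibid encoding are assumptions for illustration only):

```js
// x~.foo(reprA1, reprA1) -- same Representative twice: the ibidTable turns
// the second argument into a backreference, and only one slot is emitted.
const sameRep = {
  body: JSON.stringify([
    { '@qclass': 'slot', index: 0 },
    { '@qclass': 'ibid', index: 1 }, // exact ibid index depends on the encoder
  ]),
  slots: ['o+44'],
};

// x~.foo(reprA1, reprA2) -- two distinct Representatives of the same virtual
// object: today each gets its own slot node and its own slots[] entry.
const distinctReps = {
  body: JSON.stringify([
    { '@qclass': 'slot', index: 0 },
    { '@qclass': 'slot', index: 1 },
  ]),
  slots: ['o+44', 'o+44'],
};
```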
My proposed change above would change this to producing distinct capdata bodies, but the `slots` array would be identical (just `[objA]`). The receiver would call `convertSlotToVal` exactly once (because `capdata.slots.length === 1`), and `foo()` would observe `y === z` in both cases.

The strongest interpretation of this goal would suggest that both messages should result in identical capdata (both `.body` and `.slots`). I think that would require bypassing the ibidTable check when the value is recognized by `convertValToSlot`, which would need deeper surgery in the serializer than I want to apply. I might be content with knowing that user-level receiver code cannot distinguish between these cases, even if the liveslots-level receiver code could.

Security Considerations
Comparing representatives for object identity might allow some malicious code to learn something about the path those objects took on their way to the receiver (did they arrive in the same message, or different ones? Were they created in a vat that produced them during the same crank, or different ones?). This feels like a subtle issue that won't be a problem most of the time, but would be hard to audit for. (It might be nice to have a lint rule that complains about identity comparison between objects known to be representatives).
If the refcounts get out of sync (perhaps because malicious code is deliberately constructing object graphs to provoke different behavior on the two sides of the kernel-vat boundary), objects could be GCed too early, which will probably cause a vat-fatal illegal syscall if/when the missing object is referenced.