capnproto / go-capnp

Cap'n Proto library and code generator for Go
https://capnproto.org

Canonical encoding spec issue #581

Open araxus opened 1 month ago

araxus commented 1 month ago

The canonicalization spec has an issue. It requires encoding pointers in pre-order, but doesn't require the same for data fields in an effort to avoid requiring a schema and reflection.

Because segment arenas are append-only allocators in the implementation, "cruft" data may be present in the slice from temporary operations but otherwise unattached to the root struct (i.e. orphaned). Obviously, this library does not implement orphans or adoption. Due to this line of code, the segment is opaquely copied along with the "cruft" data, which should not be part of a canonical security message. Furthermore, the data elements may not be in pre-order layout.

For a canonical security message to be truly useful, all fields should be in a defined order, including non-pointer data fields. Two individuals should be able to produce the exact same semantic message independently, using different libraries, languages, and implementations, and have their canonical form (and the ensuing cryptographic hash) match. Verifying a raw payload (when the spec has already chosen to rearrange the pointers) is pointless when the same isn't applied to the data fields. Furthermore, it's more powerful to cryptographically verify a network object versus a raw message, which is what Cap'n'Proto is all about. A solid example is serializing and signing RPC persistent capabilities encoded as a struct.

Proposed changes:

  1. The canonical encoding spec should be amended to require (perhaps as an alternative) that data fields are encoded in pre-order according to the schema; obviously this now adds the requirement of a schema and reflection since unlike pointers they don't have a pre-existing pointer table. @kentonv
  2. The func Canonicalize(s Struct) ([]byte, error) function should be improved to func Canonicalize(s Struct) (Struct, error). It's a more useful API if users don't have to re-read a message from []byte; e.g. sending a struct via an RPC interface in canonical form, alongside a cryptographic signature that verifies it. It could also add an optional typeId ...uint64 parameter to enable reflection (the alternative encoding mode suggested previously); a rough sketch of the proposed signatures follows this list.
  3. capnp.Struct should add an exported Canonical() bool accessor, indicating that the struct uses a single segment in the expected form. This makes it useful to embed in larger multi-segment messages (e.g. an RPC call).
  4. It would thus be useful to also add a capnp.Struct.Canonicalize() method.
  5. The indicated code should be changed to use reflection instead of copy. Copying the fields over one-by-one in pre-order also leaves any orphaned data in the source segment behind. The extra overhead from not using copy is insignificant in the context of performing cryptographic operations anyways.
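
To make items 2-4 concrete, here is a rough sketch of the proposed surface, written as a throwaway interface purely for illustration; none of this exists in the library today:

import "capnproto.org/go/capnp/v3"

// Proposal sketch only: the package-level Canonicalize would gain a Struct
// return value and an optional typeId for schema-driven ordering, and
// capnp.Struct would gain the two methods below.
type canonicalizable interface {
        // Canonicalize returns the canonical form as a Struct in a fresh
        // single-segment message, rather than a raw []byte.
        Canonicalize(typeId ...uint64) (capnp.Struct, error)
        // Canonical reports whether the struct already occupies a single
        // segment laid out in canonical form.
        Canonical() bool
}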

As a side-note, I've noticed an undesirable bug(?) in the RPC implementation; messages with "cruft" data in their segment are sent on the network despite the data's orphaned status. This then requires great care on the library user's part to never 'build' messages directly in the target segment if they ever have temporary operations, and they are thus likely to end up implementing their own 'canonical form' by having their code perform operations in a very specific, idiomatic style to avoid this. I personally ran into this writing unit test tables, where it took a hundred lines of code to build a message, and then I used capnp.Struct.CopyFrom() for subsequent message test cases where I altered a single field; the RPC data sent included (potentially large) payloads of the previous data copied and orphaned from the original message. One possible improvement is that capnp.Struct.CopyFrom could optionally copy in the canonical fashion proposed above. We could add a canonical ...bool parameter to the method to implement this without breaking backwards compatibility (which this library in its 'alpha' state does reserve the right to do).

I'm essentially suggesting that canonical form should be a merkle tree; i.e. blockchain. I've implemented a blockchain using Cap'n'Proto as the mmap ledger, and had to implement my own Canonicalize using reflection. ;)

matheusd commented 1 month ago

(This isn't a full response, I'm only starting to get into go-capnp development)

[...] this now adds the requirement of a schema and reflection since unlike pointers they don't have a pre-existing pointer table

Perhaps it should suffice to also follow the pointers in order to build the canonicalized data section?

The extra overhead from not using copy is insignificant in the context of performing cryptographic operations anyways.

Not everyone is paying the performance price of a cryptographic operation, so for them switching to this (hypothetical) new implementation could introduce relevant performance regressions. Though, to be fair, this doesn't invalidate your earlier points about correcting the canonicalization process.

As a side-note, I've noticed an undesirable bug(?) in the RPC implementation; messages with "cruft" data in their segment are sent on the network despite the data's orphaned status. This then requires great care on the library user's part to never 'build' messages directly in the target segment if they ever have temporary operations

Yes, this is an issue that is easy to trip on (IMO) due to the API design of the go library: using SetXXX vs WriteXXX, it's easy to think (if one does not know about the internal implementation) that the SetXXX operation will overwrite existing data vs allocating new and leaving the old data orphaned.

I have some tentative code in my refactor PR which allows overwriting data (when new data is smaller than old data), but it's only a prototype and in any case does not invalidate your points about the possibility of wasted space and extraneous data.

I've implemented a blockchain using Cap'n'Proto as the mmap ledger, and had to implement my own Canonicalize using reflection. ;)

Is that in a public repo that you could share?

kentonv commented 1 month ago

The canonicalization spec has an issue. It requires encoding pointers in pre-order, but doesn't require the same for data fields in an effort to avoid requiring a schema and reflection.

Huh?

The data section of a struct has a static layout, with each field located at a fixed offset from the start of the section, and having a fixed size. It cannot be re-ordered at all. It therefore is canonical as-is.

I don't know what "pre-order" would even mean for the data section, as the data section is not a tree, it's just a fixed-length byte blob.

araxus commented 1 month ago

The canonicalization spec has an issue. It requires encoding pointers in pre-order, but doesn't require the same for data fields in an effort to avoid requiring a schema and reflection.

Huh?

The data section of a struct has a static layout, with each field located at a fixed offset from the start of the section, and having a fixed size. It cannot be re-ordered at all. It therefore is canonical as-is.

I don't know what "pre-order" would even mean for the data section, as the data section is not a tree, it's just a fixed-length byte blob.

This golang implementation uses a slice Arena. Code can 'set' fields in arbitrary order, thus appending them to the slice expansion potentially randomly. Data fields are not serialized out in ordinal (or any) order; rather, the entire sub-slice is opaquely copied and may include orphaned data. Sorry if I confused 'pre-order' here, you're right that the data section is not a tree. Two different implementations can thus disagree about the byte ordering in the data section. Not typically important for, say, RPCs in general, but it has implications in cryptographic scenarios.

araxus commented 1 month ago

Perhaps it should suffice to also follow the pointers in order to build the canonicalized data section?

This direction of thinking is good, as I think it walks around and solves the orphaned data problem. However, it only delays the ordering problem. For example, a pointer may reference a struct with only data fields; their serialization order is not guaranteed.

What I've done in the past is build my own registry cache on top of schemas.DefaultRegistry for handier access to schema.Node and schema.Field rather than just the []byte serialization of the CodeGeneratorRequests (which are not canonical). Then I'd use something like this in a canonicalization:

// CodeOrderFields returns the fields in canonical order
// (the order in which they appeared in the schema).
func (n Node) CodeOrderFields() (fields []schema.Field, err error) {

  var list schema.Field_List
  if list, err = n.StructNode().Fields(); err != nil {
    return
  }
  count := list.Len()
  fields = make([]schema.Field, count)
  for i := 0; i < count; i++ {
    f := list.At(i)
    fields[f.CodeOrder()] = f
  }
  return

} // ledger.Node.CodeOrderFields()

As a side-note, I've noticed an undesirable bug(?) in the RPC implementation; messages with "cruft" data in their segment are sent on the network despite the data's orphaned status. This then requires great care on the library user's part to never 'build' messages directly in the target segment if they ever have temporary operations

Yes, this is an issue that is easy to trip on (IMO) due to the API design of the go library: using SetXXX vs WriteXXX, it's easy to think (if one does not know about the internal implementation) that the SetXXX operation will overwrite existing data vs allocating new and leaving the old data orphaned. I have some tentative code in my refactor PR which allows overwriting data (when new data is smaller than old data), but it's only a prototype and in any case does not invalidate your points about the possibility of wasted space and extraneous data.

I think SetXXX handles the copy problem (where the source may be larger than the dest) by simply abandoning the original data as an orphan, expanding the arena slice, and appending at the end. WriteXXX would still need to default to this behavior in that scenario. The encoding is not an iovec, where it could overwrite the first part and then, if another allocation is in the way, allocate the remaining data at the end (or anywhere else). However, if the Arena were using iovec, it could just point at the original data and not copy at all (assuming ownership passing). This kind of argues for having an Orphan implementation in this library.

However, that thought does lead me to think this might be partly solvable with a new kind of Arena based on iovec that maximizes memory efficiency and reuse (at the expense of having the iovecs); really only advantageous with larger payloads in terms of size and avoiding copies (ideally entirely). Canonicalizing the data would obviously coalesce back to a SingleSegment arena.

I've implemented a blockchain using Cap'n'Proto as the mmap ledger, and had to implement my own Canonicalize using reflection. ;)

Is that in a public repo that you could share?

It's not public (yet), but happy to share relevant bits. Would it be useful to PR an enhanced Registry?

kentonv commented 1 month ago

Code can 'set' fields in arbitrary order, thus appending them to the slice expansion potentially randomly.

Setting a data field does not "append" anything. It overwrites bytes at a specific offset within the data section of the struct. The data section is allocated with a fixed size when the struct itself is first created. Calling a setter for a data field does not allocate any new space, it just overwrites part of that existing space.

Two different implementations can thus disagree about the byte ordering in the data section.

No, they cannot.

araxus commented 1 month ago

Code can 'set' fields in arbitrary order, thus appending them to the slice expansion potentially randomly.

Setting a data field does not "append" anything. It overwrites bytes at a specific offset within the data section of the struct. The data section is allocated with a fixed size when the struct itself is first created. Calling a setter for a data field does not allocate any new space, it just overwrites part of that existing space.

Two different implementations can thus disagree about the byte ordering in the data section.

No, they cannot.

You're right; it appears I forgot about that since the data section only contains primitive types, and things like Text are pointers. My apologies again!
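
A quick test along these lines (reusing the air.BenchmarkA test schema that also shows up later in this thread, with the usual capnp/air imports) illustrates the point: re-setting a data field never grows the segment, it only rewrites bytes at the field's fixed offset.

func TestDataFieldOverwrite(t *testing.T) {
        _, seg, _ := capnp.NewMessage(capnp.SingleSegment(nil))
        root, _ := air.NewRootBenchmarkA(seg)
        root.SetSiblings(1)
        before := len(seg.Data())
        root.SetSiblings(2) // rewrites the same word in the data section
        if len(seg.Data()) != before {
                t.Fatal("data section grew; data setters should overwrite in place")
        }
}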

Should I repurpose this issue to a request for an Orphan implementation to deal with that issue?

kentonv commented 1 month ago

Should I repurpose this issue to a request for an Orphan implementation to deal with that issue?

I am not sure if I'd say there's an issue that can be fixed here.

I'm not deeply familiar with the Go implementation. In the C++ implementation, if you unlink some part of the message from the tree and drop it, the unlinked data is zero'd out -- but there is still a hole left over which cannot be reused, and that wastes bytes on the wire. To fix that, you have to create a new message and copy the struct over to it -- this essentially garbage-collects the holes.

It sounds like you're saying the Go implementation doesn't zero anything out? I would personally argue that the zeroing is desirable behavior and the Go implementation should probably add it if it doesn't do so already.

But there's really no way to get rid of the "holes" left behind, except by copying, and the whole design goal is to be zero-copy. The fact of the matter is that Cap'n Proto message builders are not great at mutation. You really need to think of them as write-once. This is just a consequence of the zero-copy goal.
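
In Go I'd guess that copy-to-compact step looks something like the following sketch (treat it as an assumption; I'm not familiar with the Go API, and this just borrows the Struct.CopyFrom and aircraftlib test schema mentioned elsewhere in this thread):

// Sketch: "garbage-collect" the holes by deep-copying the root into a fresh message.
func compact(src air.BenchmarkA) (air.BenchmarkA, error) {
        _, seg, err := capnp.NewMessage(capnp.SingleSegment(nil))
        if err != nil {
                return air.BenchmarkA{}, err
        }
        dst, err := air.NewRootBenchmarkA(seg)
        if err != nil {
                return air.BenchmarkA{}, err
        }
        // Assuming CopyFrom performs a structural deep copy (following
        // pointers), orphaned holes in the old segment are not carried over.
        if err := capnp.Struct(dst).CopyFrom(capnp.Struct(src)); err != nil {
                return air.BenchmarkA{}, err
        }
        return dst, nil
}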

matheusd commented 1 month ago

However, that thought does lead me to think this might be partly solvable with a new kind of Arena based on iovec that maximizes memory efficiency and reuse (at the expense of having the iovecs)

I'm not familiar with iovec, but the main challenge in the current arena design/implementation is that the arena doesn't really know when an orphan is created (because it doesn't know when a pointer field has been reassigned from an existing, previously assigned location). In other words, it knows additional space was needed, but it doesn't know the prior location is not referenced from anywhere anymore.

It's not public (yet), but happy to share relevant bits. Would it be useful to PR an enhanced Registry?

I was more interested in the context around where you were using it, no worries.

It sounds like you're saying the Go implementation doesn't zero anything out?

It doesn't zero out the (now) orphaned data, no. This can be easily proven with the following code:

import (
        "testing"

        "capnproto.org/go/capnp/v3"
        air "capnproto.org/go/capnp/v3/internal/aircraftlib" // the repo's test schema (import path assumed)
        "github.com/davecgh/go-spew/spew"
)

func TestLeftover(t *testing.T) {
        _, seg, _ := capnp.NewMessage(capnp.SingleSegment(make([]byte, 0, 1024)))
        root, _ := air.NewRootBenchmarkA(seg)
        root.SetSiblings(0xabcd)
        root.SetName(">This is my name<")
        root.SetName(">My Second Name<")
        root.SetName("")
        arenaData := seg.Data()
        t.Log(spew.Sdump(arenaData))
}

Which yields the following result

=== RUN   TestLeftover
    my_test.go:19: ([]uint8) (len=96 cap=1024) {
         00000000  00 00 00 00 03 00 02 00  00 00 00 00 00 00 00 00  |................|
         00000010  cd ab 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
         00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
         00000030  3e 54 68 69 73 20 69 73  20 6d 79 20 6e 61 6d 65  |>This is my name|
         00000040  3c 00 00 00 00 00 00 00  3e 4d 79 20 53 65 63 6f  |<.......>My Seco|
         00000050  6e 64 20 4e 61 6d 65 3c  00 00 00 00 00 00 00 00  |nd Name<........|
        }

However, canonicalizing the above (with capnp.Canonicalize(capnp.Struct(root))) yields the following data:

=== RUN   TestLeftoverCanonical
    my_test.go:30: ([]uint8) (len=24 cap=1024) {
         00000000  00 00 00 00 02 00 00 00  00 00 00 00 00 00 00 00  |................|
         00000010  cd ab 00 00 00 00 00 00                           |........|
        }
araxus commented 1 month ago

I'm not familiar with iovec

It's an old libc/syscall idiom for higher throughput in scatter/gather operations: the main idea is to avoid copying by passing arrays of small descriptors (typically located temporarily on the stack during the syscall) that point to buffers at arbitrary, possibly non-contiguous locations. My thinking is that you set a first data pointer (such as text), then set a second data pointer, then go back and change the first (much like your example). An iovec could reuse the original size allocation as a first buffer, then allocate only the remaining needed space; not terribly useful for small data elements, but often useful when assembling various headers for a stream of network packets or block-sized file writes. It could even delay any copy operation until the final step; e.g. you could 'canonicalize' a message, but still have it scattered around with a descriptor list, and only upon serializing to a stream or passing it to a DMA operation would you honor the 'single segment' requirement. Done properly, canonical form could even be a default when building messages, with negligible overhead; it might even be faster, since the struct.SetXXX(ptr) methods copy today, whereas instead you could just assume ownership of the passed buffer. A Go slice of slices would achieve a similar thing internally to an iovec.

but the main challenge in the current arena design/implementation is that the arena doesn't really know when an orphan is created (because it doesn't know when a pointer field has been reassigned from an existing, previously assigned location). In other words, it knows additional space was needed, but it doesn't know the prior location is not referenced from anywhere anymore.

An Orphan implementation could track that ownership explicitly (reference counting, unique ptrs, etc.) and solve what you're describing. It would require a new API where instead of passing raw slices, they are contained in Orphan structs with reference-counting operations and accessors. I believe the C++ version of Cap'n'Proto effectively achieves this, but I haven't looked at that implementation in a while. Some parts of this library already do this (e.g. capabilities); it just wasn't carried through the rest of the code.

It sounds like you're saying the Go implementation doesn't zero anything out?

It doesn't zero out the (now) orphaned data, no.

It seems like we should be zeroing this out? Two major benefits:

  1. secure coding; who knows, maybe the buffer once held a secret and was reset intentionally. Today's users probably realize (one would hope) that this library doesn't memzero and need to do it manually; it would be safer if automatic (perhaps enabled with an optional finalizer).
  2. better synergy with Cap'n'Proto packed encoding; your example above appears to result in a mostly-zero payload. However, I think packed encoding is only useful if a message exceeds the network MTU and packing results in fewer frames transmitted. SIMD/AVX instructions have scatter/gather operations that would likely accelerate packed encoding.

matheusd commented 1 month ago

My thinking is that you set a first data pointer (such as text), then set a second data pointer, then go back and change the first (much like your example). An iovec could reuse the original size allocation as a first buffer, then allocate only the remaining needed space;

On my refactor PR, I have an experiment where I go in, read the pointer, and update the memory location with the new value (instead of simply allocating new space in the segment and updating the pointer). This achieves space reuse, but only for cases where the new text has length <= the original one (and adds padding when the length is < the original, so it doesn't result in a canonical structure).

A natural alternative would be to allocate a new segment for each new object (struct or list), which would resemble the iovec approach. The drawback is that canonicalization would require traversing far pointers to build the resulting single-segment slice (which is slower).

I expect a true iovec-like approach to building the single segment in canonicalized form like you propose would require significant changes to the lib. And (AFAICT) no IO interfaces in go have scatter/gather alternatives, so it seems to me this would benefit only a (small?) subset of use cases, where data is only used in-process (as opposed to serialized to a file or socket).

It doesn't zero out the (now) orphaned data, no.

It seems like we should be zeroing this out? Two major benefits:

Yes, this seems like a legitimate bug that should be addressed.

araxus commented 1 month ago

And (AFAICT) no IO interfaces in go have scatter/gather alternatives, so it seems to me this would benefit only a (small?) subset of use cases, where data is only used in-process (as opposed to serialized to a file or socket).

There are a few (unofficial) examples around; e.g. https://pkg.go.dev/github.com/google/vectorio. Regardless, the syscalls are generally there on POSIX systems (and on Windows). My guess is it's more of an enterprise-style use case, i.e. when you care about things like hardware offload (DMA) to the NIC; or maybe people are just sticking to C/Rust for those things.

But I was also suggesting that the more common golang idiom for this would be [][]byte as they're basically equivalent; the iovec would only appear if you wanted to carry it through to the kernel without copying. In pseudo-code for an orphan-able buffer, you're looking at Buffer{iovec: [][]byte{[]byte(">This is "), []byte("my name<")}, refcount: 1}; but with finalizers and the GC you wouldn't even need to implement the refcount. If the API used generics it would be easier to switch to a different Arena implementation, but probably too much effort.
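
Fleshing that pseudo-code out a little (all names here are hypothetical; nothing like this exists in the library today):

// Hypothetical orphan-able buffer: scattered chunks plus a refcount.
type Buffer struct {
        iovec    [][]byte // possibly non-contiguous chunks, in order
        refcount int32    // with finalizers and the GC this could go away entirely
}

// Bytes coalesces the chunks only when a contiguous view is actually needed
// (e.g. canonicalization or serialization), deferring the copy to the last moment.
func (b *Buffer) Bytes() []byte {
        var out []byte
        for _, chunk := range b.iovec {
                out = append(out, chunk...)
        }
        return out
}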

I guess I was really hoping to discuss getting to parity with the C++ version of capnp. Anyways, I'll leave this alone now. Fixing the memzero issue would at least allow packed encoding to recover the bytes wasted on the wire even without canonicalization, which is what got me looking at this in the first place.

matheusd commented 1 month ago

But I was also suggesting that the more common golang idiom for this would be [][]byte as they're basically equivalent;

Yep, I got that. The current implementation is just nowhere near the point where that would be possible (hence why I mentioned using one segment for each object - this would be "easier" to do in the current implementation, but nowhere near easy in an absolute sense).

If the API used generics it would be easier to switch to a different Arena implementation, but probably too much effort.

I don't see how generics would help. The main issue today is that it's actually impossible to create an Arena implementation externally because it requires accessing private fields in Segment. Hopefully I'll get to a point where I can fix this as well.

araxus commented 1 month ago

And (AFAICT) no IO interfaces in go have scatter/gather alternatives

Well I went looking again, and it turns out ... is it that far off? ;)

func (e *Encoder) write(bufs [][]byte) error {
  _, err := (*net.Buffers)(&bufs).WriteTo(e.w)
  return err
}

https://pkg.go.dev/net#Buffers

On certain machines, for certain types of connections, this is optimized into an OS-specific batch write operation (such as "writev").

So it was implemented for capnp.Message, just not carried through to object buffers.
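
Carrying the same idiom through to arbitrary object buffers would look roughly like this (a standalone illustration, not library code):

import (
        "io"
        "net"
)

// writeScattered flushes non-contiguous chunks with a single vectored write
// when w is a connection that supports it; net.Buffers falls back to plain
// sequential writes otherwise.
func writeScattered(w io.Writer, chunks [][]byte) error {
        bufs := net.Buffers(chunks)
        _, err := bufs.WriteTo(w)
        return err
}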

matheusd commented 1 month ago

Ah, nice find! Looks like it's implemented for both windows and unix *conn objects.

I'll have to reassess some of my plans based on this, though I should probably at the very least give it a rough benchmark.

araxus commented 1 month ago

The main benefit is everything could be zero-copy until the point of serialization. Right now just doing Struct.SetPtr() incurs a copy, when it could just do a reference. You wouldn't see any benefit if the client code just did a simple left-to-right message build, only if it made updates (like the discussed examples in RPC). This is I believe what Kenton was suggesting to stick to, and not modify messages as a way to avoid the issue.

My issue is I've been building template objects for table-driven tests, altering fields in each scenario, and my payloads were getting unexpectedly large and filled with "cruft".

matheusd commented 1 month ago

Sure, but some workflows may want to copy (because they're building an object with well-known size, not rewriting fields, and want to reuse the object reference on another slice that will be aliased upon).

So this would have to be an optional feature (possibly selected upon based on the Arena).

This is I believe what Kenton was suggesting to stick to, and not modify messages as a way to avoid the issue.

Correct. I believe (though now I can't find a reference) this has always been a suggestion for CapNProto usage: not to use it as an intermediate object representation.

araxus commented 1 month ago

Sure, but some workflows may want to copy (because they're building an object with well-known size, not rewriting fields, and want to reuse the object reference on another slice that will be aliased upon).

I've always just used Struct.CopyFrom() when I wanted that behavior; the option is already there. The reference option is not. That's where I got into trouble with my unit testing when I did something like:

    // Create second feature.
    err = capnp.Struct(c.feature[1]).CopyFrom(capnp.Struct(c.feature[0]))
    require.NoError(c.T(), err, "feature[1].CopyFrom(feature[0])")
    id, err = c.feature[1].Id()
    require.NoError(c.T(), err, "feature[1].Id")
    // change some values in "id" ...
matheusd commented 1 month ago

they're building an object with well-known size, not rewriting fields, and want to reuse the object reference on another slice that will be aliased upon).

I've always just used Struct.CopyFrom() when I wanted that behavior; the option is already there. The reference option is not.

Sure, but that pays the price of doing an additional copy, when you wouldn't have to. We probably don't want to downgrade the performance of the existing workflow in favor of a new one - we should try to improve the performance of both :P

araxus commented 1 month ago

I agree, but I specifically ran into this issue in testing where I don't care about performance. I'm more concerned with reuse without having to write a bunch of extra methods to build messages or hundreds of lines more code; I had thought Struct.CopyFrom() was a saving grace here. There's a folder with a bunch of messages that are loaded and permuted, but it turns out I was getting explosive data streams instead ... my mistake ;)

There are use cases where performance would be better. Consider a multi-casting scenario where I need to inform N "hooks" of an event, with each client registering its hook by passing a capability to my RPC for me to call back. The messages are all mostly identical, except that for each client I need to modify a pointer field. For simplicity, let's say I pass a capability back to the client for the event object for further calls. Instead of using a template object with Struct.CopyFrom() and changing the one pointer N times, I now have to allocate and build N full messages. Where I got into trouble is when my tests wanted to update something larger than a capability (e.g. Text or Data). With an iovec-like approach, my "template" becomes more of a prefixed header in the first slice, and the updated value is located in a second slice. Right now I hope my work-around is MultiSegmentArena (but I'll never be able to make it canonical[?], or perhaps it copies it out, need to look at that), but much of it is private, so I'm going to see if I can massage the Allocate() method directly and feed it into a capnp.Message. Much of the API isn't public, but I've found I can do something like this:

        // Read the updated Node back from the raw segment bytes by wrapping
        // them in a fresh single-segment Message.
        var node Node
        if node, err = ReadRootNode(
            &capnp.Message{
                Arena:         capnp.SingleSegment(data),
                TraverseLimit: math.MaxUint64, // effectively disable the read traversal limit
                DepthLimit:    2,              // shallow message, so a small pointer-depth limit suffices
            },
        ); err != nil { ... }

araxus commented 1 month ago

Off topic: is there a better place to discuss other issues? I ran into another problem with futures in pipelining. I can do foo(bar(x)), minus being able to tell the server "target=yourself" (not implemented, so it incurs a +1 network round-trip). However, when I attempt foo(bar(baz(x))), the final RPC is delivered to bar() instead of foo(), so it returns an unimplemented exception (different RPC interface type-id). I cheekily attempted future.Field(0, nil).Field(0, nil).Interface(), but the second Field(0, nil) wasn't forwarded to bar()'s transform pipeline for that promise; instead it crashed processing both transforms prior to delivery. Is transitive pipelining even possible in the API today?

matheusd commented 1 month ago

Not sure about "better", but the only other place where there are discussions on go-capnp is the matrix channel: https://matrix.to/#/!pLcnVUHHRZrUPscloW:matrix.org?via=matrix.org