fxamacker / cbor

CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.
MIT License

Add a method for marshaling directly into a user-provided buffer. #521

Closed benluddy closed 6 months ago

benluddy commented 7 months ago

Description

This is an implementation of the proposal in https://github.com/fxamacker/cbor/issues/520.

PR Was Proposed and Welcomed in Currently Open Issue

Checklist (for code PR only, ignore for docs PR)

Certify the Developer's Certificate of Origin 1.1

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

benluddy commented 7 months ago

As part of this change, I moved the scratch byte array out of the pooled buffer (where it was a field) and onto the stack, where it is declared only as needed.
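To illustrate the idea, here is a minimal sketch (not the code from this PR) of an encoder helper that keeps its scratch space in a local array rather than in a field of a pooled buffer type. The names `encodeHead` and `headHex` are hypothetical, and whether the array really stays on the stack is decided by the compiler's escape analysis.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// encodeHead writes a CBOR head (major type plus argument) in shortest form.
// The scratch array is a local variable instead of a field on a pooled buffer
// type, so a caller-supplied *bytes.Buffer needs no extra state.
// Hypothetical helper for illustration only.
func encodeHead(buf *bytes.Buffer, major byte, n uint64) {
	var scratch [9]byte // 1 initial byte + up to 8 argument bytes
	t := major << 5     // major type occupies the top 3 bits
	switch {
	case n <= 23:
		buf.WriteByte(t | byte(n))
	case n <= math.MaxUint8:
		scratch[0] = t | 24
		scratch[1] = byte(n)
		buf.Write(scratch[:2])
	case n <= math.MaxUint16:
		scratch[0] = t | 25
		binary.BigEndian.PutUint16(scratch[1:], uint16(n))
		buf.Write(scratch[:3])
	case n <= math.MaxUint32:
		scratch[0] = t | 26
		binary.BigEndian.PutUint32(scratch[1:], uint32(n))
		buf.Write(scratch[:5])
	default:
		scratch[0] = t | 27
		binary.BigEndian.PutUint64(scratch[1:], n)
		buf.Write(scratch[:9])
	}
}

// headHex encodes one head into a fresh buffer and returns it as hex.
// Hypothetical helper used only to show the output.
func headHex(major byte, n uint64) string {
	var buf bytes.Buffer
	encodeHead(&buf, major, n)
	return fmt.Sprintf("%x", buf.Bytes())
}

func main() {
	fmt.Println(headHex(0, 10))  // 0a
	fmt.Println(headHex(0, 500)) // 1901f4
	fmt.Println(headHex(2, 3))   // 43
}
```

Because the helper only needs a plain `*bytes.Buffer`, the same code path can serve both a pooled buffer and one handed in by the caller.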

                                                            │ master.txt  │   user-provided-buffer-inline.txt    │
                                                            │   sec/op    │   sec/op     vs base                 │
Marshal/Go_bool_to_CBOR_bool                                  45.33n ± 0%   45.41n ± 0%  +0.19% (p=0.033 n=10)
Marshal/Go_uint64_to_CBOR_positive_int                        63.97n ± 0%   64.06n ± 0%       ~ (p=0.209 n=10)
Marshal/Go_int64_to_CBOR_negative_int                         50.56n ± 1%   50.61n ± 1%       ~ (p=0.195 n=10)
Marshal/Go_float64_to_CBOR_float                              60.98n ± 0%   61.42n ± 0%  +0.72% (p=0.001 n=10)
Marshal/Go_[]uint8_to_CBOR_bytes                              77.63n ± 0%   74.29n ± 0%  -4.30% (p=0.000 n=10)
Marshal/Go_string_to_CBOR_text                                77.50n ± 1%   76.89n ± 0%  -0.79% (p=0.000 n=10)
Marshal/Go_[]int_to_CBOR_array                                286.9n ± 0%   288.9n ± 0%  +0.71% (p=0.000 n=10)
Marshal/Go_map[string]string_to_CBOR_map                      914.8n ± 1%   876.3n ± 1%  -4.20% (p=0.000 n=10)
Marshal/Go_map[string]interface{}_to_CBOR_map                 2.233µ ± 0%   2.260µ ± 0%  +1.21% (p=0.000 n=10)
Marshal/Go_struct_to_CBOR_map                                 1.391µ ± 1%   1.395µ ± 1%       ~ (p=0.839 n=10)
Marshal/Go_map[int]interface{}_to_CBOR_map                    2.155µ ± 0%   2.188µ ± 0%  +1.51% (p=0.000 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                        1.382µ ± 1%   1.371µ ± 1%  -0.80% (p=0.000 n=10)
Marshal/Go_[]interface{}_to_CBOR_map                          1.617µ ± 0%   1.671µ ± 0%  +3.37% (p=0.000 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                       1.332µ ± 1%   1.329µ ± 1%       ~ (p=0.401 n=10)
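
For readers unfamiliar with this output: the "vs base" column is the relative change from the left column to the right one, and `~` marks rows where the difference is not statistically significant. The sketch below recomputes one delta from the printed values; `pctChange` and `formatDelta` are hypothetical helper names. Recomputing from the rounded values shown in the table can differ from the reported figure in the last digit for some rows.

```go
package main

import "fmt"

// pctChange returns the relative change from base to next, in percent.
func pctChange(base, next float64) float64 {
	return (next - base) / base * 100
}

// formatDelta renders the change with an explicit sign and two decimals.
func formatDelta(base, next float64) string {
	return fmt.Sprintf("%+.2f%%", pctChange(base, next))
}

func main() {
	// Values taken from the Marshal/Go_[]uint8_to_CBOR_bytes row.
	fmt.Println(formatDelta(77.63, 74.29)) // -4.30%
}
```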

I also tried updating the implementation of Marshal to delegate to MarshalToBuffer, which worked fine, but the benchmarks were marginally worse. I think this may just be the overhead of an extra function call being especially visible in benchmark cases that are already very fast, or it may be noise (left column is master, right column is the delegating variant):

Marshal/Go_bool_to_CBOR_bool                                  45.33n ± 0%   46.59n ± 0%  +2.78% (p=0.000 n=10)
Marshal/Go_uint64_to_CBOR_positive_int                        63.97n ± 0%   64.98n ± 0%  +1.56% (p=0.000 n=10)
Marshal/Go_int64_to_CBOR_negative_int                         50.56n ± 1%   53.09n ± 1%  +4.98% (p=0.000 n=10)
Marshal/Go_float64_to_CBOR_float                              60.98n ± 0%   63.23n ± 0%  +3.69% (p=0.000 n=10)
Marshal/Go_[]uint8_to_CBOR_bytes                              77.63n ± 0%   74.98n ± 0%  -3.41% (p=0.000 n=10)
Marshal/Go_string_to_CBOR_text                                77.50n ± 1%   78.20n ± 0%  +0.89% (p=0.000 n=10)
Marshal/Go_[]int_to_CBOR_array                                286.9n ± 0%   286.9n ± 0%       ~ (p=0.364 n=10)
Marshal/Go_map[string]string_to_CBOR_map                      914.8n ± 1%   889.5n ± 1%  -2.77% (p=0.000 n=10)
Marshal/Go_map[string]interface{}_to_CBOR_map                 2.233µ ± 0%   2.204µ ± 1%  -1.30% (p=0.000 n=10)
Marshal/Go_struct_to_CBOR_map                                 1.391µ ± 1%   1.400µ ± 1%       ~ (p=0.108 n=10)
Marshal/Go_map[int]interface{}_to_CBOR_map                    2.155µ ± 0%   2.151µ ± 1%       ~ (p=0.342 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                        1.382µ ± 1%   1.336µ ± 0%  -3.33% (p=0.000 n=10)
Marshal/Go_[]interface{}_to_CBOR_map                          1.617µ ± 0%   1.601µ ± 0%  -1.02% (p=0.000 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                       1.332µ ± 1%   1.343µ ± 1%  +0.79% (p=0.022 n=10)

I would still rather see Marshal call MarshalToBuffer so that they both benefit from all of the existing test coverage, unless the benchmark diff is a concern.