fxamacker / cbor

CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.
MIT License
748 stars 61 forks source link

Treat map keys matching the same struct field as duplicates. #492

Closed benluddy closed 9 months ago

benluddy commented 9 months ago

Description

This resolves the case where two map keys could match the same struct field without being treated as duplicate map keys. The DupMapKey documentation has been updated accordingly.

A map key that matches a previously-matched field will no longer quietly fall back to a case-insensitive match and is now always treated as a duplicate key. Case-sensitive matching of map key to struct field name is now a map lookup to improve the quadratic worst-case:

                                                                   │ manykeys-before.txt │         manykeys-after.txt          │
                                                                   │       sec/op        │   sec/op     vs base                │
UnmarshalMapToStruct/default_options/many_fields_one_key_per_field          24.137µ ± 4%   5.240µ ± 1%  -78.29% (p=0.000 n=10)

                                                                   │ manykeys-before.txt │        manykeys-after.txt         │
                                                                   │        B/op         │   B/op    vs base                 │
UnmarshalMapToStruct/default_options/many_fields_one_key_per_field            112.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=10)

                                                                   │ manykeys-before.txt │         manykeys-after.txt          │
                                                                   │      allocs/op      │ allocs/op   vs base                 │
UnmarshalMapToStruct/default_options/many_fields_one_key_per_field            1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)

This fix cleared the way to optimize the way duplicate keys are tracked, especially for inputs that contain no unknown fields:

                                                                               │  before.txt  │              after.txt              │
                                                                               │    sec/op    │   sec/op     vs base                │
UnmarshalMapToStruct/default_options/all_known_fields                             642.7n ± 3%   735.4n ± 0%  +14.42% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_known_duplicate_fields                  1611.5n ± 1%   521.1n ± 1%  -67.66% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_unknown_fields                           1.731µ ± 3%   1.174µ ± 0%  -32.18% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_unknown_duplicate_fields                 1.726µ ± 2%   1.076µ ± 0%  -37.66% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_known_fields                              638.3n ± 1%   725.5n ± 1%  +13.66% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_known_duplicate_fields                    476.1n ± 1%   514.7n ± 0%   +8.11% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_unknown_fields                            448.1n ± 1%   398.8n ± 1%  -11.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_unknown_duplicate_fields                  447.9n ± 0%   399.9n ± 1%  -10.71% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_known_fields                           1825.5n ± 0%   726.0n ± 0%  -60.23% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_known_duplicate_fields                  744.3n ± 0%   440.8n ± 0%  -40.78% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_unknown_fields                          2.885µ ± 1%   3.329µ ± 0%  +15.37% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_unknown_duplicate_fields                844.8n ± 0%   738.7n ± 1%  -12.56% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_known_fields               1828.5n ± 0%   723.4n ± 0%  -60.44% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_known_duplicate_fields      744.3n ± 0%   436.4n ± 0%  -41.36% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_unknown_fields              643.4n ± 0%   435.3n ± 0%  -32.34% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_unknown_duplicate_fields    642.1n ± 0%   435.6n ± 1%  -32.15% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/many_fields_one_key_per_field                              5.327µ ± 0%
geomean                                                                           936.7n        748.3n       -29.34%

                                                                               │ before.txt  │                after.txt                 │
                                                                               │    B/op     │    B/op      vs base                     │
UnmarshalMapToStruct/default_options/all_known_fields                             16.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_known_duplicate_fields                   16.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_unknown_fields                           16.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_unknown_duplicate_fields                 16.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_known_fields                              16.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_known_duplicate_fields                    24.00 ± 0%     0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_unknown_fields                           24.000 ± 0%    8.000 ± 0%   -66.67% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_unknown_duplicate_fields                 24.000 ± 0%    8.000 ± 0%   -66.67% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_known_fields                            800.0 ± 0%      0.0 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_known_duplicate_fields                 648.00 ± 0%    40.00 ± 0%   -93.83% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_unknown_fields                          800.0 ± 0%   1285.0 ± 0%   +60.62% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_unknown_duplicate_fields                648.0 ± 0%    248.0 ± 0%   -61.73% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_known_fields                800.0 ± 0%      0.0 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_known_duplicate_fields     648.00 ± 0%    40.00 ± 0%   -93.83% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_unknown_fields             616.00 ± 0%    24.00 ± 0%   -96.10% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_unknown_duplicate_fields   616.00 ± 0%    24.00 ± 0%   -96.10% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/many_fields_one_key_per_field                              0.000 ± 0%
geomean                                                                           113.6                     ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                                                                               │ before.txt │                after.txt                │
                                                                               │ allocs/op  │ allocs/op   vs base                     │
UnmarshalMapToStruct/default_options/all_known_fields                            1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_known_duplicate_fields                  1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_unknown_fields                          1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/all_unknown_duplicate_fields                1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_known_fields                             1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_known_duplicate_fields                   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_unknown_fields                           2.000 ± 0%   1.000 ± 0%   -50.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown/all_unknown_duplicate_fields                 2.000 ± 0%   1.000 ± 0%   -50.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_known_fields                           15.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_known_duplicate_fields                 5.000 ± 0%   2.000 ± 0%   -60.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_unknown_fields                         15.00 ± 0%   17.00 ± 0%   +13.33% (p=0.000 n=10)
UnmarshalMapToStruct/reject_duplicate/all_unknown_duplicate_fields               5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_known_fields               15.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_known_duplicate_fields     5.000 ± 0%   2.000 ± 0%   -60.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_unknown_fields             4.000 ± 0%   2.000 ± 0%   -50.00% (p=0.000 n=10)
UnmarshalMapToStruct/reject_unknown_and_duplicate/all_unknown_duplicate_fields   4.000 ± 0%   2.000 ± 0%   -50.00% (p=0.000 n=10)
UnmarshalMapToStruct/default_options/many_fields_one_key_per_field                            0.000 ± 0%
geomean                                                                          3.043                    ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

PR Was Proposed and Welcomed in Currently Open Issue

Checklist (for code PR only, ignore for docs PR)

Certify the Developer's Certificate of Origin 1.1

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
fxamacker commented 9 months ago

@benluddy Thanks for opening this PR! The benchmarks you shared look great!

I'll take a look this weekend! :pray: