Closed sirdeggen closed 1 year ago
Issues arose when attempting to create and the parse a JSON BUMP in golang, updating in this way would allow us to order the inner array by offset as a convention thus giving us the ability to always keep BUMP format in consistent order which means we could use a checksum to ensure nothing has changed for example.
@wregulski pushing back on this - open to the possibility that we should revert back to offset as key. Thoughts here please so that people can see the arguments in favor of each?
I have both options working here now: https://github.com/libsv/go-bc/pull/73
Whatever concern was originally raised may not actually be a problem.
Comparing the approaches here 1 = slice of maps and 2 = slice of slices
goos: darwin
goarch: amd64
pkg: github.com/libsv/go-bc
cpu: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
BenchmarkCalculateRootOption1-16 101724 11349 ns/op 6846 B/op 96 allocs/op
BenchmarkCalculateRootOption2-16 121094 9840 ns/op 4328 B/op 93 allocs/op
PASS
ok github.com/libsv/go-bc 2.686s
I made more detailed Benchmarks, and create a comparison of it: benchmark_comparison.txt :
│ BenchmarkBUMPasArrayOfArrays.txt │ BenchmarkBUMPwithOffsetKey.txt │
│ sec/op │ sec/op vs base │
BUMP/from_string-12 1.504µ ± 0% 1.943µ ± 0% +29.19% (p=0.000 n=20)
BUMP/from_json-12 4.215µ ± 0% 4.569µ ± 0% +8.39% (p=0.000 n=20)
BUMP/to_string-12 1.083µ ± 0% 1.846µ ± 0% +70.45% (p=0.000 n=20)
BUMP/to_bytes-12 766.7n ± 0% 1522.0n ± 0% +98.51% (p=0.000 n=20)
BUMP/calculate_merkle_root-12 1.643µ ± 0% 1.917µ ± 0% +16.71% (p=0.000 n=20)
BUMP/receive_hex_bump_(decode_&_merkle_root_calculation)-12 3.225µ ± 0% 3.933µ ± 0% +21.95% (p=0.000 n=20)
BUMP/use_hex_bump_(encode_&_decode_&_calculate_merkle_root)-12 4.360µ ± 0% 5.906µ ± 0% +35.46% (p=0.000 n=20)
geomean 1.985µ 2.723µ +37.17%
If I didn't make any mistake, according to this comparison in all cases BUMP with array of arrays is faster.
In array of arrays there is an assumption that BUMP is already sorted, this means that when building BUMP structure to serialise it later the code that is building that structure is obligated to ensure that leafs are in correct order.
Although array of arrays version seem to be much faster, it have a specific assumption about the order of leafs and there is nothing that ensures that order. Neither in BRC nor in code itself. So if client of BUMP will build the struct with wrong order, serialised version will also be in wrong order while in an offset
version there is always sorting before serialisation.
Because of that, in my opinion we can't rely on current results of benchmark, until code operation will be unified.
Sources can be found on my branch in 4chain fork, in the directory bumpbench
go install golang.org/x/perf/cmd/benchstat@latest
)Beside that, I did small refactoring and I moved files for BUMP implemenations to separated packages. I did this because it makes it easier to create the benchmarks and ensure I didn't make a mistake in it.
Conclusion from today's discussion is that:
BRC ID
74
Discussion
In short, update the BUMP format to include a path as an array of arrays rather than an array of object. Use a key "offset" rather than using the value itself as a key.
Update this:
To this:
Thanks to @shruggr for the feedback and suggestion.
Motivation fleshed out: Using values as keys in JSON can make serialization and deserialization difficult in statically typed programming languages because these languages rely on compile-time type checking. Statically typed languages require that the data types of variables, including keys in a JSON object, be known at compile-time.
When values are used as keys in a JSON object, the key's value can be of any type, which makes it challenging for statically typed languages to determine the appropriate data type of the key during compilation. This ambiguity can result in errors during serialization or deserialization processes.
For example, if a statically typed language expects a specific data type for a key but encounters a value that doesn't match that expected type during deserialization, it may throw an error or fail to properly deserialize the JSON object. Similarly, during serialization, a statically typed language may have trouble determining the type of a key's value, leading to issues in generating the correct JSON output.
To overcome this difficulty, statically typed programming languages typically require keys to be of a specific known data type, such as a string or a numeric value. This ensures that the type of the key can be properly inferred and validated during compilation, making serialization and deserialization more straightforward and reliable.