Open tac0turtle opened 1 year ago
The biggest issue with the current implementation is the truly excessive amount of heap allocation we do. Barely anything is kept on the stack, due to the pointer indirection involved, and amount of new ints allocated.
In place operations help significantly here. Also having more internal flexibility / optimization on the max size bound is likely important.
To compare to something fixed precision, e.g. uint256, you should compare performance for single word sized arithmetic, and 4 word arithmetic. Ideally what we'd see is that for single word in-place operations, the current outperforms theres. (If not, theres a problem) Then we could think of tricks to get these to not be new heap allocations in the average case.
We'd certainly see that 4 word in-place arithmetic in the current approach is slower than theirs. (Somewhat fundamental)
Ideally we'd see generics, like https://github.com/arkworks-rs/algebra/tree/master/ff/src/biginteger be used to allow flexibility between word size, while maintaining high perf. (With the edge case of <= 4 words always outperforming without generics overriding that parameter)
IIRC, Go will allocate objects onto the heap under 3 primary conditions:
Given this, what ideas do you have to ensure we use the stack more effectively?
we should write some benchmarks and identify hot spots in relation to our implementation. This will probably lead to the best first steps
Should get reproducable benchmarks to verify this, but my experience with prior benchmarks is that the is mostly all the heap allocation and GC overhead work. The actually big.Int performance is pretty small compared to the time spent in heap allocating just to create an object. The mutative operations getting used internally is at least a significant aid in this.
Every object here has an address taken, because we use pointers to bigints, and the bigint itself contains a slice underneath. The underlying slice is potentially small vec optimized onto stack: https://medium.com/@yulang.chu/go-stack-or-heap-2-slices-which-keep-in-stack-have-limitation-of-size-b3f3adfd6190 , but the pointer indirection in the Int & Decimal is not. We see this in cosmos math heavy workloads causing massive amount of heap allocs
Theres a few big directions from the heap claim, for speed improvements:
@ValarDragon @tac0turtle okay hear me out: https://github.com/holiman/uint256
It would make sdkmath and sdkcoins speedy fast. No-one needs anything bigger than 256 and we could always create our own custom bigInt that is 2 uint256's if someone REALLY needs something larger than 256.
it is worth considering doing something else since
Another reason to consider this is that math/big is not used for security-sensitive cryptography in Go anymore,
so we might not treat certain bugs in it as security vulnerabilities going forward.
@tac0turtle the context for that go 1.20 announcement AFAIU is:
Might we have related issue: trying to import genesis.json and also validate it and got this error:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x141a438]
goroutine 1 [running]:
math/big.(*Int).Sign(...)
math/big/int.go:41
github.com/cosmos/cosmos-sdk/types.Dec.IsNegative(...)
And
initializing blockchain state from genesis.json
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x141a438]
goroutine 1 [running]:
math/big.(*Int).Sign(...)
math/big/int.go:41
github.com/cosmos/cosmos-sdk/types.Dec.IsNegative(...)
github.com/cosmos/cosmos-sdk@v0.46.15-0.20230807104542-537257060180/types/decimal.go:211
Same with @alpha-omega-labs I've imported genesis.json from Polkachu or https://raw.githubusercontent.com/cosmos/mainnet/master/genesis/genesis.cosmoshub-4.json.gz After that, I got this error:
10:05AM INF starting node with ABCI Tendermint in-process
10:05AM INF service start impl=multiAppConn module=proxy msg={}
10:05AM INF service start connection=query impl=localClient module=abci-client msg={}
10:05AM INF service start connection=snapshot impl=localClient module=abci-client msg={}
10:05AM INF service start connection=mempool impl=localClient module=abci-client msg={}
10:05AM INF service start connection=consensus impl=localClient module=abci-client msg={}
10:05AM INF service start impl=EventBus module=events msg={}
10:05AM INF service start impl=PubSub module=pubsub msg={}
10:05AM INF service start impl=IndexerService module=txindex msg={}
10:05AM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v13.0.2
10:05AM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1204806]
goroutine 1 [running]:
math/big.(*Int).Sign(...)
math/big/int.go:41
github.com/cosmos/cosmos-sdk/types.Dec.IsNegative(...)
github.com/cosmos/cosmos-sdk@v0.45.16/types/decimal.go:211
github.com/cosmos/cosmos-sdk/x/staking/types.validateValidatorBondFactor({0x1d0b1e0?, 0x0?})
github.com/cosmos/cosmos-sdk@v0.45.16/x/staking/types/params.go:260 +0x46
github.com/cosmos/cosmos-sdk/x/params/types.Subspace.SetParamSet({{0x25f0a90, 0xc0010795c0}, 0xc000139678, {0x25ce740, 0xc000c76cf0}, {0x25ce790, 0xc000c76db0}, {0xc0005b5260, 0x7, 0x20}, ...}, ...)
github.com/cosmos/cosmos-sdk@v0.45.16/x/params/types/subspace.go:245 +0x22d
github.com/cosmos/cosmos-sdk/x/staking/keeper.Keeper.SetParams(...)
github.com/cosmos/cosmos-sdk@v0.45.16/x/staking/keeper/params.go:84
github.com/cosmos/cosmos-sdk/x/staking.InitGenesis({{0x25e2c78, 0xc000054088}, {0x25f0670, 0xc00057e5c0}, {{0x0, 0x0}, {0xc001006000, 0xb}, 0x0, {0x0, ...}, ...}, ...}, ...)
github.com/cosmos/cosmos-sdk@v0.45.16/x/staking/genesis.go:35 +0x265
github.com/cosmos/cosmos-sdk/x/staking.AppModule.InitGenesis({{{0x25f62f8, 0xc0010795c0}}, {{0x25ce740, 0xc000c76ca0}, {0x25f0a90, 0xc0010795c0}, {0x25e66a0, 0xc000cb0480}, {0x7ff1e6bfd200, 0xc0001758c0}, ...}, ...}, ...)
github.com/cosmos/cosmos-sdk@v0.45.16/x/staking/module.go:157 +0x146
github.com/cosmos/cosmos-sdk/types/module.(*Manager).InitGenesis(_, {{0x25e2c78, 0xc000054088}, {0x25f0670, 0xc00057e5c0}, {{0x0, 0x0}, {0xc001006000, 0xb}, 0x4f5b97, ...}, ...}, ...)
github.com/cosmos/cosmos-sdk@v0.45.16/types/module/module.go:327 +0x23d
github.com/cosmos/gaia/v13/app.(*GaiaApp).InitChainer(_, {{0x25e2c78, 0xc000054088}, {0x25f0670, 0xc00057e5c0}, {{0x0, 0x0}, {0xc001006000, 0xb}, 0x4f5b97, ...}, ...}, ...)
github.com/cosmos/gaia/v13/app/app.go:252 +0x20e
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).InitChain(0xc000c7fc00, {{0x0, 0xed5830c36, 0x0}, {0xc001006000, 0xb}, 0xc0005b08e0, {0xc000c53c00, 0x7d, 0x7d}, ...})
github.com/cosmos/cosmos-sdk@v0.45.16/baseapp/abci.go:63 +0x455
github.com/tendermint/tendermint/abci/client.(*localClient).InitChainSync(0xc0005af260, {{0x0, 0xed5830c36, 0x0}, {0xc001006000, 0xb}, 0xc0005b08e0, {0xc000c53c00, 0x7d, 0x7d}, ...})
github.com/tendermint/tendermint@v0.34.27/abci/client/local_client.go:272 +0x118
github.com/tendermint/tendermint/proxy.(*appConnConsensus).InitChainSync(0xc000510f00?, {{0x0, 0xed5830c36, 0x0}, {0xc001006000, 0xb}, 0xc0005b08e0, {0xc000c53c00, 0x7d, 0x7d}, ...})
github.com/tendermint/tendermint@v0.34.27/proxy/app_conn.go:77 +0x55
github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(_, {{{0xb, 0x0}, {0xc000cdf348, 0x7}}, {0xc000cdf390, 0xb}, 0x4f5b97, 0x0, {{0x0, ...}, ...}, ...}, ...)
github.com/tendermint/tendermint@v0.34.27/consensus/replay.go:319 +0xf18
github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc0137f5c70, {0x25f34c0, 0xc0011524e0})
github.com/tendermint/tendermint@v0.34.27/consensus/replay.go:268 +0x3d4
github.com/tendermint/tendermint/node.doHandshake({_, _}, {{{0xb, 0x0}, {0xc000cdf348, 0x7}}, {0xc000cdf390, 0xb}, 0x4f5b97, 0x0, ...}, ...)
github.com/tendermint/tendermint@v0.34.27/node/node.go:329 +0x1b8
github.com/tendermint/tendermint/node.NewNode(0xc0000f0a00, {0x25df960, 0xc000524960}, 0xc000c7b690, {0x25c9860, 0xc000134e40}, 0x0?, 0x0?, 0xc000c7b8b0, {0x25e3a08, ...}, ...)
github.com/tendermint/tendermint@v0.34.27/node/node.go:779 +0x597
github.com/cosmos/cosmos-sdk/server.startInProcess(_, {{0x0, 0x0, 0x0}, {0x25ffc80, 0xc001168630}, {0x0, 0x0}, {0x25e7f58, 0xc0010795c0}, ...}, ...)
github.com/cosmos/cosmos-sdk@v0.45.16/server/start.go:280 +0x89b
github.com/cosmos/cosmos-sdk/server.StartCmd.func2(0xc0010e2c00?, {0x34dff30?, 0x0?, 0x0?})
github.com/cosmos/cosmos-sdk@v0.45.16/server/start.go:128 +0x169
github.com/spf13/cobra.(*Command).execute(0xc0010e2c00, {0x34dff30, 0x0, 0x0})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x862
github.com/spf13/cobra.(*Command).ExecuteC(0xc000e36f00)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x0?, {0xc00013df38, 0x12})
github.com/cosmos/cosmos-sdk@v0.45.16/server/cmd/execute.go:36 +0x1eb
main.main()
github.com/cosmos/gaia/v13/cmd/gaiad/main.go:16 +0x2c
I'm working on Gaia v13.0.2 which downloaded from https://github.com/cosmos/gaia/releases/tag/v13.0.2 (I've tried to clone source and build manually and still not working)
@linhvt-teko did you ever resolve/overcome this issue? I see something very similar
@angusscott I've no idea about this problem. So I reinstalled everything and downloaded a snapshot from Polkachu. You can try these steps:
Summary
in math we use Big.INT for larger than integers. We never truly benchmarked the implementation against something like https://github.com/decred/dcrd/blob/master/math/uint256/README.md or rolling our own implementation.
Proposal
Benchmark big.INT against a library like dcreds, if we identify that big.INT operations are slow we should come up with a plan with how to optimise the operations