Closed tac0turtle closed 6 months ago
I would strongly advocate for a generic ordered map so we don't suffer the performance hit of sorting maps all over the place. We just need to make sure we choose an implementation which is well tested - might make sense to bring something into our source tree even.
I'd personally prefer to not have an added object layer. It adds complexity and makes things less portable.
In my opinion, a better solution would be a linter that complains/fails when range
is used on a map
. With the addition of generics to go
, it's now easy to get a sorted list of keys, which is usually the annoying part when fixing a map loop. I'd much rather keep things as maps and just change range myMap
to range getSortedKeys(myMap)
. You get to keep all the handling/syntax of a map
and only address the problem where it's a problem; you're not changing things to an alternative data structures just because you might want to loop over it.
With the addition of generics to go, it's now easy to get a sorted list of keys, which is usually the annoying part when fixing a map loop. I'd much rather keep things as maps and just change range myMap to range getSortedKeys(myMap).
The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one.
The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one.
If you've got a map that is primarily used in loops, then I agree, a sorted data structure is called for. But if you only ever loop on it once or in sporadic instances (where its primary purpose is lookup), just sort the keys when you want to loop.
It'd be annoying if all maps were changed to a non-native data structure just in the off chance that they're needed in a loop.
uhh, I promise from live benchmarks, that sorting in application data OR map insertions is not showing up in any performance problems right now. I really think that its misguided to focus on performance here
The only place that sorting shows as taking forever is in the CacheKV store, which is a different problem where the entire design needs to be fixed.
I don;t really care about the decision of either a generics based BTree, or a Hashmap wrapper that sorts for iteration / has deterministic serialization (cref #12816 ). If the design is done right, this should really not be state machine breaking to change later, as they can have the same serialization/deserialization from proto. (Up to gas parameters, which we can just make the same)
a better solution would be a linter that fails when range is used on a map
It should fail. It already complains, but this apparently isn't loud enough.
keep things as maps and just
change range myMap
torange getSortedKeys(myMap)
This is the best design to me as it's barely a design at all. If we see performance problems with map iteration outside of store/cachekv we can revisit.
The only place that sorting shows as taking forever is in the CacheKV store, which is a different problem where the entire design needs to be fixed.
A recent performance fix https://github.com/cosmos/cosmos-sdk/pull/12885 improved cache sorting significantly, but from what I've seen the current cache impl could be transitioned to a single BTree backed map instead of separating sorted/unsorted. This is smaller scope than the entire rewrite talked about in https://github.com/cosmos/cosmos-sdk/issues/12986. What do you think?
deterministic serialization
Serialization seems orthogonal to the problem at hand, am I missing something? If we always sort keys prior to iterating the form of the serialized data is irrelevant, provided we're OK with the runtime sorting performance hit.
From what I'm gathering and my two cents, it sounds like using generics over ranges is the most straightforward approach. BTrees being a second alternative, but might be overkill for most situations.
@marbar3778
Maps are non-deterministic if handled incorrectly. Cosmos-sdk has a few places where maps could end in non-determinism
The Go map
type is always non-deterministic:
The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next.
https://go.dev/ref/spec#RangeClause
Code that needs deterministic iteration order of key-value associative containers can't use the map
type, at least not directly. Of course, the map
type is built-in and ubiquitous in Go code, so it's not realistic to suggest a project can "replace" it with a different type.
If you want a map with ordered keys, you have basically two options. As suggested, you can use a separate type (like a BTree-based container) locally, and perform conversions to and from Go maps at API boundaries. But this is arduous, error-prone, and the impact of allocations on the GC can easily become significant.
The better option is almost always to use the regular map
type, and when you want deterministic range order, extract and sort the keys, e.g.
func print(m map[string]int) {
keys := make([]string, 0, len(m))
for k := range m { keys = append(keys, k) }
sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
for _, k := range keys {
v := m[k]
// ...
}
}
Generics can make this easier, for sure.
@aaronc
The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one.
@ValarDragon is correct in observing that the performance cost of this O(n) range clause is almost certainly statistically zero relative to other performance inefficiencies in Cosmos, and not something worth designing around.
Anther potential source of problem is when we return a map in a query. This must be prohibited as well.
We don't need to prohibit that per se, since we're going with the annotation approach -- queries that are deemed safe for state machine use, will be annotated as such.
I would prefer to make such prohibition for the sake of completeness.
I would prefer to make such prohibition for the sake of completeness.
That's just not practical. It's not going to happen.
So myself and team at Orijtech spent time and we did implement in 2022 a gosec static analyzer https://github.com/cosmos/gosec that would run on all cosmos-sdk code and it helped identify a bunch of map iterations which we sent in various PRs like https://github.com/cosmos/cosmos-sdk/pull/13554 but that got closed without merged.
However the pass was later deemed to be noisy and disabled per https://github.com/cosmos/gosec/commit/bf28a33fadf2652753980200865d63512cde870e
It could be re-enabled if folks see fit or more work done on the pass, but the tools do exist and as we saw it might have been nice to discuss but in reality doesn't seem like a priority that was cared about. Should we be closing this issue otherwise?
Summary
As mentioned in https://github.com/CosmWasm/wasmd/issues/941 non-determinism in maps is a concern for users of cosmwasm and others. There are places the Cosmos-SDK uses maps in many places we should aim to replace this with a deterministic sort with maps or something like a btree.
Problem Definition
Maps are non-deterministic if handled incorrectly. Cosmos-sdk has a few places where maps could end in non-determinism
Proposal
Replace maps with a deterministic ordering data structure or sort for maps where needed.
cc @ValarDragon @ethanfrey