JimLarson opened 2 years ago
I don't think multiset is the right abstraction. It is really a map-like structure constructed with an array. You wouldn't model 1000 atoms as 1000 atom instances in a set, which is what multiset semantics would imply.
An appropriate data structure for any non-trivial operations would be an ordered tree, IMHO.
With generics we can probably reuse the same logic for Int and Dec instances.
Multiset semantics don't commit you to a particular representation. A multiset is sometimes written as a set with repeating elements, but it's also written as a non-repeating set where each element has a nonzero numeric multiplicity attached - which is pretty much what current Coins are, if we can be clear about normalization.
I'm contrasting "multiset semantics" with other interpretations of an array of Coin values. In the discussion of #11200, several commenters believe that there is or ought to be a behavioral distinction between {{"denom1", 7}, {"denom2", 0}} and {{"denom1", 7}}. I think it's possible to build a consistent semantics around this notion - but I can't figure out a use case for it. The current Coins implementation seems to be 95% aligned with multiset semantics, and this issue is just about settling the remaining 5%.
If we can consistently normalize to sorted, non-duplicated denoms, arrays seem simpler and more performant than trees - both have logarithmic lookup, etc. But as long as we settle the semantics, I truly don't much care about the representation.
I'm curious about whether Go generics will be any easier than simply duplicating the code. In the early days of Go, I talked to Rob Pike about the lack of generics. He had spoken with the Java folks, who told him about the immense pain of retrofitting generics onto a deployed language. Rob's takeaway was that generics should therefore be put off as long as possible. I would have thought that the lesson was to build them in from the beginning.
Generics are orthogonal to this point.
> Each Coins value should be abstractly thought of as a collection of coins of all possible denominations.
My understanding is that set membership is defined by element identity, not property. For example, a set of integers doesn't contain all possible integers, it contains only those specific integers which are in the set. Right? So it doesn't make sense to me that a Coins set would contain all possible denominations with a default "value" of zero. That's not a set! That's something else. You can of course define Coins, or any type, to have whatever semantics you want, and if this is the most useful model, then so be it. But AFAICT the thing you're describing here is not a set, or a multiset, so it's confusing to use terminology from that domain.
@peterbourgon please have a closer look at the Wikipedia entry for multiset, which describes that a multiset over a universe U of possible values can be equivalently thought of as a set in which elements may repeat, or as a multiplicity function mapping each element of U to a count in {0, 1, 2, ...}.
There's a whole range of possible representations of these concepts in a programming language: arrays, hash maps, etc., and Coins is certainly one of these possible representations, optimized for having only a few types of denoms in the collection.
For your set-of-integers example, a set is completely defined by its indicator function f, which in this case would be a mapping from the entire set of integers to the set {0, 1}, where f(n) = 1 means n is in the set and f(n) = 0 means it's not in the set. So the indicator function, which represents the set, is indeed defined on all possible integers, and it returns the default value of zero for any integer that's not in your set. The multiplicity function above is a generalization of an indicator function, where the range is {0, 1, 2, 3, ...}. Coins directly exposes a multiplicity function - it's called AmountOf().
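To make that concrete, here's a small runnable sketch (assuming the standard cosmos-sdk types import path) showing AmountOf() behaving as a multiplicity function that is defined for every denom, not just the stored ones:

```go
package main

import (
	"fmt"

	sdk "github.com/cosmos/cosmos-sdk/types"
)

func main() {
	// The sparse representation stores only "atom", but the multiplicity
	// function AmountOf() is defined on the whole universe of denoms.
	coins := sdk.NewCoins(sdk.NewInt64Coin("atom", 7))
	fmt.Println(coins.AmountOf("atom")) // 7
	fmt.Println(coins.AmountOf("osmo")) // 0 - not stored, but still defined
}
```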
Now you certainly could define a collection of coins that makes a distinction between a zero-quantity entry and a missing entry - but that doesn't seem to be what Coins implements, and I can't think of a reasonable use case for such a type.
I really am sorry to drag set theory into this - it's just that Coins-like collections have been studied for centuries, have standard names, have well-known properties, etc.
Given two sets S1 = {x:123} and S2 = {y:456 z:789}, it seems that the amounts of x, y, and z in Min(S1, S2) would by the current definition each be 0. Is that right? That's nonintuitive to me, but maybe I'm showing my ignorance in saying so. In any case don't worry about me 👍
I'm definitely not going to overlook your discomfort with this, because you're demonstrating that my explanations need to be better!
Let me try to anchor this in a use case rather than set theory.
Suppose we have a rule at the bank that when you ask for a withdrawal of a given amount, and you don't have sufficient funds in your account, we give you as much money from your account as we can rather than refusing the request. Let's say you have four different accounts, each holding a single currency:
| Account Num | Denom   | Balance     | Withdrawal Request | Withdrawal Result | Comment                     |
|-------------|---------|-------------|--------------------|-------------------|-----------------------------|
| 1           | dollars | 100 dollars | 20 dollars         | 20 dollars        | limited by size of request  |
| 2           | euros   | 35 euros    | 50 euros           | 35 euros          | limited by account balance  |
| 3           | pounds  | 0 pounds    | 99 pounds          | 0 pounds          | no balance                  |
| 4           | francs  | 47 francs   | 0 francs           | 0 francs          | no withdrawal               |
In each case, the withdrawal result is the minimum of the balance and the request. Now what would happen if instead we had all the same money in a single account and requested all the withdrawals in a single request? Should we get the same aggregate result? You bet we should!
Account balance: 100 dollars, 35 euros, 47 francs
Withdrawal request: 20 dollars, 50 euros, 99 pounds
Withdrawal result: 20 dollars, 35 euros
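Here's a minimal, self-contained sketch (plain Go maps rather than the SDK types) of the per-denom minimum that produces the aggregate result above; a denom missing from either side reads as zero:

```go
package main

import "fmt"

// minCoins returns the per-denom minimum of two balances. A denom absent
// from a map counts as zero, and zero results are dropped (normalization).
// Denoms present only in b have minimum zero, so scanning a suffices.
func minCoins(a, b map[string]int64) map[string]int64 {
	out := map[string]int64{}
	for denom, amt := range a {
		if other := b[denom]; other < amt { // missing denom reads as 0
			amt = other
		}
		if amt > 0 {
			out[denom] = amt
		}
	}
	return out
}

func main() {
	balance := map[string]int64{"dollars": 100, "euros": 35, "francs": 47}
	request := map[string]int64{"dollars": 20, "euros": 50, "pounds": 99}
	fmt.Println(minCoins(balance, request)) // map[dollars:20 euros:35]
}
```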
I contend that Min on Coins should give exactly this behavior: the per-denom minimum, with a missing denom reading as a zero balance.
Does that make sense?
I understand and agree with most of your assertions about this example. But I don't think Min is the name for this operation. And I think it may boil down to this:
> Each Coins value should be abstractly thought of as a collection of coins of all possible denominations.
I don't think I agree with this, because an element of a set with value of zero isn't the same as that element not being in the set. You can ask a Coins how many USD it has, and it can respond zero when it doesn't have a USD denom represented within itself, but that's not the same thing. edit: Also, the set of all possible denominations is unbounded and unknowable (in general)!
Min on a Coins is, to me, an operation which produces a union set of all represented denominations among all input sets, where each denomination's value is the lowest among all input values for that denomination.
{A:1 } Min { } = {A:1}
{A:1 B:2} Min {B:3 C:3} = {A:1 B:2 C:3}
{A:1 } Min {B:2 C:3} = {A:1 B:2 C:3}
Since Coins has no way to say {A:0} other than by saying {} (see NewCoins(), Validate()), it sounds like you're asking for Coins values to have no way to say "I have zero of something". This would mean:

...and so on. So the idea that a missing denom in Coins doesn't mean zero is totally at odds with how the SDK works.
I can imagine a different type that allowed {A:0} and gave it a different meaning than {}. However, it would be a different type than Coins (though they'd both serialize to repeated Coin).

@aaronc The above discussion, along with @robert-zaremba's comments in #11200, are why I think this issue is independent of the Pulsar work. Pulsar, as I understand it, will let you build many different kinds of data types, but does not decide for you what semantics your data type should have. This can be settled now in the existing Coins type, then ported to any new world of serialized types.
I'm just arguing for a different name for the operation. Something like e.g. Least or LeastDenom or LeastDenomAmong would be more in line with my expectations. And, ideally, don't make it a method! A free function taking ...Coin would be even more clear about its semantics. Again, these are all just my suggestions, and I don't really matter that much :) Take them or leave them.
edit: related
I'm very in favor of multiset semantics for all the comparison operators. I can never use the IsAnyGTE function as a black box, because I always want the guarantee that every denom is present in the argument. I legitimately think that for comparison functions, the only sensible thing to do is assume not having an element there means it has a value of 0.
For Min and Max, I think they are better served by intersect and union as the names of the functions, purely because of the confusion listed in this issue. I don't think there should be a function named Min or Max due to the ambiguity present atm.
> I don't think there should be a function named Min or Max due to the ambiguity present atm.

Agree. But I don't know a better, short name.
> are better served by intersect and union as the names

These operations are neither intersection nor union.

They are the intersection and the union respectively, as commonly defined on multisets? What's the behavior you'd expect out of intersect / union that's not being done in @JimLarson's PR?
Guys, the naming was discussed at length in the PR, and ultimately it got multiple approvals from core developers and was committed. I and others have since been writing code using the chosen name and semantics. This progress doesn't prevent us from revisiting the issue if necessary, but it does raise the bar for further discussion. If you have an objection, I request that it be grounded in an actual use case, and with a proposal for something better.
As I mentioned above, I think the naming objections will subside if we can get some design clarity on Coins and a consistent implementation. Given this, I'd hope that anyone seeing Min() would expect it to mean the per-denom minimum, normalized into a valid Coins value.
I've got a work-in-progress overhaul of coins.go that is validating this approach. Rather than having each function of two Coins do its own double-iteration with its own quirks and mistakes, I wrote a single double-iterator which efficiently matches up the denoms and also checks for sorting, duplicates, and non-positives as it goes:
```go
// zip takes two valid Coins values and calls function f with their amounts for each denom,
// except those denoms for which both a and b have a zero amount. The function f will be
// called in increasing denom order. Will stop the iteration if f returns true.
// Will panic if either a or b has bad order or quantity.
func zip(a, b Coins, f func(denom string, i, j Int) bool) { ... }
```
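For concreteness, here's one possible shape of that double-iterator - a sketch under assumptions, not the actual PR code: it takes the inputs to be already valid (sorted, unique denoms, positive amounts) and omits the sorting/duplicate/non-positive checks the real version performs. ZeroInt() is the sdk's zero Int constructor.

```go
func zip(a, b Coins, f func(denom string, i, j Int) bool) {
	ia, ib := 0, 0
	for ia < len(a) || ib < len(b) {
		switch {
		case ib == len(b) || (ia < len(a) && a[ia].Denom < b[ib].Denom):
			// denom present only in a; b's amount reads as zero
			if f(a[ia].Denom, a[ia].Amount, ZeroInt()) {
				return
			}
			ia++
		case ia == len(a) || b[ib].Denom < a[ia].Denom:
			// denom present only in b; a's amount reads as zero
			if f(b[ib].Denom, ZeroInt(), b[ib].Amount) {
				return
			}
			ib++
		default:
			// denom present in both a and b
			if f(a[ia].Denom, a[ia].Amount, b[ib].Amount) {
				return
			}
			ia++
			ib++
		}
	}
}
```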
Every two-Coins method is more clearly written using zip(), for instance:
```go
func (coins Coins) Add(coinsB ...Coin) Coins {
	sum := Coins{}
	zip(coins, coinsB, func(denom string, i, j Int) bool {
		sum = append(sum, NewCoin(denom, i.Add(j)))
		return false
	})
	return sum
}
```
For ordering, I wrote utilities lt() and lte() ("less than", "less than or equal") using zip(), then defined everything in terms of them:
```go
func (a Coins) IsAllGT(b Coins) bool  { return b.lt(a) }
func (a Coins) IsAllLT(b Coins) bool  { return a.lt(b) }
func (a Coins) IsAllGTE(b Coins) bool { return b.lte(a) }
func (a Coins) IsAllLTE(b Coins) bool { return a.lte(b) }
func (a Coins) IsAnyGT(b Coins) bool  { return !a.lte(b) }
func (a Coins) IsAnyLT(b Coins) bool  { return !b.lte(a) }
func (a Coins) IsAnyGTE(b Coins) bool { return !a.lt(b) }
func (a Coins) IsAnyLTE(b Coins) bool { return !b.lt(a) }
```
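As a sketch of how those helpers might look with zip() (my guess at the shape, not the PR's actual code): lte() succeeds only if no denom's amount in a exceeds its amount in b, and lt() is then the strict version of the same order.

```go
// a.lte(b) iff for every denom D, a.AmountOf(D) <= b.AmountOf(D).
func (a Coins) lte(b Coins) bool {
	ok := true
	zip(a, b, func(denom string, i, j Int) bool {
		if i.GT(j) {
			ok = false
			return true // stop early: found a denom where a exceeds b
		}
		return false
	})
	return ok
}

// a.lt(b) iff a.lte(b) and the values differ in at least one denom.
func (a Coins) lt(b Coins) bool {
	return a.lte(b) && !a.IsEqual(b)
}
```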
This changes the current behavior for the IsAny...() methods, but I'm pretty sure that every current use of IsAny...() is buggy. At the very least, it looks like a footgun.
I've got some other urgent priorities right now, but I'll try to get back to this soon. In the meantime, I'll probably submit a tiny change to keep IsEqual() from panicking on valid inputs. I've heard no use case for the panic, and every senior developer I've asked has done a facepalm when I describe the current behavior.
> What's the behavior you'd expect out of intersect / union that's not being done in @JimLarson's PR?

Min and Max have a very good doc string. Min indeed is an intersection, but Max is not a union:

{2A, 3B}.Max({1B, 4C}) == {2A, 3B, 4C}
{2A, 3B}.Union({1B, 4C}) == {2A, 4B, 4C}
What you described as Max is actually the definition of union for multisets. What you describe as Union is called the sum. Check out the multiset article.
I agree that the definition of union is surprising, which is another reason I prefer the Min/Max names instead. In the zip-based reimplementation, these are very clearly the per-denom min/max, just as the Coins-level Add() above is very clearly the per-denom sum.
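For instance, a hedged sketch of Min in the same style as the Add() example above (assuming the zip() helper and the sdk's MinInt; zero results are dropped to keep the output a valid Coins value):

```go
func (coins Coins) Min(coinsB Coins) Coins {
	min := Coins{}
	zip(coins, coinsB, func(denom string, i, j Int) bool {
		// Keep only positive minima, so the result stays normalized.
		if amt := MinInt(i, j); amt.IsPositive() {
			min = append(min, NewCoin(denom, amt))
		}
		return false
	})
	return min
}
```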
Summary

Cleans up potentially dangerous inconsistencies in Coins and DecCoins by establishing a multiset model for semantics and a normalization policy.

Problem Definition
Coins (and DecCoins) are sometimes vague and surprising in their semantics.

- IsEqual() returns false when comparing Coins of different lengths, but panics when it encounters Coins of the same length but having a difference in denoms. This appears to be intentional, as there are unit tests for the behavior. There doesn't seem to be a sensible use case for these semantics.
- The call a.IsAnyGT(b) sounds like it ought to be equivalent to !b.IsAllLTE(a), but it's something quite different, and deliberately so. For instance, the doc comment says {2A, 3B}.IsAnyGT({5C}) = false, despite the fact that the 2 of denom A is greater than the zero amount of A in {5C}. Similarly for IsAnyGTE().
- There is inconsistency in canonicalization of Coins values. Negative quantities are forbidden by Validate() but returned by SafeSub(). Add() promises not to return any negative quantities but has no checks to enforce this.
- Some methods explicitly handle non-valid values, others panic or return wrong results. For the empty Coins value, NewCoins() uses the empty slice, but Add() and Sub() use nil. Most uses are happy with either, but some tests require one specifically.
- There's much hand-wringing about the existence of zero-quantity Coin values but the non-existence of zero quantities in the representation of Coins values. There's question of the validity of an empty Coins value. See #7046 and its references.

This all seems to stem from confusion of the implementation of Coins with the semantics it provides, the lack of clear definition of those semantics, and the absence of a policy of normalization.

Proposal
Multiset model

Each Coins value should be abstractly thought of as a collection of coins of all possible denominations. This makes Coins a multiset, which suggests some operations and their proper behavior.

A multiset has an obvious partial order based on containment: A <= B if and only if for all denoms D, A.AmountOf(D) <= B.AmountOf(D). Note that this is a partial order, so it's possible for neither A <= B nor B <= A.
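As an illustration (a sketch, not SDK code), the containment check only needs to scan a's entries when both values are valid, since denoms missing from a contribute zero and can never violate A <= B:

```go
// leq reports whether a <= b in the containment partial order, assuming
// both are valid (sorted, unique, positive-amount) Coins values.
func leq(a, b Coins) bool {
	for _, c := range a {
		if c.Amount.GT(b.AmountOf(c.Denom)) {
			return false
		}
	}
	return true
}
```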
The partial order gives rise to well-known mathematical structure. See #10995 which adds Min() and Max() operations, which are indispensable in various vesting account calculations.

If the set of denoms was stable, the SDK could have defined Coins as a fixed-length vector of Int, with each denom getting a well-known offset. However, since we expect a large and dynamic set of denoms, with most Coins values having only a few of these nonzero, the current "sparse" representation makes more sense. But the semantics should be the same for both.

This interpretation settles the hand-wringing over the possibility of zero-quantity denoms in Coin but not in Coins. That worry confuses the sparse implementation of Coins with the data it provides: you see the zero if you ask with AmountOf().

Normalization policy
The Coins representation as an array of (denom, amount) pairs makes sense, and allows direct representation in proto-derived structs. But it also allows representation of some values that do not match up with the abstract model of Coins: invalid denoms, negative or zero quantities, duplicate denoms, and out-of-order denoms.

When the representation is created with the NewCoins() constructor, all of these are checked for, but protos generate the representation directly, as do some tests and an unknown amount of user code.

Nothing can be done about invalid denoms or negative quantities. Sort() will handle order, and a trip through NewCoins() will handle both order and zero quantities. Nothing fixes duplicates at the moment, but we could potentially consolidate them by adding the quantities together. It's not clear whether reforming such content would fix bugs or merely obscure them.

Methods vary in their support for invalid representations. IsZero() seems to exist (vs Empty()) only for zero quantities. IsEqual() sorts its inputs just to be sure, while AmountOf() silently relies on its inputs being ordered. Application code can't tell what to expect without reading each implementation. This is a mess.

The "highbrow" approach to regularizing this would be to create a ValidCoins type with all of the operations, leaving Coins as the type of the proto field, having only a method Coins.Validate() (ValidCoins, err) to canonicalize the coins.

A more middle-brow approach would be to simply let most methods assume valid input, but require all to produce valid output. Application logic should explicitly normalize values coming from "outside", e.g. transactions, but can consider the store to only hold valid values.
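As a usage sketch of the middle-brow policy (hypothetical variable names; NewCoins is the real constructor): normalize once at the boundary, then treat everything downstream as valid.

```go
package main

import (
	"fmt"

	sdk "github.com/cosmos/cosmos-sdk/types"
)

func main() {
	// Coins decoded straight from a proto message may be unsorted and may
	// contain zero amounts. NewCoins sorts, drops zero entries, and panics
	// on negative amounts or duplicate denoms.
	dirty := []sdk.Coin{
		{Denom: "ueuro", Amount: sdk.NewInt(35)},
		{Denom: "udollar", Amount: sdk.NewInt(100)},
		{Denom: "upound", Amount: sdk.ZeroInt()}, // zero entry: dropped
	}
	clean := sdk.NewCoins(dirty...)
	fmt.Println(clean) // 100udollar,35ueuro - valid from here on
}
```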
DecCoins

DecCoins is mostly a clone of Coins with Dec instead of Int for quantities, but similarly requires positive quantities for valid values. Thanks to Golang's impoverished type system, we need to duplicate the implementation.

All of the above concerns and suggestions seem to apply.
Specific Changes

Let me sum this all up in the following specific proposals.

- Fix IsEqual() to not panic. Though technically an incompatible change, it's hard to imagine anyone relying on the current behavior.
- Rewrite IsAnyGT{E}() in terms of !IsAllLT{E}() and change to multiset semantics. Again, an incompatible change, but I think it's likely that the usages actually expect the multiset behavior.
- Deprecate or remove DenomsSubsetOf() since the very question doesn't make sense in multiset semantics.
- SafeSub() should return empty coins if hasNeg is true, to prevent negative-quantity Coins from leaking out. A quick review of the code suggests that the returned coins value is never used if the hasNeg field is set. (Except for one case in a test, which can probably be rewritten.)
- All methods should handle nil and empty slice the same way, e.g. always test with len(coins) == 0 instead of coins == nil. Code (including tests) should not try to distinguish between the two.
- Nevertheless, pick nil as the canonical representation of the empty Coins value since it's the "zero value" dictated by the Go language.
- Application code and tests should not create coins as arrays but should instead use NewCoins().
- Expect validation / canonicalization at the boundary (via NewCoins(dirtyCoinArray...) or an equivalent alias) and don't do checks or conversions in other methods. Possibly have a transitional period where all methods check for invalid input and panic.
- Equivalent changes in DecCoins.
.For Admin Use