kardianos commented 5 years ago

Background

Databases store many types of values. One type of very common value databases store, especially SQL databases, are decimal values. Some databases limit the size of the decimal, while others support arbitrary precision. But common to all is a base10 representation of the decimal number.

Historically handling decimals for database/sql and database drivers has been a pain. What types should a driver look for? If a driver looks for a given type and handles it, then it has to import it and depend on it. Or possibly go to the trouble of injecting the various types it handles with some type of type registry. The solution space at present for "library" packages that need to deal with decimals, but may be used by many other applications, is sub-optimal.

There is a history of proposals for including a decimal type into the standard library:

19787 #12127 #12332 .

Lastly, there are a number of decimal packages. Each implementation has a similar type implementation:

Generally each decimal type has a big.Int coefficient and an int32 exponent. All of these types of decimals are arbitrary precision, not fixed size like decimal128.

Proposal

I propose that a common interface is defined for a decimal type that dumps the big.Int and int32 exponent. This interface could be defined in the standard library, or agreed upon by each package. This way a SQL driver would need to do a type assertion for the well known interface, dump the value and exponent, and marshal the value to the driver. When scanning a decimal value, it would try to load the value and exponent into the decimal type presented to it.

It may be necessary to also provide some test vectors along with the implementation so the value and exponent are interpreted the same.

The proposed interface:

// Decimal composes or decomposes a decimal value to and from individual parts.
// There are four separate parts: a boolean negative flag, a form byte with three possible states
// (finite=0, infinite=1, NaN=2),  a base-2 little-endian integer
// coefficient (also known as a significand) as a []byte, and an int32 exponent.
// These are composed into a final value as "decimal = (neg) (form=finite) coefficient * 10 ^ exponent".
// A zero length coefficient is a zero value.
// If the form is not finite the coefficient and scale should be ignored.
// The negative parameter may be set to true for any form, although implementations are not required
// to respect the negative parameter in the non-finite form.
//
// Implementations may choose to signal a negative zero or negative NaN, but implementations
// that do not support these may also ignore the negative zero or negative NaN without error.
// If an implementation does not support Infinity it may be converted into a NaN without error.
// If a value is set that is larger then what is supported by an implementation is attempted to
// be set, an error must be returned.
// Implementations must return an error if a NaN or Infinity is attempted to be set while neither
// are supported.
type Decimal interface {
    // Decompose returns the internal decimal state into parts.
    // If the provided buf has sufficient capacity, buf may be returned as the coefficient with
    // the value set and length set as appropriate.
    Decompose(buf []byte) (form byte, negative bool, coefficient []byte, exponent int32)

    // Compose sets the internal decimal value from parts. If the value cannot be
    // represented then an error should be returned.
    Compose(form byte, negative bool, coefficient []byte, exponent int32) error
}

This is the original proposed interface: out-dated, see this comment or see above.

// Form of the number. If this lived in "math/big" it could be typed as "type Form byte".
const (
    Finite byte = iota // Normal numbers
    NaN
    InfiniteNegative
        InfinitePositive
)

// DecimalValue is an interface decimal implementations may implement to
// allow easy conversion between various decimal packages. These interfaces
// may also used by packages that need to marshal or unmarshal a decimal value,
// but don't care about the rest of the decimal type.
//
// The decimal value in the finite form is defined as "decimal = coefficient * 10^exponent".
type DecimalValue interface {
    // ValueExponent returns the coefficient, exponent, and form of the decimal value.
    ValueExponent() (coefficient big.Int, exponent int32, form byte)

    // SetValueExponent sets the decimal number's value, exponent, and form.
    // If the decimal number does not support the value form (such as Infinite),
    // or if the value is larger then a fixed size representation can handle, an error should be returned.
    // However, if a decimal supports NaN but not Infinite forms, the Infinite forms
    // may be coerced into a NaN.
    SetValueExponent(coefficient big.Int, exponent int32, form byte) error
}

/cc @mjibson @victorquinn @ericlagergren

If this, or a similar, interface is acceptable to cockroachdb, shopspring, and Eric, I'd be willing to open up PRs to add the methods and some test vectors.

Edited proposed interface to include a "form".

ericlagergren commented 5 years ago

I think decimal = mant * 10^exp should be a part of the API.

I really like this idea.

I’ll have more feedback later.

maddyblue commented 5 years ago

The immediate problem I see with this is the non-numeric decimal representations. apd and ericlagergren's decimal type support positive and negative infinity, nan, and signaling nan. The shopspring decimal and many others do only mantissa + exponent. The special values are handled well with floats because they are defined as part of the bits. I think that if we want adoption of this interface it would need to support more than finite numbers.

renthraysk commented 5 years ago

Just to clarify... On querying, driver.Next() has to pass back an implementation of DecimalValue, and then database/sql will attempt to convert/copy that into whatever type was passed into sql.Rows.Scan()?

kardianos commented 5 years ago

@renthraysk Yes.

It would first attempt to match the type (like apd.Decimal -> apd.Decimal) but failing that, if both the src and dest satisfy the DecimalValue type, it could extract the bits from one and stuff it into the other.

renthraysk commented 5 years ago

Leaves NULL handing. Currently drivers return NIL, and database/sql calls dest type's Scan() func for setting Valid field. So something like

type NullDecimal struct { Valid bool Value DecimalValue } func (nd *NullDecimal) Scan(src []interface{}) error { ... }

?

kardianos commented 5 years ago

@ericlagergren If you have time this week, additional feedback would be appreciated.

@mjibson What do you think of the edited interface? I modified the interface to take a "form", as well as the the set method returning a method. This way if a decimal implementation and the number was too large, or the implementation doesn't support NaN or infinity and the number was in that form, it could return an error.

maddyblue commented 5 years ago

I recommend removing pos and neg infinity and just having infinity. The sign of the infinity can be extracted from the sign of the big.Int value. In addition, I recommend adding sNaN, which is the signaling nan that is supported by most decimal implementations (see https://docs.python.org/2/library/decimal.html#decimal-objects for example).

Separating the sign from the form is important because in addition to +/- infinity, the spec also supports +/- NaN and +/- sNaN. See http://speleotrove.com/decimal/damodel.html, which discusses the requirements of finite and special values, and is linked to by the python decimal module.

Although there is a very good argument to be made that the forms you have as of now (pos inf, neg inf, nan, finite) are sufficient since the sign and type of nan almost never matters, and the java bigdecimal implementation only supports those forms.

apd supports 4 kinds of nan, but I suspect most users only use the normal nan. In our use of apd, we certainly only use the normal nan and coerce all other nans to it.

kardianos commented 5 years ago

@mjibson Thanks for the feedback. I appreciate the links. I finished reading the speletrove link and skimmed the python doc link.

My point of reference is mostly line of business applications. The implementations I'm looking at are C#, Java, Oracle, PostgreSQL, and MS SQL Server.

C# uses fixed width decimal128, no NaN or Inf forms. ref
Java BigDecimal supports arbitrary precision, but no NaN or Inf forms from what I see. ref
PostgreSQL supports arbitrary precision and NaN form, but not Inf form. ref
Oracle (database) uses fixed width and supports NaN and Inf forms. ref
SQL Server uses fixed width and does not store NaN or Inf forms. ref

Is there a different set of use cases I'm not thinking of where transmitting the signed NaN, or various forms of NaN would be advantageous?

maddyblue commented 5 years ago

Honestly no, I'm not aware of anyone caring about any kind of NaN other than just the normal NaN. I guess in that light I think your current proposal is fine. I almost think the previous one with posinf, neginf, nan, finite may be better just because it feels medium risky to tell people to inspect the sign of the big.Int. Unsure.

ianlancetaylor commented 5 years ago

CC @griesemer

kardianos commented 5 years ago

@mjibson When I wrote some pseudo code to unmarshal and marshal the infinity forms, it was clearer to keep Positive and Negative Infinity as separate forms. I reverted to having them separate.

ericlagergren commented 5 years ago

Overall, I like the idea.

I'm kind of torn between including different types of NaN and infinity values, though.

I don't see much value in inserting qNaN into a database. So, I like the idea of having three simple form constants: Finite, Inf, and NaN.

That being said, if people do need those values it'll be difficult to add them on top of the aforementioned iota-based form constants.

So, a bitmask seems like a decent approach:

type Form byte
const (
    Finite Form = 0
    Inf    Form = 1 << 2 // ±Inf
    NaN    Form = 1 << 3 // qNaN or sNaN

    // sign can be used to create -infinity:
    //    NegInf = sign | Infinity
    // as well as signed NaN values
    //    NegNaN = sign | NaN
    // it can also be used to represent -0
    // if coefficient == 0 and Form == sign
    sign Form = 1 << 1
)

Something else not considered is -0. Though there are not a lot of practical uses, it is important. The current proposal doesn't allow for -0 because big.Int does not support it.

ericlagergren commented 5 years ago

Additionally, I know decimals are well-defined, but I would take the extra step to add decimal = coefficient * 10^exponent to the API.

The reason for this is some libraries like my decimal.Big and like Java's BigDecimal expose the exponent as a "scale" value that, if positive, is equal to the number of digits to the right of the decimal point. In effect, scale = -exponent. (FWIW: internally, the libraries work on the exponent like normal.)

If the equation is defined in the DecimalValue API (much like it is in big.Float's API) then it's less likely to confuse folks.

griesemer commented 5 years ago

math/big doesn't support NaN, so there's no point in having it in such an API.

More generally, conversion from/to decimals requires parsing/printing the decimal numbers; in other words these are string conversions. The conversions in each of these directions is a handful of lines of code. I am not convinced that these need to be added to the big.Float API. See here for an example implementation:

https://play.golang.org/p/iJpjp9J11yy

The code is easily adjusted to provide big.Ints if that is desired. The string ops may be not totally optimal but negligible (in terms of overhead) compared to the actual Float conversions (which are expensive, and there is no way around those).

If the decimals are in []byte rather than in string form, then one can use Un/MarshalText instead.

In short, it seems to me that it's better to write the few lines of code that are custom-fitted for the specific application in mind than adding more API that may not be quite right in the specific case.

renthraysk commented 5 years ago

This is about getting decimals (including NaNs) in and out of database/sql and drivers without having to depend on any specific implementation.

griesemer commented 5 years ago

@renthraysk NaNs are not "decimals". There's no support for NaNs in math/big by design (and we're not going to add support for NaNs); so let's leave them out of the picture. If you need to represent NaNs, you can always have a special encoding outside of math/big.

I don't know what you mean by "without relying on a specific implementation". The title of this issue is pretty clear, the specific implementation is math/big. Please clarify.

ericlagergren commented 5 years ago

NaNs are not "decimals"

I hope I'm not being too pedantic or misunderstanding your point, but NaN values are decimals in that they're special values explicitly covered by decimal floating point specs.

kardianos commented 5 years ago

@griesemer Thank you for looking at this issue. I mostly focus on database/sql and database drivers. This issue pertains specifically to (base-10) decimal types. It is most important not to end applications, but intermediate libraries, notably database drivers.

I would be nice to have a base-10 decimal type in the standard library, but baring that, it would be nice to have an interface for marshaling and unmarshaling decimal types in and out of non-standard types. This interface could be defined in "math/big", or it could be implicitly defined in decimal packages as we mutually agree to it. So while this could involve "math/big", the interface could also be implicit.

Example 1:

User wants to put a decimal value into a database. Say they are using cockroach/adb in the application.
They pass this decimal value to db.QueryContext("insert into mymoney (?);", decimalValue).
database/sql passes this into the database driver and it needs to handle that decimalValue. What should it do? a. Require all applications to pass in decimals as strings? b. Require a single decimal type to convert to before passing in? c. Require any decimal type to support a specific interface that can be efficiently read and handled?

Example 2:

User wants to read a decimal value from a database. Say they are using shopspring/decimal in the application.
They create a decimal type, then scan from row row.Scan(&decimal).
database/sql looks at decimal and doesn't know what to do with it, so it asks the driver if it can handle it. The driver looks at the value and might do one of the following: a. Require all decimals to be pulled out of the database as strings? b. Require a single decimal type to convert from before scanning out? c. Require any decimal type to support a specific interface that can be efficiently set and handled.

Let's ignore NaN at the moment. Regardless, as a database driver, telling clients to always go through strings or a []byte interface isn't really an option I see as viable.

I hope this clears up what I'm trying to accomplish with this issue.

griesemer commented 5 years ago

@ericlagergren Point taken, but again, they are not supported by math/big, by design (math/big is a software library, not a hardware standard, and as such has many other ways to express errors such as a division by zero w/o resorting to special "in-band" values such as NaNs).

@kardianos Thanks for the clarification. I suggest you change the title of this issue to something that better reflects the intent. From you response I gather that this issue is only marginally related to math/big.

renthraysk commented 5 years ago

@griesemer The implementations I was referring to was the decimal packages listed in the original post.

griesemer commented 5 years ago

@kardianos Thanks for the title change, this makes is much clearer (to me, at least). I'm somewhat amazed that all the Decimal packages are using a *big.Int plus exponent representation. Wouldn't it be better to not be so dependent on that? For instance, borrowing freely from your suggestion:

type Decimal interface {
   // Value returns the sign, mantissa, and scale factor of the decimal value x, such that
   // x = sign * mant * 10^exp, where a sign < 0 indicates a negative value (otherwise it
   // is a positive value), mant contains the mantissa bits in little-endian binary format,
   // and scale is the scale factor by which the mantissa is adjusted. The value 0 is
   // represented by an empty mantissa (len(mant) == 0) and a scale of zero, while an
   // infinity uses an empty mantissa with any non-zero scale value. If a sufficiently
   // large buf is provided, its underlying array is used for mant.
   Value(buf []byte) (sign int, mant []byte, scale int)

   // ...
   SetValue(sign int, mant []byte, scale int) error
}

With an interface like this one wouldn't be so tied to math/big, yet it would still be reasonably efficient to convert to math/big.Int since big.Int's have methods to convert from/to bytes.

Some questions to answer:

should it be sign int or neg bool ?
is the mantissa normalized in any way? (This is also a question for your design.)

Also, how important is it to be able to represent NaNs?

kardianos commented 5 years ago

@griesemer Thank you for the continued feedback. It would be nice to not be coupled to big.Int, though it does come at a cost of being a slightly more complicated definition. I would need to implement your proposed interface on an existing package or two before I have answers to your questions.

It is (unfortunately) somewhat important to support NaNs. Postgresql and Oracle both support storing a NaN decimal state.
One nit on the code above is I would explicitly state int32 for scale. Did you put in int scale on purpose?
In your suggestion above, you have []byte be a little-endian number. In big.Int API, Bytes returns a big-endian number. Any particular rational for that difference?
I don't think I understand what a normalized mantissa means in this context. But I don't think the mantissa and scale need to be defined so that for a given number they will always be represented the same way.
You used "mantissa", but other packages, people have used coefficient. Is there a technical difference I should be aware of between the two terms? Do you view one to be more appropriate?

griesemer commented 5 years ago

@kardianos Please find my replies below, in the order of your questions:

If NaN is important, I'm sure it could be encoded somehow as well. I'd be somewhat hesitant to use an additional form argument as it looks like it is borrowed from math/big.Float's implementation, and it's really an implementation detail. My question here would be: How do DB's encode NaNs? What about Infs?
~I used scale to differentiate from an exponent (which usually has a specific meaning with respect to a normalized mantissa). But I have no strong feelings here. exp is probably fine.~ [edited - I misread your question here] Yes, I chose int over int32 deliberately. Exposing int32 seems like an implementation detail that will make it harder for the API to interact with other code. We find that even the choice of uint vs int can make code more cumbersome to use (e.g., in retrospect, I'd not use uint in the math/big API, just int). Using int rather than the int32 may make a concrete implementation a bit harder but likely will simplify external code. I'd be curious as to your choice of int32.
big.Int.Bytes should have returned a little-endian representation; but that was a battle I lost when that API was designed. After all, the internal representation is little-endian. I don't think it matters much.
A mantissa is usually the fractional part of a floating-point number, and it's normalized, meaning that it is within a certain range. E.g., for IEEE floats, the mantissa is in the range [0.0, 1.0[ (with the 1.0 excluded), and the actual value scaled by the exponent is 1.mantissa (the leading 1 bit is implicit), unless the value is 0.0. I don't know that it matters that the mantissa is normalized here, but it seems a bit odd to me that one might get an arbitrary (mant, exp) pair for a given value. For instance, the value 123 might be represented as (mant, exp) pair (123, 0), or (1.23, 2), or (0.123, 3) (and others). In this case, the latter two might be considered normalized. I think one would want to have some guarantee that the mantissa is "minimal" (as in short). Perhaps not for the input (SetValue), but for the output. I suspect we don't want 123 to be represented as (1230000000000, -10)
Neither mantissa nor coefficient are good names here in my mind. But mantissa seems slightly better, especially if it is normalized. The word coefficient really doesn't imply anything (coefficient of what?). At least a mantissa has some meaning that's related.

ericlagergren commented 5 years ago

@griesemer

I haven't fully digested your last two comments yet, but I'm curious what you meant by

I'm somewhat amazed that all the Decimal packages are using a *big.Int plus exponent representation.

maddyblue commented 5 years ago

Decimals (or at least the decimal spec previously mentioned) have the ability to differentiate between representations like 123, 123.0, 123.00, so I think normalizing the mantissa is not desired.

griesemer commented 5 years ago

@ericlagergren What I meant is literally that I am surprised that given six essentially independent packages (I assume that's the case) all chose a very similar implementation. I would have expected that some of them would use a custom, fixed-size representation.

ericlagergren commented 5 years ago

@mjibson re: normalization: http://speleotrove.com/decimal/decifaq4.html It's mentioned elsewhere, too.

@griesemer ah. Just curious. I hope to have BCD at some point. And fixed size. But that's off topic for now :)

griesemer commented 5 years ago

@mjibson Thanks for the clarification. So 123.00 would presumably be internally represented as the (value, scale) pair (12300, -2), 123.0 would be (1230, -1), and 123 would be (123, 0)? What about 12300.? Is that (12300, 0) or (123, 2)? How does this match the proposed original interface?

ericlagergren commented 5 years ago

@griesemer 12300. could be either. They'd print differently, though. :)

Do note that, in some APIs—like BigDecimal—"scale" is not the same thing as "exponent."

I do not understand how this would not match the proposed original interface.

I'd be curious as to your choice of int32.

int32 is used instead of int because (1) it's highly unlikely an exponent greater than 1<<31-1 (or lesser than the complement) will ever be needed given the current physical limits of machines, and (2) it simplifies internal calculations because you can avoid overflows by performing 64-bit arithmetic.

If NaN is important, I'm sure it could be encoded somehow as well.

NaN isn't that hard by itself, it's the different types of NaNs that make it challenging. Like @mjibson mentioned earlier, there are two types of NaNs, signaling and quiet. And both types can be signed. And there's also NaN payloads.

griesemer commented 5 years ago

@ericlagergren Thanks. I was asking about 12300 and its representation because printing does matter, doesn't it? It's also clear that scale is not the same as exponent, so this would have some bearing on the API, wouldn't it?

Regarding the choice of int32: Your answer is exactly what I expected. I agree that it makes no sense to have exponents > 32bits. In fact I'd argue it makes no sense to have exponents > than a much smaller maximum (say, 16 bits). I would not "pretend" that exponents can be as large as 32bits by enshrining this value in the API, and instead check dynamically that exponents are within a fixed range that fits well within 32 bits. It will make the implementation slightly more complicated but simplify clients because they can just deal with ints. (math/big internally uses 32bit exponents but doesn't expose that for this reason.)

If NaN's are so pervasive in this (DB) world, it seems better to me to have them explicitly taken care of in the interface, especially if they can carry a payload. Yes, one would have to check for NaNs but at least that would make it very clear what is happening.

ericlagergren commented 5 years ago

It's also clear that scale is not the same as exponent, so this would have some bearing on the API, wouldn't it?

I know it's different than how it's used elsewhere in CS, but scale = -exponent. I mentioned this in a previous comment:

... some libraries like my decimal.Big and like Java's BigDecimal expose the exponent as a "scale" value that, if positive, is equal to the number of digits to the right of the decimal point. In effect, scale = -exponent. (FWIW: internally, the libraries work on the exponent like normal.)

I'm not really tracking why (12300, 0) and (123, 2) matter so long as the API can accurately represent both, and AFAIK all example APIs so far (yours, so long as there's no normalization, and @kardianos').

I would not "pretend" that exponents can be as large as 32bits by enshrining this value in the API, and instead check dynamically that exponents are within a fixed range that fits well within 32 bits.

For more context: decimal libraries are allowed to set their own exponent ranges. Theoretically, I could allow any 32-bit integer in my library, while cockroachdb/apd could restrict theirs to [-1000, 1000]. Generally, the GDA spec recommends the exponent range be symmetric and no less than 10 times the maximum length (in digits) of the mantissa, with a hard floor of 5 times the maximum length iff there's a maximum length. More info: http://speleotrove.com/decimal/damodel.html

I don't think this API should be more strict than the GDA spec.

maddyblue commented 5 years ago

The numbers would be printed differently: (12300, 0) = 12300, (123, 2) = 123e2, so there's no ambiguity.

griesemer commented 5 years ago

@ericlagergren, @mjibson : I think my coin has finally dropped; apologies for being slow, I think normalization has side-tracked me. So the value, in decimal form, are the digits (excluding an exponent) that one would always see in printed form: if the value is 12300, these are the digits that will show up, possibly with a decimal point somewhere. The scale (or exponent) simply positions that decimal point (and may or may not show up as exponent, possibly depending on formatting options).

Going back to exponent ranges: I agree that they should be symmetrical; and that there should be a reasonable upper limit. FWIW, it's much simpler to write a correct implementation that has a spelled out upper limit (well below 32 bits) than trying to make the code correct for a full 32bit exponent (which is not symmetric anyway) in all cases.

Going back to the proposed interface and naming: How about significand instead of coefficient or mantissa? It more clearly explains what it represents, namely the significant digits of the value. And perhaps scale instead of exponent.

And finally, going back to the proposed interface: The initial comment mentions that typically SQL databases have a base10 representation for decimal numbers. But the proposed interface uses a binary representation. Who is doing the conversion?

kardianos commented 5 years ago

And finally, going back to the proposed interface: The initial comment mentions that typically SQL databases have a base10 representation for decimal numbers. But the proposed interface uses a binary representation. Who is doing the conversion?

I'm not sure what you man by binary representation. If you mean the driver still needs to take the significand []byte, scale, and sign and put it together or take it apart, then I would imagine database drivers would either:

Make a small internal package to do this with just enough bits to marshal/unmarshal to the wire.
Or import a decimal package that supports the interface, and marshal it into that, then on/off the wire.

Or perhaps taking the question a different direction, once a consensus is reached on the interface I and/or others can open up PRs on the most popular decimal packages so the decimal types implement this interface.

griesemer commented 5 years ago

@kardianos Sorry for being unclear. https://github.com/golang/go/issues/30870#issue-421630412 said: "But common to all is a base10 representation of the decimal number." But this interface is using a base2 representation. So I assume the DB driver does whatever conversion necessary?

I'm asking all these detailed questions because I want to understand how this would be used in practice and why it should be in the std library in the first place.

My understanding is that this interface would decouple different DBs from apps using different decimal number packages so that one doesn't need n*m different connections (given n DBs and m different decimal number packages), only n+m connections: from n DBs to 1 interface, and from that 1 interface to m different decimal packages.

kardianos commented 5 years ago

My understanding is that this interface would decouple different DBs from apps using different decimal number packages so that one doesn't need n*m different connections (given n DBs and m different decimal number packages), only n+m connections: from n DBs to 1 interface, and from that 1 interface to m different decimal packages.

This de-coupling is indeed the benefit.

: "But common to all is a base10 representation of the decimal number." But this interface is using a base2 representation. So I assume the DB driver does whatever conversion necessary?

I'm sure I'm missing something. My understanding is we can decompose a base-10 decimal concept into the following components (ignoring Inf and NaN):

Sign (bool or int)
A (big) integer number (the []byte Significand)
Exponent (int)

These parts would be composed as follows: base-10 decimal = sign * significand * 10 ^ exponent. Is this not the case?

But to answer your question, yes, the DB Driver will do whatever is needed to convert between the database wire protocol representation to this decomposed representation, where this interface can set or retrieve the value. DB Drivers can support "custom" data types, so we could omit support from database/sql for the time being even and let the database drivers do the lifting.

ericlagergren commented 5 years ago

... The initial comment mentions that typically SQL databases have a base10 representation for decimal numbers. But the proposed interface uses a binary representation. Who is doing the conversion?

Using my package as an example, I would implement ValueExponent as follows:

func (x *Big) ValueExponent() (sign bool, significand *big.Int, exponent int32) {
    significand = new(big.Int)
    if x.isCompact() {
        significand.SetUint64(x.compact)
    } else {
        significand.Set(&x.unscaled)
    }
    return x.form&signbit, significand, x.exp
}

If the DB requires a string (e.g., lib/psql), the SQL driver could do something like

sign, sig, exp := v.ValueExponent()
s := sig.String()
s = s[:exp] + "." + s[exp:]
if sign {
    s = "-" + s
}
// send s to the DB

(NB: the code examples are off-the-top-of-my-head example and inefficient, they miss corner cases, etc.)

Similarly so with whatever binary encoding the DB uses.

However, it's not very ideal for SQL drivers to do the string formatting themselves.

renthraysk commented 5 years ago

MySQL X's wire format of decimals is a byte for the scale followed by packed BCD for the significand, and a nibble for the sign with optional zero nibble for padding. So not sure how common base10 is...

@kardianos Talking of 1 and small enough. database/sql requires both the ValueExponent() and SetValueExponent(). However a driver only needs the ValueExponent().

The simplest implementation would be something like...

type decimal []byte  // slice of network read buffer were the decimal resides.

func (d decimal) ValueExponent() (sign bool, significand *big.Int, exponent int32) {
  /// Lazy parsing of the byte slice...
}

Don't want to have to support SetValueExponent().

griesemer commented 5 years ago

@kardianos sign * significand * 10 ^ exponent has a base-10 exponent but (in the proposed API) a base-2 significand. @renthraysk mentioned that the MySQL X wire format for decimals uses packed BCD (binary coded decimals) for the significand. To me that implies that somewhere the significand must be converted into a true decimal representation, which is roughly equivalent to printing the value in decimal form (and condensing the digits '0' to '9' into 4 BCD bits each). @ericlagergren also provided a piece of code that essentially converts the decimal into a string.

This is why I asked originally why the API doesn't just require printing the decimal value (and massaging the bits).

Presumably some databases do store decimals w/o using BCD? Can anyone attest to this?

On the other hand, if all DB's store decimals in a true decimal form (BCD, or some string form) then the proposed API seems perhaps suboptimal because the DB would have to do the conversion. I would seem better that the API already provides the decimal representation, assuming that there are more DBs to connect to than decimal packages. It might be simpler to just define a string-based API.

ericlagergren commented 5 years ago

I alluded to this in my previous comment, but one reason a string API might be better (assuming some/most DBs use strings for decimal inputs) is that converting a decimal to a string isn’t entirely straightforward if you’re following the GDA spec and its various corner cases.

Ostensibly, decimal packages themselves will have proper string formatting.

I’m just spitballing, but what about something like fmt.Formatter with an unused verb? I’m not sure of a way to test whether an implementation supports the verb, though.

ericlagergren commented 5 years ago

Off topic: I have some BCD code lying around somewhere that I’m more than willing to donate.

kardianos commented 5 years ago

@ericlagergren Most databases don't use a string representation in their protocol. Postgresql can use string representation, more mature drivers tend to avoid it and use the binary representation. SQL Server uses a sign then a binary significand. Postgresql binary uses base 10000 representation.

@griesemer As above, the protocols used have widely different representations. One problem the early database/sql package tried to do was to make life too simplified on database drivers. In the end, this made their job harder. So long as drivers have a way forward for dealing with decimals, drivers that want to support decimals in this way will. I don't want someone to avoid this API because it costs too many allocations. I like your API suggestion in part because it can be much more efficient. It would be fairly easy (and a one time cost) to make a package that just turns a lower level representation into a string, and the other way around.

Thank you for explaining about base-2 vs base-10 (BCD). I understand your meaning now. Regarding scale being int vs int32, I think I would suggest int32 is used, so if the result of a decomposed decimal was stored somewhere as-is, it would be clear only 32-bits of storage where needed, never 64. The usability argument again is less of an issue here, because I anticipate only certain libraries will typically make use of this, and they all use a int32 scale anyway, so it would probably be easier to just assign the scale and then do sanity checks before or after. Similarly with sign, I think I would go with a neg bool. It shouldn't matter for most use cases, but apd also uses a negative bool, and again it would be clearer to someone storing the components what should actually be signaled. It would also prevent implementations from smuggling in additional information through the integer. I agree with Eric that the use of scale should be avoided because as he says it can imply a negative exponent.

@ericlagergren @mjibson @griesemer Here is what I'm currently thinking as a definition: (edited)

// Decimal composes or decomposes a decimal value to and from individual parts.
// There are four separate parts: a boolean negative flag, a form byte with three possible states
// (finite=0, infinite=1, NaN=2),  a base-2 big-endian integer
// coefficient (also known as a significand) as a []byte, and an int32 exponent.
// These are composed into a final value as "decimal = (neg) (form=finite) coefficient * 10 ^ exponent".
// A zero length coefficient is a zero value.
// If the form is not finite the coefficient and scale should be ignored.
// The negative parameter may be set to true for any form, although implementations are not required
// to respect the negative parameter in the non-finite form.
//
// Implementations may choose to signal a negative zero or negative NaN, but implementations
// that do not support these may also ignore the negative zero or negative NaN without error.
// If an implementation does not support Infinity it may be converted into a NaN without error.
// If a value is set that is larger then what is supported by an implementation is attempted to
// be set, an error must be returned.
// Implementations must return an error if a NaN or Infinity is attempted to be set while neither
// are supported.
type Decimal interface {
    // Decompose returns the internal decimal state into parts.
    // If the provided buf has sufficient capacity, buf may be returned as the coefficient with
    // the value set and length set as appropriate.
    Decompose(buf []byte) (form byte, negative bool, coefficient []byte, exponent int32)

    // Compose sets the internal decimal value from parts. If the value cannot be
    // represented then an error should be returned.
    Compose(form byte, negative bool, coefficient []byte, exponent int32) error
}

Would both of your support adding this to your decimal packages?

ericlagergren commented 5 years ago

How would that interface signal 0.00?

kardianos commented 5 years ago

@ericlagergren Good point. Adjusted.

griesemer commented 5 years ago

This seems reasonable. A few comments:

If you have an explicit form parameter, I'd put it first in the parameter lists. Its value determines the interpretation of everything else.
Regarding the int32: Do the different DBs also all use int32 exponents? If not, you will have to check a valid exponent range, don't you (at least for Compose)? I'm not convinced the marginal benefit of having an int32 exponent in Decompose, just to avoid a conversion when converting to a specific Decimal package implementation, is worth it. There's essentially one such conversion in the code. But it's going to be cumbersome to use int32 everywhere else, e.g. in tests where you want to interact with int's, etc. (This is a major reason why we're starting to move away even from using uints for values that can be clearly only >= 0 in the std lib and the language (shifts!); it's just annoying. People tend to introduce the uint conversions to make the code work, but there's really no check. I suspect you will see the same happening here.)
Any reason for not having a special type DecimalForm byte and values?
```
const (
Finite DecimalForm = iota
Infinite
NaN
)
```
The doc should probably say that a coefficient with length 0 means the value 0 (and then you don't need to say anything about a nil coefficient). Also, it remains unclear to me how 0.00 would be represented. What am I missing? (It might be good to have some of these cases as examples in the doc, that would clarify this a lot).
The doc should also say which parameters are valid/can be used if the form is not Finite.

ericlagergren commented 5 years ago

@gri 0.00 is (mant=0, exp=2). 😀

griesemer commented 5 years ago

@ericlagergren Thanks. Clearly my coin needs to drop some more...

kardianos commented 5 years ago

@griesemer

Form order: seems reasonable. Moved.
exponent type: I agree about uints for sure. In my experience, working with anything that may be used in an index (directly or inderectly) should be an int for sure, but anything touching a wire protocol or persistence should use specific types such as int32 or int64.
DecimalForm: That would be ideal. However that would also require a reference to that type in all implementation, tie it to a package. If in the standard library the type would tie to a minimum Go version.
nil coefficient: Good point, I'll update the doc.
valid parameters: Good point, I'll update the doc.

kardianos commented 5 years ago

I recently sent out a survey on decimal use and it has some interesting results. I'll aggregate and share later. In the mean time, I discovered a few things:

Some people do use one of the top arbitrary precision packages (shopspring, apd).
Many people report using float32 / float64 for deicmal money (a recent issue on a database driver mentioned someone doing this, which is part of what prompted me to send this survey out).
A number of people use int64 cents or int64 * 10 ^-6 or similar ad-hoc uses.
A number of people use one of the following fixed decimal packages:

Here is what I intend to do next:

Create a package that documents and has a test harness of some type for the interface described here.
Open PRs to shopspring/decimal, apd, and ericlagergren/decimal. If approved (but not necessarily merged, in case we want to defer merging until a quorum is reached), then I'll open PRs to some of the fixed decimal packages.
I also want to consider adding a fixed decimal type to the Go standard library, maybe math/D128. But that's a separate topic (I'm aware of the previous closed proposal).

/cc @dimdin @openware @Rhymond @rin01 @robaho @lionelbarrow

kardianos commented 5 years ago

Repo with interface declaration: https://github.com/golang-sql/decomposer (I don't reference this package in any implementation but it is here for human reference). Again my plan is to start with a few implementations and see what feedback looks like.

PRs (plan to edit this list as I go along):

golang / go

proposal: database/sql: define a Decimal decompose interface for decimal packages #30870

Background

19787 #12127 #12332 .

Proposal