golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.3k stars 17.58k forks source link

proposal: database/sql: define a Decimal decompose interface for decimal packages #30870

Open kardianos opened 5 years ago

kardianos commented 5 years ago

Background

Databases store many types of values. One type of very common value databases store, especially SQL databases, are decimal values. Some databases limit the size of the decimal, while others support arbitrary precision. But common to all is a base10 representation of the decimal number.

Historically handling decimals for database/sql and database drivers has been a pain. What types should a driver look for? If a driver looks for a given type and handles it, then it has to import it and depend on it. Or possibly go to the trouble of injecting the various types it handles with some type of type registry. The solution space at present for "library" packages that need to deal with decimals, but may be used by many other applications, is sub-optimal.

There is a history of proposals for including a decimal type into the standard library:

19787 #12127 #12332 .

Lastly, there are a number of decimal packages. Each implementation has a similar type implementation:

Generally each decimal type has a big.Int coefficient and an int32 exponent. All of these types of decimals are arbitrary precision, not fixed size like decimal128.

Proposal

I propose that a common interface is defined for a decimal type that dumps the big.Int and int32 exponent. This interface could be defined in the standard library, or agreed upon by each package. This way a SQL driver would need to do a type assertion for the well known interface, dump the value and exponent, and marshal the value to the driver. When scanning a decimal value, it would try to load the value and exponent into the decimal type presented to it.

It may be necessary to also provide some test vectors along with the implementation so the value and exponent are interpreted the same.

The proposed interface:

// Decimal composes or decomposes a decimal value to and from individual parts.
// There are four separate parts: a boolean negative flag, a form byte with three possible states
// (finite=0, infinite=1, NaN=2),  a base-2 little-endian integer
// coefficient (also known as a significand) as a []byte, and an int32 exponent.
// These are composed into a final value as "decimal = (neg) (form=finite) coefficient * 10 ^ exponent".
// A zero length coefficient is a zero value.
// If the form is not finite the coefficient and scale should be ignored.
// The negative parameter may be set to true for any form, although implementations are not required
// to respect the negative parameter in the non-finite form.
//
// Implementations may choose to signal a negative zero or negative NaN, but implementations
// that do not support these may also ignore the negative zero or negative NaN without error.
// If an implementation does not support Infinity it may be converted into a NaN without error.
// If a value is set that is larger then what is supported by an implementation is attempted to
// be set, an error must be returned.
// Implementations must return an error if a NaN or Infinity is attempted to be set while neither
// are supported.
type Decimal interface {
    // Decompose returns the internal decimal state into parts.
    // If the provided buf has sufficient capacity, buf may be returned as the coefficient with
    // the value set and length set as appropriate.
    Decompose(buf []byte) (form byte, negative bool, coefficient []byte, exponent int32)

    // Compose sets the internal decimal value from parts. If the value cannot be
    // represented then an error should be returned.
    Compose(form byte, negative bool, coefficient []byte, exponent int32) error
}

This is the original proposed interface: out-dated, see this comment or see above.

// Form of the number. If this lived in "math/big" it could be typed as "type Form byte".
const (
    Finite byte = iota // Normal numbers
    NaN
    InfiniteNegative
        InfinitePositive
)

// DecimalValue is an interface decimal implementations may implement to
// allow easy conversion between various decimal packages. These interfaces
// may also used by packages that need to marshal or unmarshal a decimal value,
// but don't care about the rest of the decimal type.
//
// The decimal value in the finite form is defined as "decimal = coefficient * 10^exponent".
type DecimalValue interface {
    // ValueExponent returns the coefficient, exponent, and form of the decimal value.
    ValueExponent() (coefficient big.Int, exponent int32, form byte)

    // SetValueExponent sets the decimal number's value, exponent, and form.
    // If the decimal number does not support the value form (such as Infinite),
    // or if the value is larger then a fixed size representation can handle, an error should be returned.
    // However, if a decimal supports NaN but not Infinite forms, the Infinite forms
    // may be coerced into a NaN.
    SetValueExponent(coefficient big.Int, exponent int32, form byte) error
}

/cc @mjibson @victorquinn @ericlagergren

If this, or a similar, interface is acceptable to cockroachdb, shopspring, and Eric, I'd be willing to open up PRs to add the methods and some test vectors.


Edited proposed interface to include a "form".

ericlagergren commented 5 years ago

I think decimal = mant * 10^exp should be a part of the API.

I really like this idea.

I’ll have more feedback later.

maddyblue commented 5 years ago

The immediate problem I see with this is the non-numeric decimal representations. apd and ericlagergren's decimal type support positive and negative infinity, nan, and signaling nan. The shopspring decimal and many others do only mantissa + exponent. The special values are handled well with floats because they are defined as part of the bits. I think that if we want adoption of this interface it would need to support more than finite numbers.

renthraysk commented 5 years ago

Just to clarify... On querying, driver.Next() has to pass back an implementation of DecimalValue, and then database/sql will attempt to convert/copy that into whatever type was passed into sql.Rows.Scan()?

kardianos commented 5 years ago

@renthraysk Yes.

It would first attempt to match the type (like apd.Decimal -> apd.Decimal) but failing that, if both the src and dest satisfy the DecimalValue type, it could extract the bits from one and stuff it into the other.

renthraysk commented 5 years ago

Leaves NULL handing. Currently drivers return NIL, and database/sql calls dest type's Scan() func for setting Valid field. So something like

type NullDecimal struct { Valid bool Value DecimalValue } func (nd *NullDecimal) Scan(src []interface{}) error { ... }

?

kardianos commented 5 years ago

@ericlagergren If you have time this week, additional feedback would be appreciated.

@mjibson What do you think of the edited interface? I modified the interface to take a "form", as well as the the set method returning a method. This way if a decimal implementation and the number was too large, or the implementation doesn't support NaN or infinity and the number was in that form, it could return an error.

maddyblue commented 5 years ago

I recommend removing pos and neg infinity and just having infinity. The sign of the infinity can be extracted from the sign of the big.Int value. In addition, I recommend adding sNaN, which is the signaling nan that is supported by most decimal implementations (see https://docs.python.org/2/library/decimal.html#decimal-objects for example).

Separating the sign from the form is important because in addition to +/- infinity, the spec also supports +/- NaN and +/- sNaN. See http://speleotrove.com/decimal/damodel.html, which discusses the requirements of finite and special values, and is linked to by the python decimal module.

Although there is a very good argument to be made that the forms you have as of now (pos inf, neg inf, nan, finite) are sufficient since the sign and type of nan almost never matters, and the java bigdecimal implementation only supports those forms.

apd supports 4 kinds of nan, but I suspect most users only use the normal nan. In our use of apd, we certainly only use the normal nan and coerce all other nans to it.

kardianos commented 5 years ago

@mjibson Thanks for the feedback. I appreciate the links. I finished reading the speletrove link and skimmed the python doc link.

My point of reference is mostly line of business applications. The implementations I'm looking at are C#, Java, Oracle, PostgreSQL, and MS SQL Server.

Is there a different set of use cases I'm not thinking of where transmitting the signed NaN, or various forms of NaN would be advantageous?

maddyblue commented 5 years ago

Honestly no, I'm not aware of anyone caring about any kind of NaN other than just the normal NaN. I guess in that light I think your current proposal is fine. I almost think the previous one with posinf, neginf, nan, finite may be better just because it feels medium risky to tell people to inspect the sign of the big.Int. Unsure.

ianlancetaylor commented 5 years ago

CC @griesemer

kardianos commented 5 years ago

@mjibson When I wrote some pseudo code to unmarshal and marshal the infinity forms, it was clearer to keep Positive and Negative Infinity as separate forms. I reverted to having them separate.

ericlagergren commented 5 years ago

Overall, I like the idea.

I'm kind of torn between including different types of NaN and infinity values, though.

I don't see much value in inserting qNaN into a database. So, I like the idea of having three simple form constants: Finite, Inf, and NaN.

That being said, if people do need those values it'll be difficult to add them on top of the aforementioned iota-based form constants.

So, a bitmask seems like a decent approach:

type Form byte
const (
    Finite Form = 0
    Inf    Form = 1 << 2 // ±Inf
    NaN    Form = 1 << 3 // qNaN or sNaN

    // sign can be used to create -infinity:
    //    NegInf = sign | Infinity
    // as well as signed NaN values
    //    NegNaN = sign | NaN
    // it can also be used to represent -0
    // if coefficient == 0 and Form == sign
    sign Form = 1 << 1
)

Something else not considered is -0. Though there are not a lot of practical uses, it is important. The current proposal doesn't allow for -0 because big.Int does not support it.

ericlagergren commented 5 years ago

Additionally, I know decimals are well-defined, but I would take the extra step to add decimal = coefficient * 10^exponent to the API.

The reason for this is some libraries like my decimal.Big and like Java's BigDecimal expose the exponent as a "scale" value that, if positive, is equal to the number of digits to the right of the decimal point. In effect, scale = -exponent. (FWIW: internally, the libraries work on the exponent like normal.)

If the equation is defined in the DecimalValue API (much like it is in big.Float's API) then it's less likely to confuse folks.

griesemer commented 5 years ago

math/big doesn't support NaN, so there's no point in having it in such an API.

More generally, conversion from/to decimals requires parsing/printing the decimal numbers; in other words these are string conversions. The conversions in each of these directions is a handful of lines of code. I am not convinced that these need to be added to the big.Float API. See here for an example implementation:

https://play.golang.org/p/iJpjp9J11yy

The code is easily adjusted to provide big.Ints if that is desired. The string ops may be not totally optimal but negligible (in terms of overhead) compared to the actual Float conversions (which are expensive, and there is no way around those).

If the decimals are in []byte rather than in string form, then one can use Un/MarshalText instead.

In short, it seems to me that it's better to write the few lines of code that are custom-fitted for the specific application in mind than adding more API that may not be quite right in the specific case.

renthraysk commented 5 years ago

This is about getting decimals (including NaNs) in and out of database/sql and drivers without having to depend on any specific implementation.

griesemer commented 5 years ago

@renthraysk NaNs are not "decimals". There's no support for NaNs in math/big by design (and we're not going to add support for NaNs); so let's leave them out of the picture. If you need to represent NaNs, you can always have a special encoding outside of math/big.

I don't know what you mean by "without relying on a specific implementation". The title of this issue is pretty clear, the specific implementation is math/big. Please clarify.

ericlagergren commented 5 years ago

NaNs are not "decimals"

I hope I'm not being too pedantic or misunderstanding your point, but NaN values are decimals in that they're special values explicitly covered by decimal floating point specs.

kardianos commented 5 years ago

@griesemer Thank you for looking at this issue. I mostly focus on database/sql and database drivers. This issue pertains specifically to (base-10) decimal types. It is most important not to end applications, but intermediate libraries, notably database drivers.

I would be nice to have a base-10 decimal type in the standard library, but baring that, it would be nice to have an interface for marshaling and unmarshaling decimal types in and out of non-standard types. This interface could be defined in "math/big", or it could be implicitly defined in decimal packages as we mutually agree to it. So while this could involve "math/big", the interface could also be implicit.

Example 1:

  1. User wants to put a decimal value into a database. Say they are using cockroach/adb in the application.
  2. They pass this decimal value to db.QueryContext("insert into mymoney (?);", decimalValue).
  3. database/sql passes this into the database driver and it needs to handle that decimalValue. What should it do? a. Require all applications to pass in decimals as strings? b. Require a single decimal type to convert to before passing in? c. Require any decimal type to support a specific interface that can be efficiently read and handled?

Example 2:

  1. User wants to read a decimal value from a database. Say they are using shopspring/decimal in the application.
  2. They create a decimal type, then scan from row row.Scan(&decimal).
  3. database/sql looks at decimal and doesn't know what to do with it, so it asks the driver if it can handle it. The driver looks at the value and might do one of the following: a. Require all decimals to be pulled out of the database as strings? b. Require a single decimal type to convert from before scanning out? c. Require any decimal type to support a specific interface that can be efficiently set and handled.

Let's ignore NaN at the moment. Regardless, as a database driver, telling clients to always go through strings or a []byte interface isn't really an option I see as viable.

I hope this clears up what I'm trying to accomplish with this issue.

griesemer commented 5 years ago

@ericlagergren Point taken, but again, they are not supported by math/big, by design (math/big is a software library, not a hardware standard, and as such has many other ways to express errors such as a division by zero w/o resorting to special "in-band" values such as NaNs).

@kardianos Thanks for the clarification. I suggest you change the title of this issue to something that better reflects the intent. From you response I gather that this issue is only marginally related to math/big.

renthraysk commented 5 years ago

@griesemer The implementations I was referring to was the decimal packages listed in the original post.

griesemer commented 5 years ago

@kardianos Thanks for the title change, this makes is much clearer (to me, at least). I'm somewhat amazed that all the Decimal packages are using a *big.Int plus exponent representation. Wouldn't it be better to not be so dependent on that? For instance, borrowing freely from your suggestion:

type Decimal interface {
   // Value returns the sign, mantissa, and scale factor of the decimal value x, such that
   // x = sign * mant * 10^exp, where a sign < 0 indicates a negative value (otherwise it
   // is a positive value), mant contains the mantissa bits in little-endian binary format,
   // and scale is the scale factor by which the mantissa is adjusted. The value 0 is
   // represented by an empty mantissa (len(mant) == 0) and a scale of zero, while an
   // infinity uses an empty mantissa with any non-zero scale value. If a sufficiently
   // large buf is provided, its underlying array is used for mant.
   Value(buf []byte) (sign int, mant []byte, scale int)

   // ...
   SetValue(sign int, mant []byte, scale int) error
}

With an interface like this one wouldn't be so tied to math/big, yet it would still be reasonably efficient to convert to math/big.Int since big.Int's have methods to convert from/to bytes.

Some questions to answer:

Also, how important is it to be able to represent NaNs?

kardianos commented 5 years ago

@griesemer Thank you for the continued feedback. It would be nice to not be coupled to big.Int, though it does come at a cost of being a slightly more complicated definition. I would need to implement your proposed interface on an existing package or two before I have answers to your questions.

griesemer commented 5 years ago

@kardianos Please find my replies below, in the order of your questions:

ericlagergren commented 5 years ago

@griesemer

I haven't fully digested your last two comments yet, but I'm curious what you meant by

I'm somewhat amazed that all the Decimal packages are using a *big.Int plus exponent representation.

maddyblue commented 5 years ago

Decimals (or at least the decimal spec previously mentioned) have the ability to differentiate between representations like 123, 123.0, 123.00, so I think normalizing the mantissa is not desired.

griesemer commented 5 years ago

@ericlagergren What I meant is literally that I am surprised that given six essentially independent packages (I assume that's the case) all chose a very similar implementation. I would have expected that some of them would use a custom, fixed-size representation.

ericlagergren commented 5 years ago

@mjibson re: normalization: http://speleotrove.com/decimal/decifaq4.html It's mentioned elsewhere, too.

@griesemer ah. Just curious. I hope to have BCD at some point. And fixed size. But that's off topic for now :)

griesemer commented 5 years ago

@mjibson Thanks for the clarification. So 123.00 would presumably be internally represented as the (value, scale) pair (12300, -2), 123.0 would be (1230, -1), and 123 would be (123, 0)? What about 12300.? Is that (12300, 0) or (123, 2)? How does this match the proposed original interface?

ericlagergren commented 5 years ago

@griesemer 12300. could be either. They'd print differently, though. :)

Do note that, in some APIs—like BigDecimal—"scale" is not the same thing as "exponent."

I do not understand how this would not match the proposed original interface.

I'd be curious as to your choice of int32.

int32 is used instead of int because (1) it's highly unlikely an exponent greater than 1<<31-1 (or lesser than the complement) will ever be needed given the current physical limits of machines, and (2) it simplifies internal calculations because you can avoid overflows by performing 64-bit arithmetic.

If NaN is important, I'm sure it could be encoded somehow as well.

NaN isn't that hard by itself, it's the different types of NaNs that make it challenging. Like @mjibson mentioned earlier, there are two types of NaNs, signaling and quiet. And both types can be signed. And there's also NaN payloads.

griesemer commented 5 years ago

@ericlagergren Thanks. I was asking about 12300 and its representation because printing does matter, doesn't it? It's also clear that scale is not the same as exponent, so this would have some bearing on the API, wouldn't it?

Regarding the choice of int32: Your answer is exactly what I expected. I agree that it makes no sense to have exponents > 32bits. In fact I'd argue it makes no sense to have exponents > than a much smaller maximum (say, 16 bits). I would not "pretend" that exponents can be as large as 32bits by enshrining this value in the API, and instead check dynamically that exponents are within a fixed range that fits well within 32 bits. It will make the implementation slightly more complicated but simplify clients because they can just deal with ints. (math/big internally uses 32bit exponents but doesn't expose that for this reason.)

If NaN's are so pervasive in this (DB) world, it seems better to me to have them explicitly taken care of in the interface, especially if they can carry a payload. Yes, one would have to check for NaNs but at least that would make it very clear what is happening.

ericlagergren commented 5 years ago

It's also clear that scale is not the same as exponent, so this would have some bearing on the API, wouldn't it?

I know it's different than how it's used elsewhere in CS, but scale = -exponent. I mentioned this in a previous comment:

... some libraries like my decimal.Big and like Java's BigDecimal expose the exponent as a "scale" value that, if positive, is equal to the number of digits to the right of the decimal point. In effect, scale = -exponent. (FWIW: internally, the libraries work on the exponent like normal.)

I'm not really tracking why (12300, 0) and (123, 2) matter so long as the API can accurately represent both, and AFAIK all example APIs so far (yours, so long as there's no normalization, and @kardianos').

I would not "pretend" that exponents can be as large as 32bits by enshrining this value in the API, and instead check dynamically that exponents are within a fixed range that fits well within 32 bits.

For more context: decimal libraries are allowed to set their own exponent ranges. Theoretically, I could allow any 32-bit integer in my library, while cockroachdb/apd could restrict theirs to [-1000, 1000]. Generally, the GDA spec recommends the exponent range be symmetric and no less than 10 times the maximum length (in digits) of the mantissa, with a hard floor of 5 times the maximum length iff there's a maximum length. More info: http://speleotrove.com/decimal/damodel.html

I don't think this API should be more strict than the GDA spec.

maddyblue commented 5 years ago

The numbers would be printed differently: (12300, 0) = 12300, (123, 2) = 123e2, so there's no ambiguity.

griesemer commented 5 years ago

@ericlagergren, @mjibson : I think my coin has finally dropped; apologies for being slow, I think normalization has side-tracked me. So the value, in decimal form, are the digits (excluding an exponent) that one would always see in printed form: if the value is 12300, these are the digits that will show up, possibly with a decimal point somewhere. The scale (or exponent) simply positions that decimal point (and may or may not show up as exponent, possibly depending on formatting options).

Going back to exponent ranges: I agree that they should be symmetrical; and that there should be a reasonable upper limit. FWIW, it's much simpler to write a correct implementation that has a spelled out upper limit (well below 32 bits) than trying to make the code correct for a full 32bit exponent (which is not symmetric anyway) in all cases.

Going back to the proposed interface and naming: How about significand instead of coefficient or mantissa? It more clearly explains what it represents, namely the significant digits of the value. And perhaps scale instead of exponent.

And finally, going back to the proposed interface: The initial comment mentions that typically SQL databases have a base10 representation for decimal numbers. But the proposed interface uses a binary representation. Who is doing the conversion?

kardianos commented 5 years ago

And finally, going back to the proposed interface: The initial comment mentions that typically SQL databases have a base10 representation for decimal numbers. But the proposed interface uses a binary representation. Who is doing the conversion?

I'm not sure what you man by binary representation. If you mean the driver still needs to take the significand []byte, scale, and sign and put it together or take it apart, then I would imagine database drivers would either:

  1. Make a small internal package to do this with just enough bits to marshal/unmarshal to the wire.
  2. Or import a decimal package that supports the interface, and marshal it into that, then on/off the wire.

Or perhaps taking the question a different direction, once a consensus is reached on the interface I and/or others can open up PRs on the most popular decimal packages so the decimal types implement this interface.

griesemer commented 5 years ago

@kardianos Sorry for being unclear. https://github.com/golang/go/issues/30870#issue-421630412 said: "But common to all is a base10 representation of the decimal number." But this interface is using a base2 representation. So I assume the DB driver does whatever conversion necessary?

I'm asking all these detailed questions because I want to understand how this would be used in practice and why it should be in the std library in the first place.

My understanding is that this interface would decouple different DBs from apps using different decimal number packages so that one doesn't need n*m different connections (given n DBs and m different decimal number packages), only n+m connections: from n DBs to 1 interface, and from that 1 interface to m different decimal packages.

kardianos commented 5 years ago

My understanding is that this interface would decouple different DBs from apps using different decimal number packages so that one doesn't need n*m different connections (given n DBs and m different decimal number packages), only n+m connections: from n DBs to 1 interface, and from that 1 interface to m different decimal packages.

This de-coupling is indeed the benefit.

: "But common to all is a base10 representation of the decimal number." But this interface is using a base2 representation. So I assume the DB driver does whatever conversion necessary?

I'm sure I'm missing something. My understanding is we can decompose a base-10 decimal concept into the following components (ignoring Inf and NaN):

These parts would be composed as follows: base-10 decimal = sign * significand * 10 ^ exponent. Is this not the case?

But to answer your question, yes, the DB Driver will do whatever is needed to convert between the database wire protocol representation to this decomposed representation, where this interface can set or retrieve the value. DB Drivers can support "custom" data types, so we could omit support from database/sql for the time being even and let the database drivers do the lifting.

ericlagergren commented 5 years ago

... The initial comment mentions that typically SQL databases have a base10 representation for decimal numbers. But the proposed interface uses a binary representation. Who is doing the conversion?

Using my package as an example, I would implement ValueExponent as follows:

func (x *Big) ValueExponent() (sign bool, significand *big.Int, exponent int32) {
    significand = new(big.Int)
    if x.isCompact() {
        significand.SetUint64(x.compact)
    } else {
        significand.Set(&x.unscaled)
    }
    return x.form&signbit, significand, x.exp
}

If the DB requires a string (e.g., lib/psql), the SQL driver could do something like

sign, sig, exp := v.ValueExponent()
s := sig.String()
s = s[:exp] + "." + s[exp:]
if sign {
    s = "-" + s
}
// send s to the DB

(NB: the code examples are off-the-top-of-my-head example and inefficient, they miss corner cases, etc.)

Similarly so with whatever binary encoding the DB uses.

However, it's not very ideal for SQL drivers to do the string formatting themselves.

renthraysk commented 5 years ago

MySQL X's wire format of decimals is a byte for the scale followed by packed BCD for the significand, and a nibble for the sign with optional zero nibble for padding. So not sure how common base10 is...

@kardianos Talking of 1 and small enough. database/sql requires both the ValueExponent() and SetValueExponent(). However a driver only needs the ValueExponent().

The simplest implementation would be something like...

type decimal []byte  // slice of network read buffer were the decimal resides.

func (d decimal) ValueExponent() (sign bool, significand *big.Int, exponent int32) {
  /// Lazy parsing of the byte slice...
}

Don't want to have to support SetValueExponent().

griesemer commented 5 years ago

@kardianos sign * significand * 10 ^ exponent has a base-10 exponent but (in the proposed API) a base-2 significand. @renthraysk mentioned that the MySQL X wire format for decimals uses packed BCD (binary coded decimals) for the significand. To me that implies that somewhere the significand must be converted into a true decimal representation, which is roughly equivalent to printing the value in decimal form (and condensing the digits '0' to '9' into 4 BCD bits each). @ericlagergren also provided a piece of code that essentially converts the decimal into a string.

This is why I asked originally why the API doesn't just require printing the decimal value (and massaging the bits).

Presumably some databases do store decimals w/o using BCD? Can anyone attest to this?

On the other hand, if all DB's store decimals in a true decimal form (BCD, or some string form) then the proposed API seems perhaps suboptimal because the DB would have to do the conversion. I would seem better that the API already provides the decimal representation, assuming that there are more DBs to connect to than decimal packages. It might be simpler to just define a string-based API.

ericlagergren commented 5 years ago

I alluded to this in my previous comment, but one reason a string API might be better (assuming some/most DBs use strings for decimal inputs) is that converting a decimal to a string isn’t entirely straightforward if you’re following the GDA spec and its various corner cases.

Ostensibly, decimal packages themselves will have proper string formatting.

I’m just spitballing, but what about something like fmt.Formatter with an unused verb? I’m not sure of a way to test whether an implementation supports the verb, though.

ericlagergren commented 5 years ago

Off topic: I have some BCD code lying around somewhere that I’m more than willing to donate.

kardianos commented 5 years ago

@ericlagergren Most databases don't use a string representation in their protocol. Postgresql can use string representation, more mature drivers tend to avoid it and use the binary representation. SQL Server uses a sign then a binary significand. Postgresql binary uses base 10000 representation.

@griesemer As above, the protocols used have widely different representations. One problem the early database/sql package tried to do was to make life too simplified on database drivers. In the end, this made their job harder. So long as drivers have a way forward for dealing with decimals, drivers that want to support decimals in this way will. I don't want someone to avoid this API because it costs too many allocations. I like your API suggestion in part because it can be much more efficient. It would be fairly easy (and a one time cost) to make a package that just turns a lower level representation into a string, and the other way around.

Thank you for explaining about base-2 vs base-10 (BCD). I understand your meaning now. Regarding scale being int vs int32, I think I would suggest int32 is used, so if the result of a decomposed decimal was stored somewhere as-is, it would be clear only 32-bits of storage where needed, never 64. The usability argument again is less of an issue here, because I anticipate only certain libraries will typically make use of this, and they all use a int32 scale anyway, so it would probably be easier to just assign the scale and then do sanity checks before or after. Similarly with sign, I think I would go with a neg bool. It shouldn't matter for most use cases, but apd also uses a negative bool, and again it would be clearer to someone storing the components what should actually be signaled. It would also prevent implementations from smuggling in additional information through the integer. I agree with Eric that the use of scale should be avoided because as he says it can imply a negative exponent.

@ericlagergren @mjibson @griesemer Here is what I'm currently thinking as a definition: (edited)

// Decimal composes or decomposes a decimal value to and from individual parts.
// There are four separate parts: a boolean negative flag, a form byte with three possible states
// (finite=0, infinite=1, NaN=2),  a base-2 big-endian integer
// coefficient (also known as a significand) as a []byte, and an int32 exponent.
// These are composed into a final value as "decimal = (neg) (form=finite) coefficient * 10 ^ exponent".
// A zero length coefficient is a zero value.
// If the form is not finite the coefficient and scale should be ignored.
// The negative parameter may be set to true for any form, although implementations are not required
// to respect the negative parameter in the non-finite form.
//
// Implementations may choose to signal a negative zero or negative NaN, but implementations
// that do not support these may also ignore the negative zero or negative NaN without error.
// If an implementation does not support Infinity it may be converted into a NaN without error.
// If a value is set that is larger then what is supported by an implementation is attempted to
// be set, an error must be returned.
// Implementations must return an error if a NaN or Infinity is attempted to be set while neither
// are supported.
type Decimal interface {
    // Decompose returns the internal decimal state into parts.
    // If the provided buf has sufficient capacity, buf may be returned as the coefficient with
    // the value set and length set as appropriate.
    Decompose(buf []byte) (form byte, negative bool, coefficient []byte, exponent int32)

    // Compose sets the internal decimal value from parts. If the value cannot be
    // represented then an error should be returned.
    Compose(form byte, negative bool, coefficient []byte, exponent int32) error
}

Would both of your support adding this to your decimal packages?

ericlagergren commented 5 years ago

How would that interface signal 0.00?

kardianos commented 5 years ago

@ericlagergren Good point. Adjusted.

griesemer commented 5 years ago

This seems reasonable. A few comments:

ericlagergren commented 5 years ago

@gri 0.00 is (mant=0, exp=2). 😀

griesemer commented 5 years ago

@ericlagergren Thanks. Clearly my coin needs to drop some more...

kardianos commented 5 years ago

@griesemer

kardianos commented 5 years ago

I recently sent out a survey on decimal use and it has some interesting results. I'll aggregate and share later. In the mean time, I discovered a few things:

Here is what I intend to do next:

/cc @dimdin @openware @Rhymond @rin01 @robaho @lionelbarrow

kardianos commented 5 years ago

Repo with interface declaration: https://github.com/golang-sql/decomposer (I don't reference this package in any implementation but it is here for human reference). Again my plan is to start with a few implementations and see what feedback looks like.

PRs (plan to edit this list as I go along):