centrifuge / pod

Go implementation of Centrifuge POD (Private Off-chain Data) node
https://docs.centrifuge.io/build/p2p-node/
MIT License

Change decimal-point-based value handling in APIs and P2P layer #423

Closed by vimukthi-git 5 years ago

vimukthi-git commented 6 years ago

Problem

This is a general problem with our API: all numeric values are currently handled as int64. This issue addresses the problem first for money values such as gross and net amounts.

The following is an example of the problem:

API call

curl -X POST \
  https://localhost:8082/invoice \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Postman-Token: c639bf2d-f769-408c-aa24-5e7fd7831b59' \
  -H 'cache-control: no-cache' \
  -d '{
  "data": {
    "invoice_number": "122334", 
    "due_date": "2018-11-23T23:12:37.902198664Z", 
    "date_created": "2018-10-23T23:12:37.902198664Z", 
    "currency": "USD", 
    "gross_amount": 18.72, 
    "net_amount": 15.77
  }, 
  "collaborators": ["0xxxxxxxxxxx"]
}'

Result

{
    "error": "json: cannot unmarshal number 18.72 into Go value of type int64",
    "message": "json: cannot unmarshal number 18.72 into Go value of type int64",
    "code": 3
}
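
For reference, this is standard encoding/json behavior when a fractional JSON number is decoded into an int64. A minimal, self-contained reproduction (illustrative only, not the node's actual handler code):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Decoding a fractional JSON number into an int64 fails, as seen above.
	var grossAmount int64
	err := json.Unmarshal([]byte(`18.72`), &grossAmount)
	fmt.Println(err)
	// Output: json: cannot unmarshal number 18.72 into Go value of type int64
}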

Suggested solution

On the protocol: all decimals are 256-bit integers and assume a precision of 18 digits behind the decimal point. On the client APIs: decimals are represented as strings, using "." as the decimal separator.

Example (client API):

'amount':'38447.3214',
...

Protocol representation:

'amount':38447321400000000000000,
...
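
A minimal Go sketch of this client-to-protocol conversion using math/big (the helper name is hypothetical, and sign handling plus input validation are omitted for brevity):

package main

import (
	"fmt"
	"math/big"
	"strings"
)

// toProtocol converts a client-API decimal string (separator ".") into the
// 256-bit integer protocol representation with 18 digits behind the decimal.
func toProtocol(s string) (*big.Int, error) {
	whole, frac := s, ""
	if i := strings.IndexByte(s, '.'); i >= 0 {
		whole, frac = s[:i], s[i+1:]
	}
	if len(frac) > 18 {
		return nil, fmt.Errorf("more than 18 fractional digits: %q", s)
	}
	// Pad the fraction to exactly 18 digits and parse the concatenation.
	v, ok := new(big.Int).SetString(whole+frac+strings.Repeat("0", 18-len(frac)), 10)
	if !ok {
		return nil, fmt.Errorf("not a decimal number: %q", s)
	}
	return v, nil
}

func main() {
	v, _ := toProtocol("38447.3214")
	fmt.Println(v) // 38447321400000000000000
}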

cc @lucasvo @mikiquantum @pstehlik

vimukthi-git commented 6 years ago

Note that this might be an urgent issue to fix depending on how we decide to move forward with live tx.

mikiquantum commented 6 years ago

Thanks @vimukthi-git, this is an urgent issue for our partners.

I agree that using a BigDecimal equivalent is a better approach than float64. On the other hand, let's keep in mind that the converters you mention need a predefined precision, and during that conversion (unless we use the same type throughout the flow) we might hit the problems you mention. In addition, precise-proofs needs to support it (not sure yet if it does). I will check.

vedhavyas commented 6 years ago

I think we could use a string type here since we will not be doing any kind of calculations based on these values.

lucasvo commented 6 years ago

I don't think we need to support arbitrary-precision decimals. I would use the same concept as Ethereum: instead of defining a value to be in ETH, we can say the currency is by default in "Wei". BigDecimal adds overhead on the protocol level; float is inaccurate, buggy, and not well supported by zkSNARKs. Strings have the same problem and should definitely be avoided; this needs to be an integer.

I would say we pick a fixed precision, e.g. 10^(-8), and stick to using int on the protocol level. An int64 gives us 18 decimal places. This would make the maximum value 10^18 * 10^(-8) = 10^10, which is probably sufficient.
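
A small Go sketch of that fixed-point scheme (the constant name and values are illustrative, not an agreed API):

package main

import (
	"fmt"
	"math"
)

// Protocol amounts as int64 counts of 10^-8 units, analogous to wei for ETH.
const scale int64 = 100_000_000 // 10^8

func main() {
	// "18.72" becomes 18*10^8 + 72*10^6 on the protocol level; computed with
	// integer arithmetic to avoid the float inaccuracy mentioned above.
	protocolValue := 18*scale + 72*(scale/100)
	fmt.Println(protocolValue) // 1872000000

	// Largest whole amount an int64 can hold at this precision: ~9.2 * 10^10.
	fmt.Println(math.MaxInt64 / scale) // 92233720368
}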

mikiquantum commented 6 years ago

I think coupling the hundreds of different currency precisions to the node's conversion/manipulation is a lot of overhead for the clients and the node itself. Clients will in most cases deal with decimal-point values, which means that either the node or the clients have to handle the conversion.

In both cases it is a lot of overhead.

vimukthi-git commented 6 years ago

I think that compared to Ethereum our product is different in that we have to support multiple currencies with varying precisions, and ours is a more user-level app. I suggest we take the more user-friendly approach to fixing the problems related to our API.

lucasvo commented 5 years ago

Having arbitrary precision decimals is not going to scale and won't really be needed. Here's my pragmatic solution for this:

For custom fields we only support ints, and we say that if you want to store decimal values you multiply by 10 to the power of the precision you want. So if you want 2 digits of precision you'd multiply the value by 10**2: 0.01 -> 1, 1 -> 100. A 64-bit int holds close to 10**19, so you can theoretically have a number with 19 digits and a decimal point between any two of them. If that is not enough, you can use bytes to store an integer of any size. Ethereum uses int256 for numbers (which we internally represent as bytes in protobuf). We could use the same types for our own numbers.

We need to avoid storing any number in more than one leaf of the Merkle tree, and we need to make sure there are common libraries for dealing with whatever number format we end up using. Floats are definitely a no-go.
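
A sketch of round-tripping such a number through a protobuf bytes field using math/big (illustrative; note that big.Int.Bytes() drops the sign, so negative values would need separate handling):

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// A 256-bit-scale amount that no longer fits in an int64.
	amount, _ := new(big.Int).SetString("38447321400000000000000", 10)

	// Big-endian bytes, suitable for a protobuf bytes field.
	raw := amount.Bytes()
	fmt.Printf("%x\n", raw)

	// Decoding on the way out is the inverse operation.
	decoded := new(big.Int).SetBytes(raw)
	fmt.Println(decoded) // 38447321400000000000000
}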

pstehlik commented 5 years ago

How would that work in a user-friendly way with currency or unit-of-measure fields? They have different decimal precisions when interpreted. The approach you are describing would mean that we assume a decimal precision of X on all numbers, where X is the maximum we'd need across current currencies or units of measure. I don't know what that would be, but probably around 8 or 10 digits after the decimal point. Actually, ETH and most ERC20 tokens have 18 digits of precision, so if we want to support those we'd go for 18 on all such fields. Or do you see that differently? We might shoot ourselves in the foot, though, if we ever have to present a number with higher precision than that.

I'd still like to see if there's a way to distinguish between true Integer values and those "floating point Integer" values. Can we have different types for that?

Is your only point against storing the field precision in a separate value the Merkle tree/proof generation? A user can always generate two accompanying proofs: one for the value (as int) and one for the precision (or currency, or unit of measure). I don't see that as a hard requirement.

lucasvo commented 5 years ago

I think supporting any ERC20 token is a good idea; however, that means we are now required to use 256-bit values, as 18 digits of precision wouldn't really work if we tried to store that in a 64-bit int (a 64-bit int can hold a maximum of 19 digits overall). With protobuf that would require us to use the field type bytes, and we'd have to somehow deal with converting that to a user-readable value. I think this is a worthwhile tradeoff, especially given the uncertainty around using protobufs as our storage format in the long term anyway.

If possible we should avoid arbitrary-precision decimals. What I mean by arbitrary-precision decimals is that I could set invoice.amount in one document to have a precision of 20 and in another document a precision of 10. My reason is that aggregating these values becomes impossible. It becomes even harder if we want to use a zk proof to prove, for example, the total revenue of a supplier. The challenge here is of course not limited to zero-knowledge; dealing with such values would be very error-prone and tricky for anyone implementing this.

My proposed solution for this: 1) Storage: store all attributes simply as bytes in the protobuf message. For numbers we can agree on defaulting to 18-digit precision like ERC20 tokens. 2) Presentation to the user (in the client API & UI): decimals are represented as strings, using "." as the decimal separator.
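
On the presentation side, a minimal Go sketch of rendering the 18-digit-precision protocol integer back into the client-facing decimal string (hypothetical helper; negatives and validation omitted):

package main

import (
	"fmt"
	"math/big"
	"strings"
)

// toClient renders an 18-digit-precision protocol integer as the decimal
// string the client API would return.
func toClient(v *big.Int) string {
	ten18 := new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)
	q, r := new(big.Int), new(big.Int)
	q.QuoRem(v, ten18, r)

	// Left-pad the remainder to 18 digits, then drop trailing zeros.
	frac := strings.TrimRight(strings.Repeat("0", 18-len(r.String()))+r.String(), "0")
	if frac == "" {
		return q.String()
	}
	return q.String() + "." + frac
}

func main() {
	v, _ := new(big.Int).SetString("38447321400000000000000", 10)
	fmt.Println(toClient(v)) // 38447.3214
}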

Interesting fact: Maker actually has a pretty simple system with two precision options, wad and ray: https://makerdao.com/purple/#sec-3-1

lucasvo commented 5 years ago

@pstehlik you wrote:

Is your only point against storing the field precision in a separate value the Merkle tree/proof generation? A user can always generate two accompanying proofs: one for the value (as int) and one for the precision (or currency, or unit of measure). I don't see that as a hard requirement.

Just to clarify my comment above: I think it's fine if, for custom attributes, you need to supply a proof of the field type to know what precision the field uses. However, we should avoid having to do so for first-class fields such as the invoice amount.

pstehlik commented 5 years ago

Regarding

My reason for that is that aggregating these values becomes impossible. It becomes even harder if we'd want to use a zk proof to make a proof of total revenue of a supplier for example.

This won't work anyway by simply summing up the values, even with the same precision. We'll have different currencies, units of measure, etc., and they will have to be summed up/grouped individually. I agree that it'll be "easier" if we have at least one precision at the protocol level, even though from a functional perspective we couldn't just add/sum those amounts again. Hence we'll have unit-of-measure, currency, etc. fields describing the value that was stored.

Having a precision of 18 is limiting in the end, though, and not very flexible. And wouldn't that also bloat the storage/size of the data?

pstehlik commented 5 years ago

Decision: On the protocol: all decimals are 256-bit integers and assume a precision of 18 digits behind the decimal point. On the client APIs: decimals are represented as strings, using "." as the decimal separator.

Example (client API):

'amount':'38447.3214',
...

Protocol representation:

'amount':38447321400000000000000,
...