[Design - old] General architecture, features, and API.

msbrogli commented 4 years ago

Introduction

The hathor-wallet-service will be used as backend for all official Hathor's wallets.

This design is highly inspired in the Bitcore Wallet Service.

Sync between full node and service is being designed here: PR https://github.com/HathorNetwork/hathor-wallet-service/issues/8.

TODO LIST

[ ] How to handle tx with timelock and reward lock?
[ ] How to handle Authority UTXO?
[ ] NoSQL vs SQL
[ ] Describe sync between client and service
[ ] Describe sync between full node and service

TEST LIST

[ ] From Executed to Voided transactions
[ ] From Voided to Executed transactions
[ ] From Executed to Voided blocks
[ ] From Voided to Executed blocks
[ ] Authority UTXO
[ ] Token creation
[ ] Transaction timelock
[ ] Reward lock

Requirements

Support running multiple instances under a Load Balance, which will scale as more wallets are being used.

Architecture

┌──────────────┐       ┌─────────────────┐       ┌───────────────┐
│              │       │                 │       │               │
│  Full node   │──────▶│  Tx Processor   │◀─────▶│   Database    │
│              │       │                 │       │               │
└──────────────┘       └─────────────────┘       └───────────────┘
                                │                        ▲        
                                │                        │        
                                │                        ▼        
                                │                ┌───────────────┐
                                │                │               │
                                └───────────────▶│  API Service  │
                                                 │               │
                                                 └───────────────┘

The service will be implemented using AWS services.

DynamoDB: Database
AWS Lambda: Tx Processor and API Service
AWS Gateway API: HTTP and Websocket endpoints
AWS SNS: Messaging service
AWS SQS: Queue service
AWS CodePipeline (test & continuous deploy)

Services

Tx Processor

Receive transactions/blocks from the full node and update the database. It will receive two types of messages: NEW_TX and UPDATE_TX. The latter is used to notify a change in the tx status (from executed to voided or otherwise).

It must keep the balance and transaction history of all addresses, even the ones not designated for a wallet. In the future, we may include an API to query the address and tx history for any address. We would also easily know the richest addresses.

When a new tx arrives or a tx is updated, it must update the indexes of all affected wallets.

When the full node restarts, it should sent all notifications again. If the status of the transaction is the same, we can safely skip it.

API Service

Load from database and return to the wallet. The idea is to do the least number of calculations here. All things should be pre-computed by the Tx Processor.

Maintenance Tools

For each table, we should have a verification tool that checks its consistency. This tool should also have a parameter to recalculate the records of a given table. These tools will be integrated in our monitoring and alert system.

Database

Wallet
- Id
- XPubKey
- AddressesMaxGap (default: 20)
- Status (creating, ready, error)
- CreatedAt
- ReadyAt
- AccessedAt (this may be useful to remove unused wallets)
Balance (only for created wallets)
- WalletId (indexed)
- TokenId (indexed)
- Balance
Address (for all addresses, even if WalletId is null)
- WalletId (indexed, null)
- TokenId (indexed)
- Balance (indexed)
- Number of transactions
- UpdatedAt
AddressTxHistory
- TxId
- Address
- Amount (positive: received / negative: sent)
- Timestamp
- CreatedAt
WalletTxHistory (for all transactions)
- TokenId (indexed)
- WalletId (indexed)
- TxId
- Amount (positive: received / negative: sent)
- Timestamp
- CreatedAt
UTXOs
- TxId
- Index
- TokenId
- Amount
- Address
- TxProposalId
TxProposal
- Id
- Status (open, sent, cancelled)
- CreatedAt
- UpdatedAt
- ClosedAt
Transactions (should we store it or just get from full node?)
- Id (indexed)
- Status (executed, voided)
- Raw tx

Flow from wallets

Client creates a wallet (wallet_id = sha256d(pubkey)).
Client polls the wallet until it is ready.
Get balance and tx history for a token.
Create a tx proposal.
Sign the inputs of the tx proposal.
Resolve the tx proposal.
Propagate the tx proposal.

Events from Full Node

New tx or block.
Update tx or block.

API for the Wallet

Features:

The API can be consumed both by HTTP and Websocket.
Websocket:
- Wallet must authenticate once after connection.
- Wallet must received a balance update message every new tx, so it can update its balance and tx history.

Authentication

All requests must be signed by m/1/1 of the wallet's xpriv.

POST `/wallet`: Create a new wallet

When the wallet is created, the following tasks must be executed:

Mark its addresses in the Address table.
Calculate the balance of all its tokens and store in the Balance table.
Create the WalletTxHistory records from the AddressTxHistory.

After requesting the creation of the wallet, the client must polling to verify when it is ready. It may take a few seconds to be ready. If an error occurs, the user must have the choice to check status again or reset the wallet.

Arguments:

Pubkey
Name (optional)

Returns: { walletId, status }

GET `/wallets`: Get status of a wallet

Arguments:

WalletId

Returns: All fields of the wallet, including its status.

GET `/addresses`: Get wallet's addresses

We can run one query in the Address table filtering by a given walletId.

Open question: Should we paginate?

Returns:

List of addresses
- Address
- Number of transactions
- Path (e.g. m/1/1, m/1/2)

GET `/balance`: Get wallet's balance

Run one query in the Balance table filtering by walletId (and tokenId if the argument is passed).

Arguments:

TokenId (optional)

Returns:

If TokenId is received, it will return only the requested token.

Map of balances:
- Token { tokenId, tokenName, tokenSymbol }
- Total
- Locked
- Available

GET `/txhistory`: Get wallet's transaction history

Arguments:

TokenId
Skip
Count

Returns:

Skip
Count
List of transactions
- Format: { txId, timestamp, action, amount }
- Actions: sent, received, move

GET `/tx`: Get transaction's detail

Arguments:

TxId

Returns:

Tx information, including decoded inputs and outputs (the same as returned by the full node)

GET `/txproposals`: List of wallet's tx proposals

Arguments:

Status (all, open, closed)

Returns:

List of tx proposals according to the given status

POST `/txproposals`: Create a new tx proposal

When a tx proposal is created, it should mark the used UTXO as locked. They do it by assigning a TxProposalId to them. So, next tx proposal will skip them avoiding double spending. We can optionally create a scheduled job to cancel all tx proposal opened for more than X minutes.

Arguments:

Input selection algorithm (optional) (e.g. best match, from highest, from lowest)
Outputs [[addr1, amount1], [addr2, amount2], ...]
Comment

Returns:

Tx proposal
- Id
- Comment
- Raw tx (to be signed)

POST `/txproposals/update`: Update tx proposal

Arguments:

TxProposalId
List of input data [ inputData0, inputData1, ...] (optional)
List of parents (optional)
Nonce (optional)
Propagate bool (optional)

Returns:

Success or error message

POST `/txproposal/close`: Cancel tx proposal

When a tx proposal is closed, it should remove all locks from the UTXOs used by it.

Arguments:

TxProposalId

Returns:

Success or error message

obiyankenobi commented 4 years ago

It's not completely clear for me what's the flow of information for the most common use cases and specially what needs to be computed on the fly and what's just a matter of fetching from db. For instance, I imagine wallet balance will be calculated by fetching from the db all addresses belonging to a wallet and summing the balance for each, for each token. Is it easy to do on a nosql database?

2 other things we debated on meetings but are not here:

we'll temporarily remove inputs on a tx proposal from the available utxos? What are the conditions and when will we add them back? Is there a timer?
for how long should we keep information about a wallet, if it's not being used?

One other thing that's not clear is the wallet initialization process, when a user starts a wallet for the first time (or if we removed this wallet's data from our service). We need a process to go through all txs and build the info for this wallet?

pedroferreira1 commented 4 years ago

I think the design is good so far but the actions should have more details, e.g. I believe all the API's should have a short description of what would happen on it.

Some other things I think should change:

When creating a tx proposal we should mark an output as 'probably used' (we should add this on the database, so we don't select the same utxo to be used);
I believe we should have the walletId on the TxHistory (database), so we can get the history by walletId and not by address;
In the /txhistory GET endpoint we should have a pagination parameter.

msbrogli commented 4 years ago

I followed @pedroferreira1's idea and added a description of what each API Endpoint should do, including details of which tables would be used in each case.

I had to create two tables: AddressTxHistory and WalletTxHistory. Is there a simpler way to do it?

In the /txhistory GET endpoint we should have a pagination parameter.

@pedroferreira1, it has through the skip and count parameters.

msbrogli commented 4 years ago

@pedroferreira1 @obiyankenobi @jansegre Can you review it again, please?

msbrogli commented 4 years ago

@pedroferreira1 @obiyankenobi @jansegre Can you review it again, please?

jansegre commented 4 years ago

Looks good to me

obiyankenobi commented 4 years ago

Maintenance Tools For each table, we should have a verification tool that checks its consistency. This tool should also have a parameter to recalculate the records of a given table. These tools will be integrated in our monitoring and alert system.

I'm not sure I understand what this is.

Address (for all addresses, even if WalletId is null) WalletId (indexed, null) TokenId (indexed) Balance (indexed) Number of transactions UpdatedAt

Why is Balance indexed?

WalletTxHistory (for all transactions) TokenId (indexed) WalletId (indexed) TxId Amount (positive: received / negative: sent) Timestamp CreatedAt

A tx usually involves more than one wallet (the sender's and recipient's wallets). In this case, would we have one entry for each wallet for each tx?

The API can be consumed both by HTTP and Websocket.

What do you mean by being consumed by Websocket? Something like REST over websocket?

Wallet must received a balance update message every new tx, so it can update its balance and tx history.

Is this the only message sent on the websocket? This would be a combination of the balance update + new transaction?

All requests must be signed by m/1/1 of the wallet's xpriv.

What are we signing? A string like the wallet id? Also, where will this authentication be sent on every request? I think the http header is a good place.

GET /wallets: Get status of a wallet

I suggest /wallet.

List of transactions Format: { txId, timestamp, action, amount } Actions: sent, received, move

What's move?

POST /txproposals: Create a new tx proposal Arguments: Input selection algorithm (optional) (e.g. best match, from highest, from lowest) Outputs [[addr1, amount1], [addr2, amount2], ...] Comment

What's the comment argument?

How would the wallet send a transaction? Also through the wallet service?

jansegre commented 4 years ago

I missed this:

All requests must be signed by m/1/1 of the wallet's xpriv.

I haven't thought much about it, but this feels unusual. Does the BWS do this? I have two concerns:

The serialization specification we will have to make for signature to be consistent (signing a JSON can became a headache if we have protocol proxies (AWS API Gateway?) that may parse the JSON and end up reordering/reformatting stuff);
How this will play along with hardware wallets, which I'd guess wouldn't be happy signing stuff away without making the user very aware of what is being signed, which would make this impractical to use if the user has to approve every single request (besides having to understand them too) or having the wallet sign any such message for a period of time, which could be a security concern, besides the wallet having to understand the messages to make sure it isn't signing stuff that must require user consent.

obiyankenobi commented 4 years ago

Yeah, what is being signed should be discussed. I had mentioned signing a string like te wallet it, but I guess it doesn't make sense to have a fixed data being signed. Besides, GET requests don't have bodies, so there's no data to sign there.

As for the second point @jansegre , I'm not sure it's a problem. I don't think the hardware wallet needs to display every data that requires signing. For instance, in our Ledger implementation, we don't display the change output. The user just needs to trust that the wallet is not stealing his money, so the user has to trust the code running on the hardware wallet.

obiyankenobi commented 4 years ago

Events from Full Node New tx or block. Update tx or block.

One other thing I was thinking: we'll have to update the full node also, right? It'll have to send this events on AWS? It needs the correct AWS credentials, I suppose.

pedroferreira1 commented 4 years ago

I had to create two tables: AddressTxHistory and WalletTxHistory. Is there a simpler way to do it?

I don't see where AddressTxHistory is used. I think only the WalletTxHistory solves our problem, isn't it? Or we will use AddressTxHistory to get the address history (for search address on the explorer) and WalletTxHistory to get the wallet history?

One other thing I was thinking: we'll have to update the full node also, right? It'll have to send this events on AWS? It needs the correct AWS credentials, I suppose.

I think you are right, if we use the SNS for this. We could also send this message on the ws from the full node.

What's move?

I believe it's when we transfer in the same wallet, just moving from an address to another.

A tx usually involves more than one wallet (the sender's and recipient's wallets). In this case, would we have one entry for each wallet for each tx?

Yes, I think we will have one for each. We need this to get the history for each wallet.

msbrogli commented 4 years ago

Maintenance Tools For each table, we should have a verification tool that checks its consistency. This tool should also have a parameter to recalculate the records of a given table. These tools will be integrated in our monitoring and alert system.

I'm not sure I understand what this is.

We have cached pieces of information in multiple tables. For instance, the Balance.balance is the sum of the Address.balance, which is the sum of the UTXOs. Imagine that we find a bug and some of them have not been correctly updated. So, we need some tools to recalculate everything for a given walletId, for instance.

Address (for all addresses, even if WalletId is null) WalletId (indexed, null) TokenId (indexed) Balance (indexed) Number of transactions UpdatedAt

Why is Balance indexed?

We can sort by Balance to know the richest addresses list.

WalletTxHistory (for all transactions) TokenId (indexed) WalletId (indexed) TxId Amount (positive: received / negative: sent) Timestamp CreatedAt

A tx usually involves more than one wallet (the sender's and recipient's wallets). In this case, would we have one entry for each wallet for each tx?

Exactly, because the amount would be different. I'll clarify it on the design.

The API can be consumed both by HTTP and Websocket.

What do you mean by being consumed by Websocket? Something like REST over websocket?

Exactly. There must a way to call these API endpoints from websocket. I can look for a protocol to do that over websocket.

Wallet must received a balance update message every new tx, so it can update its balance and tx history.

Is this the only message sent on the websocket? This would be a combination of the balance update + new transaction?

It is not the only message. I'll improve the text.

All requests must be signed by m/1/1 of the wallet's xpriv.

What are we signing? A string like the wallet id? Also, where will this authentication be sent on every request? I think the http header is a good place.

I'll improve the text.

GET /wallets: Get status of a wallet

I suggest /wallet.

Ok! Thanks!

List of transactions Format: { txId, timestamp, action, amount } Actions: sent, received, move

What's move?

It is the name used by BWS when you send a tx to yourself.

POST /txproposals: Create a new tx proposal Arguments: Input selection algorithm (optional) (e.g. best match, from highest, from lowest) Outputs [[addr1, amount1], [addr2, amount2], ...] Comment

What's the comment argument?

A free text field. It can be used to identify the txproposal.

How would the wallet send a transaction? Also through the wallet service?

Yes, there's a propagate choice in the update txproposal API.

msbrogli commented 4 years ago

Events from Full Node New tx or block. Update tx or block.

One other thing I was thinking: we'll have to update the full node also, right? It'll have to send this events on AWS? It needs the correct AWS credentials, I suppose.

Yes, we will have to update the full node. There are two ways to go here: (i) full node sends the notifications to SNS, or (ii) full node have an external notification system (e.g. zeromq), and we will develop a piece to replicate the notifications into SNS.

msbrogli commented 4 years ago

I missed this:

All requests must be signed by m/1/1 of the wallet's xpriv.

I haven't thought much about it, but this feels unusual. Does the BWS do this? I have two concerns:

Yes, it does.

The serialization specification we will have to make for signature to be consistent (signing a JSON can became a headache if we have protocol proxies (AWS API Gateway?) that may parse the JSON and end up reordering/reformatting stuff);

They use a RFC that describes how to sign any http request.

How this will play along with hardware wallets, which I'd guess wouldn't be happy signing stuff away without making the user very aware of what is being signed, which would make this impractical to use if the user has to approve every single request (besides having to understand them too) or having the wallet sign any such message for a period of time, which could be a security concern, besides the wallet having to understand the messages to make sure it isn't signing stuff that must require user consent.

I think you're right. I'll give more thought to it.

obiyankenobi commented 4 years ago

Exactly. There must a way to call these API endpoints from websocket. I can look for a protocol to do that over websocket.

What's the goal of having the endpoints also available through websocket?

msbrogli commented 4 years ago

I had to create two tables: AddressTxHistory and WalletTxHistory. Is there a simpler way to do it?

I don't see where AddressTxHistory is used. I think only the WalletTxHistory solves our problem, isn't it? Or we will use AddressTxHistory to get the address history (for search address on the explorer) and WalletTxHistory to get the wallet history?

It is there just for auditing. If we find a bug we can (easily) recalculate the values of other tables.

msbrogli commented 4 years ago

Exactly. There must a way to call these API endpoints from websocket. I can look for a protocol to do that over websocket.

What's the goal of having the endpoints also available through websocket?

I was replying when I noticed that it may not be necessary. The idea is that we will use a websocket to get notifications, so we can update the wallet when something changes. In this case, we would receive a notification and send an HTTP Request to get the new piece of information, right? My original idea is that we could authenticate after the websocket connection is open and then use the APIs without any authentication. I'll put this flow in the text, so I'll need to have a clearer understanding of it.

obiyankenobi commented 4 years ago

The idea is that we will use a websocket to get notifications, so we can update the wallet when something changes. In this case, we would receive a notification and send an HTTP Request to get the new piece of information, right?

I'm not sure about this. The way I see it, it's like our wallet nowadays. We do the initial http requests and then we basically receive the updates through websocket, so we don't need to do other http requests.

obiyankenobi commented 4 years ago

About using a NoSQL db, I'm not sure it's the best approach. Our data seems to be structured in a way that we could use SQL dbs. We know all fields already.

Also, there seems to be some limitations in DynamoDB for primary keys. It also needs a unique primary key, that can be of 2 formats:

simple: one of the columns/attributes is the key;
composite: there are 2 attributes as key. Similar to using (column1, column2) as key in a SQL database;

(Obs: attributes are the equivalent of SQL columns in DynamoDB)

More info here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey

Consider table WalletTxHistory. It has the following attributes/columns:

TokenId (indexed)
WalletId (indexed)
TxId
Amount (positive: received / negative: sent)
Timestamp
CreatedAt

The primary key needs to be unique. We can't use just (WalletId, TokenId), as several transactions in the same wallet can have the same TokenId. We have to use (WalletId, TokenId, TxId).

This is not supported by DynamoDB, as we can only use 2. There's a way around it, though. We can concatenate TokenId + WalletId, as we'll always query for the history of a wallet's given token. That way, our primary key would be ("WalletId-TokenId", TxId). Doing this concatenation here is not a problem as we'll always query for a WalletId AND TokenId, so we can easily construct it during query time.

Even though it's a hack, we can make it work. It's not so easy to do the same for table AddressTxHistory. I think it's missing attribute/column TokenId in the original design, so I added it here.

Address
TokenId
TxId
Amount (positive: received / negative: sent)
Timestamp
CreatedAt

Primary key has to be composed by (Address, TokenId, TxId). We cannot use just (Address, TxId), as one tx can have multiple tokens to/from the same address. We need to be able to query this table by Address, so we could have our key as (Address, "TokenId-TxId"). It works, but when fetching the results we'd need to separate TokenId and TxId again from "TokenId-TxId". Or we duplicate the attributes and have everything in the table ("TokenId-TxId", TokenId, TxId).

EDIT: Now that I've written it and have come up with ways to go around the "limitations", it doesn't seem so bad. Even Amazon recommends concatenating information when creating keys ("country-state-city" in this example). Maybe it just feels like a hack because I'm used to the SQL world.

I guess the main discussion we should have is wether there's an actual advantage in using NoSQL. Our data seems to be structured, so SQL could handle it. The advantage DynamoDB seems to be related to the pricing model, as we're only charged for read/writes.

jansegre commented 4 years ago

There are other good reasons to choose SQL:

Less AWS lock-in: one less AWS-only service to worry if we ever have to move out
Familiarity: I'm pretty sure most of us are very familiar with SQL (MySQL, and maybe PostgreSQL), but DynamoDB is a new one for me (I don't think it is similar enough to transfer knowledge from MongoDB, which even then I'm less familiar than SQL in case we want to debug/optimize)

IOW, I'm very pro using SQL (on whatever works best with RDS, maybe Aurora)

pedroferreira1 commented 4 years ago

I was reading a bit to try to understand why BWS uses nosql but I couldn't find a reason behind it. I agree that we currently have no clear reason to use nosql and should go with SQL.

I just would like to know if @msbrogli has read anything that could explain their choice.

pedroferreira1 commented 4 years ago

I'm not sure about this. The way I see it, it's like our wallet nowadays. We do the initial http requests and then we basically receive the updates through websocket, so we don't need to do other http requests.

I like this approach but we just need to make it clear what to do when we lose connection on the websocket. The wallet reloads the full data, should we do the same here? Process all the information again?

What's the safest approach here?

obiyankenobi commented 4 years ago

Regarding the events that the full node will have to emit, we might want a different approach than what we currently have.

Right now we send an event (on the websocket) with the new tx and other events with the txs on the inputs, which are spent. We could send just one event with the tx and all input txs so it could all be processed at once.

This could prevent that small "bug" with the balance, in which it usually increases and then decreases when sending txs to the same wallet. Also, it could be better to invoke just one lambda function per new tx than one for every new tx and its inputs.

obiyankenobi commented 4 years ago

As I'm implementing, this last comment of mine doesn't seem relevant. At least for new transactions, we have all information we need, as the inputs and outputs have a decoded field, with all the information necessary.

obiyankenobi commented 4 years ago

There's one other thing that I'm in doubt during the implementation. When a new transaction arrives, how do we know if it belongs to an already started wallet?

Imagine a have a started wallet (walletId = myWalletId) with add1 and addr2. For now, I only received1 transaction on addr1 with 22 HTR. So the Addresses table will have:

| Address | WalletId   | TokenId | Balance | # of txs |
|---------|------------|---------|---------|----------|
| addr1   | myWalletId | 00      | 22      | 1

A new transaction arrives that sends some tokens to addr2. How can I associate this address with myWalletId?

At first, I thought that, during wallet creation, we'd also add the unused addresses to this Address table. However, as it also includes the token id, it seems strange for me to do that. We'd add the token id as null:

| Address | WalletId   | TokenId | Balance | # of txs |
|---------|------------|---------|---------|----------|
| addr2   | myWalletId | NULL    | 0       | 0        |

Maybe there's something I'm missing.

msbrogli commented 4 years ago

@obiyankenobi We must add the unused addresses to the Address table as well. In this case, it would be a placeholder that must be replaced when a real transaction arrives. As for the token id, I'd use the HTR token id instead of null.

obiyankenobi commented 4 years ago

@obiyankenobi We must add the unused addresses to the Address table as well. In this case, it would be a placeholder that must be replaced when a real transaction arrives. As for the token id, I'd use the HTR token id instead of null.

What's the reason we need the token id and balance on the Address table? Get the richest addresses list?

msbrogli commented 4 years ago

@obiyankenobi We must add the unused addresses to the Address table as well. In this case, it would be a placeholder that must be replaced when a real transaction arrives. As for the token id, I'd use the HTR token id instead of null.

What's the reason we need the token id and balance on the Address table? Get the richest addresses list?

I think there are several uses:

1) Richest list, as you said. 2) Debug the wallet's balance if it is incorrect for any reason. After all, the wallet's balance is equal to the sum of its addresses. 3) Show the list of addresses to the user (this is currently available only in Hathor Desktop Wallet).

obiyankenobi commented 4 years ago

One thing about the Address table and GET /addresses API.

On the table, the addresses are separated by tokenId. Imagine we had 2 transactions in total. The first just had (among the outputs) addr1 and token1. The second had addr1, token1 and addr1,token2. The table will look like this (I omitted some columns):

| Address | TokenId | # of txs | WalletId |
|---------|---------|----------|----------|
| addr1   | token1  | 2        | id1      |
| addr1   | token2  | 1        | id1      |

When we fetch the addresses for wallet with id1, how would we calculate the # of transactions for addr1? It's definitely not the sum. We could get this information from AddressTxHistory table, maybe during the API call itself (not have it pre-computed). But # of transactions on the Address table seems useless.

One other thing we might have is both an Address and AddressBalance tables. AddressBalance

Address
TokenId
Balance

Address

Address
Path
WalletId
# of txs

obiyankenobi commented 4 years ago

I'll go with the suggestion I gave on the last comment for addresses.

We need to store # of txs for each address in order to know if it's been used or not. We cannot use just the balance, as the address might've been used but happens to have balance 0.

pedroferreira1 commented 4 years ago

# of txs is also used on the explorer, when we show the # of txs for the address for each token.

One thing I noticed from the design is that it's missing an address on the Address table. Probably you've already added it.

And I agree the best approach would be to get the tx count from the AddressTxHistory table during the API call.

obiyankenobi commented 4 years ago

And I agree the best approach would be to get the tx count from the AddressTxHistory table during the API call.

I'll actually do as I suggested, so there should be a field on the database for the # of txs and we won't need to calculate it.

The reason for this is that we need a way to know if an address has been used or not. Just having the balance doesn't help, as the address might have been used but happens to have balance 0.

pedroferreira1 commented 4 years ago

The reason for this is that we need a way to know if an address has been used or not. Just having the balance doesn't help, as the address might have been used but happens to have balance 0.

This could be easily done checking if there is any data on AddressTxHistory for this address.

But it's okay, I am good to go with your solution. You just need to add # of txs field also on the AddressBalance table for the reason I mentioned if you go with it.

obiyankenobi commented 4 years ago

This could be easily done checking if there is any data on AddressTxHistory for this address.

That's true. But I think I'll move with 2 separate tables to have it pre-computed (and can also store the address derivation path on the Address table).

You just need to add # of txs field also on the AddressBalance table for the reason I mentioned if you go with it.

Makes sense, I hadn't thought of that. Thanks.

obiyankenobi commented 4 years ago

Ok, so for the locked/unlocked balances, this is what I thought.

The AddressBalance table would require locked/unlocked balances:

| address_balance   |
|-------------------|
| address           |
| token_id          |
| locked_balance    |
| unlocked_balance  |
| n_of_transactions |

The same for WalletBalance:

| wallet_balance    |
|-------------------|
| wallet_id         |
| token_id          |
| locked_balance    |
| unlocked_balance  |
| n_of_transactions |

The Utxo table already has a timelock column:

| utxo     |
|----------|
| tx_id    |
| index    |
| token_id |
| value    |
| timelock |

I see 2 possibilities here. We can:

use this timelock column to indicate if it's locked. If the timelock is already past the present time, the column will be set to null. We don't preserve the information, but after the tokens are unlocked I'm not sure it's worth having it;
add an extra locked column that will indicate whether the utxo is locked or not. So if an utxo has a timelock in the past, the timelock column will still be set but the locked column will be null (or false).

I believe there are 2 "external" factors we should also consider: block height lock and tx-proposals. Both lock a tx, but not with a specific timestamp. Using idea 1, we could set the timelock column to MAX_INT when a block is not yet unlocked or an utxo is part of tx-proposal. This way they'll be considered locked. Later, when they are unlocked, it should be set to null.

To be honest, I haven't seen any advantages or disadvantages in either. Re-using the timelock column should do the trick, but adding an extra column preserves the utxo timelock information (not useful now, but might be useful in the future, even though it's easy to get it directly from the tx as we have the tx_id and index).

What do you think?

obiyankenobi commented 4 years ago

One other possibility is doing like proposal 1 above, but never deleting the timelock info from the table. When we want to create a tx proposal, we query the table and filter by timelock <= some_date. This way we get only those that are unlocked.

For blocks, since they are unlocked by height, we could set the timelock to MAX_INT and later remove it.

jansegre commented 4 years ago

I wonder how useful is the timelock information, I'd guess it is somewhat useful so I'd say proposal 1 with timelock <= some_date on queries seems the best of both worlds (although performance-wise, it really isn't obvious which is best, and I think this is fine). As for blocks, couldn't we just add a heightlock and use heightlock <= some_height on queries? These even work well with NOT_NULL because a value of 0 can be used on both cases to indicate no lock. I don't really see the problem of adding a column.

obiyankenobi commented 4 years ago

As for blocks, couldn't we just add a heightlock and use heightlock <= some_height on queries? These even work well with NOT_NULL because a value of 0 can be used on both cases to indicate no lock. I don't really see the problem of adding a column.

I like this idea. We'd also always keep the information in the tables (no need to set to null), which may be even better when dealing with conflicts, as the best block height might change. I'll follow it.

obiyankenobi commented 4 years ago

Did some research on how we could deal with timelocks. It's a bit more complex than heightlock. The latter is easy to handle because a new block unlocks every time a new one arrives. We don't need to schedule anything.

For timelocks, these are some suggestions:

Check for new unlocked transactions every time a block arrives. It should be around 30s interval, so seems like a precision we can tolerate, even though it can take up to a few minutes. This the simplest solution;
We can schedule events. They enable us to run lambda functions at a set date/time. It's a pretty straight-forward. However, we can only schedule events with minute precision, so doesn't seem to have much advantage over the first solution;
AWS SQS allows us to set a timer for the message, so it's only delivered after that time. We can have an unlock message sent to the queue with the timer expiring at the exact moment the utxo should be unlocked (I haven't been able to find if this timer is 100% precise or a best effort). It's not very simple because the maximum delay time is 15 minutes. We'd have to run a lambda every 15 minutes (using schedule events) that sends messages with all utxos being unlocked in the next 15 minutes, using the correct delay. There is also a corner case where a new tx arrives with a timelock due to less than 15 min and we need to handle, but the basic idea is the same. This is the most complex idea, but the one that guarantees txs are unlocked precisely.
We could also do a combination of 2 and 3. Instead of running a lambda every 15 minutes, schedule it when we get an utxo with timelock (we should round to the minute) and then send the delayed message. So if we get an utxo that unlocks at 01:35:40pm, we schedule a function to run at 01:35pm and it sends a message with 40s delay, which should run at the precise unlock time. Also a bit complex.

What do you think? I'm inclined to go with 1. Seems like a reasonable precision.

jansegre commented 4 years ago

The problem is not entirely clear to me yet. Let me see if I understand.

These affects at most 3 tables address_balance, wallet_balance and utxo, so I'll only focus on them:

It seems like utxo can be simplified a lot by using timelock and heightlock that way row operations should basically be insert and delete, and those are made only made for every block update, which seems great.
Now address_balance and wallet_balance have the columns locked_balance and unlocked_balance which potentially depend on both the current time and the current block height. Which is what we're trying to deal with.

Ideally locked_balance and unlocked_balance would behave "calculated fields" (like properties in python), but that is inefficient, that's probably the reason these fields exist in the first place. So they are basically a cache.

All the proposals 1 to 4, focus on how to update every one of these cached values as reasonably soon as possible. But I don't know if this is necessarily great. I'll add a proposal 5. A "lazy evaluation", add locks_expire_height and locks_expire_timestamp columns on both address_balance and wallet_balance, and when only when querying wallet_balance, if the balance is invalid (because then querying we have the latest timestamp and height) we update the locked/unlocked balances of the wallet and its addresses. I don't imagine this being more complex than configuring a trigger system to update all balances, but please do correct me on this one.

If we have to go with one of these proposals, I'd go with 1. because it doesn't seem to need any extra configuration, we already have a stream of block events. But the trade offs I see are: a) having a slightly outdated balance lock (~30s) vs having to wait for a balance update query (probably a fast query); b) one query to update all balances, even when they are not being used vs multiple queries that only update the balances there are being requested.

obiyankenobi commented 4 years ago

It all depends on how much processing we wan't to do when there's an api call. Let's use your proposal and imagine someone queries the wallet balance, which has a locks_expire_timestamp set. From just that information, we don't know which utxo is now unlocked. We'd have to find out which addresses belong to this wallet and then query the utxo table to find out the unlocked utxo(s) (or just a sql join). Anyway, seems like the kind of thing we were trying to avoid by calculating everything beforehand.

@jansegre not sure I got your idea right or I'm not seeing an easier way to implement it.

jansegre commented 4 years ago

Wait, shouldn't UTXOs have an address field? It would be possible if tx_id and index was enough to find the address, but I think it would be useful to have the address on the utxo table. I know it's one more field, but this seems like a useful one, no? The rest I assume is there already (there should be a way to find all UTXOs that make up a wallet through tx_id and a transaction-wallet table (which I imagine we have, because we have to list all transactions of a wallet). Or if we add a utxo_wallet or addrerss_wallet index pair table.

obiyankenobi commented 4 years ago

We do have the address on the utxo table. As I said, we'd have to query:

We'd have to find out which addresses belong to this wallet and then query the utxo table to find out the unlocked utxo(s) (or just a sql join).

But I think this design tries to prevent as much as possible these on-demand calculations and prefers to have pre-computed values.

Also, we might have more than one utxo with timelock. In that case maybe we'd need a list of locks_expire_timestamp. To be honest, I'm still leaning a bit more towards the other solutions.

pedroferreira1 commented 4 years ago

I think @jansegre's idea seems better, it doesn't add much more processing (even though it adds more complexity to the code). Updating the address_balance is easy and if we add a wallet-id column to the utxo table it's also easy to update wallet_balance balance.

The good thing is that we will always have the correct balance, even if there is a bug in our mining and we end up having some hours without blocks.

The bad thing is that we need to do this check (see if there ia a balance to be updated) every time someone requests the balance (on the wallet, or on the explorer to get an address balance, or the richest addresses). However this check would be needed every time a block arrives in the first option, so I don't see that as something really bad.

jansegre commented 4 years ago

We do have the address on the utxo table. As I said, we'd have to query:

We'd have to find out which addresses belong to this wallet and then query the utxo table to find out the unlocked utxo(s) (or just a sql join).

I just read DATABASE.md and it looks like we do have everything we need. I misunderstood that we needed more info to make this query possible. And it seems to me that the whole update could be made with one or two queries, which admittedly aren't trivial, but probably won't have more than 2 joins each, and would have to visit all utxos of the addresses needing an update. Which even in the range of 10k seem to me like it should be a nearly instant query (at least with the right indexes) for MySQL.

Then again, similarly could be said for a query that updates all expired locks globally, but that would scale much worse, for the benefit of always having a balance at hand that will be outdated sometimes. If we accept that the balance can be outdated sometimes, we could return the current expired balance before waiting for the update query to end.

But I think this design tries to prevent as much as possible these on-demand calculations and prefers to have pre-computed values.

I agree with the sentiment, but I don't fully agree "as much as possible", there are other trade-offs to consider. This query would most likely have to be run the first time a miner (assuming a miner because it's the most common wallet user with locks) opens their wallet and then every N seconds (in average), N being how long it takes for them to find a block (because that's how far apart heightlocks will usually be). And since I'm assuming this query won't take more than a few hundred milliseconds (if even that), it's a very fair price to pay for scaling better, always returning the correct up to date balance, and having a slightly less complicated deploy.

An orthogonal idea (that can probably be used along whatever we decide to do with locks and leans more towards pre-computed values) would be pre-computing a list of future "(unlocked, locked)" balances, that approximates (because it would have to somewhat synchronize heightlock+timelock) when the next unlocks will be and pre-computes the next N (for some N, or even until there aren't any locks, which shouldn't be a too high of a count) balances (and in the background update the forecasts and delete the past balances). When we receive a query we just use the balance that is valid at the timestamp the query was made. This could be a relevant if whatever solution we go for ends up having a bad performance.

Also, we might have more than one utxo with timelock. In that case maybe we'd need a list of locks_expire_timestamp.

That's what I imagined, and I think the same query would update all the balances needed. What I keep thinking is "how hard can it be to make this query?". I may very well be underestimating this, but if it's mostly matter of "this will make the wallet slow", I think we could make this query and benchmark it a little (or against updating all addresses, considering the UTXOs already on the mainnet) to have an idea of the performance impacts.

obiyankenobi commented 4 years ago

Just 2 things about this idea:

I'm not sure it's worth doing this for heightlock. We can already update it when a block arrives, because that's when the old block becomes unlocked. It's a pretty simple operation, so I don't feel there's any advantage in doing it differently. For timelock it's a different thing;
if we decide to go this route, it's important to also discuss the sync strategy between wallets and this service. For instance, the current wallet only queries information from the full node when it starts and then listen for the transactions on websocket. In this case, there would be no new queries to trigger this balance update. So if I open a wallet one minute before a timelock expires, I wouldn't make another query later and get the new unlocked balance;

jansegre commented 4 years ago

2. if we decide to go this route, it's important to also discuss the sync strategy between wallets and this service. For instance, the current wallet only queries information from the full node when it starts and then listen for the transactions on websocket. In this case, there would be no new queries to trigger this balance update. So if I open a wallet one minute before a timelock expires, I wouldn't make another query later and get the new unlocked balance;

To me this is actually the best argument against it so far, and probably tips the balance in favor of your initial proposals (of some form of event that would trigger the update).

The only alternative I see that could possibly solve this without relying on time based events, is to move away from a simple (locked, unlocked) balance structure to something like [(0, <full balance>), (-1, <balance locked from height or "permanently locked"> (<ts1>, <balance unlocked after ts1>),(<ts2>, <balance unlocked after ts2>), ...], which the UI can use for exposing a (unlocked, locked) balance or even display when the next amount would be unlocked, but this is obviously more complicated and I'm not sure I like it that much.

I think we could review the balance structure when we include support for multisig. I suspect some sort of improvement of this structure might be inevitable. Also we should consider researching how BWS does this (and possibly other wallets too), it's probably a problem that has been solved many times in many different ways.

For now, I'm leaning into the proposal 1., piggy-backing on block events to get these "time-events" that would signal a possible unlock event.

msbrogli commented 4 years ago

I think that @jansegre's idea is better and simpler. To reduce some of the downsides pointed out by @obiyankenobi, we can run once a "general utxo update" task once a day (twice a day?) that will go through all utxos that have timelock <= now (we need to properly index the db) and update their wallets.

Regarding the sync between wallet and the service, I don't like @jansegre's idea of storing multiples balances according to the timestamp. We can simply include a next_refresh field that will be equal to the minimum timelock of the utxos of that wallet (or null). So, even if no transaction has been received for a while, the wallet will trigger the update. What do you think?

obiyankenobi commented 4 years ago

We can simply include a next_refresh field that will be equal to the minimum timelock of the utxos of that wallet (or null). So, even if no transaction has been received for a while, the wallet will trigger the update.

We'd have 1 next_refresh value per wallet? How does the wallet get this value, in case it's been changed? We again fall on the case of having to debate the sync between wallet and this service, as this value needs to be updated.

jansegre commented 4 years ago

We'd have 1 next_refresh value per wallet? How does the wallet get this value, in case it's been changed? We again fall on the case of having to debate the sync between wallet and this service, as this value needs to be updated.

I think an easier way to see this is looking at the balance returned by the API as the following structure (which doesn't have to be the structure we use, but it would be functionally identical apart from a refactor):

balance: {
  available: X,
  locked: Y,
  expires: T,
}

The wallet is then able to display the balance until the time reaches expires, when it will have to make a "refresh" query to get the updated balance (including the new expires).

This value represents the earliest timelock to expire, and it can't change unless a new tx/block for this wallet arrives, and then this is an event that should trigger the same update mechanism that a schedule would (which I assume is a solved problem, how to make this update arrive to the frontend).

The more I think about it, the more I get that this is just the functionality relevant <0.1% of uses that requires >30% of the design effort. By no means I think we should ignore this, but we definitely can postpone this and let time-locked balances be outdated sometimes, no?

HathorNetwork / hathor-wallet-service-old