Store already confirmed txs and historical data locally

nop33 commented 1 year ago

We currently have 2 features that (potentially) require knowledge of the whole transaction history of all the wallet's addresses in order to work correctly. Those are:

Historical data on price: https://github.com/alephium/desktop-wallet/issues/587
Future unlock schedule: https://github.com/alephium/alephium-frontend/issues/44

Wouldn't it make sense for the wallet to store locally, in one or more encrypted files, all address transaction data that has been recorded on the blockchain and can no longer change?

Ideas:

Only store transactions that have X confirmations (or are older than 24h or so)
On wallet start-up, fetch all new transactions and update the store

Benefits:

Fewer requests to the explorer backend service
Better UX, since transaction pagination would not need to reach a server (we could implement infinite scrolling)
Can implement feature 1
Can implement feature 2

@mvaivre, @tdroxler, what are your thoughts on this?

Some quick math: an average tx with 2 inputs, 4 outputs, some ALPH and some tokens takes ~2'500 characters (simple compression gets it down to ~1'200 characters), so that's about 1 KB per tx.

Open questions:

How many txs does the largest Alephium wallet have?
Is the performance of the Electron app affected when the Redux state grows a lot? (This is not inherently a problem of this proposal: a user can already click "See more" as many times as they want to load their whole history.)
What other alternative proposals are there for implementing feature 1 and 2?

tdroxler commented 1 year ago

Here are our top addresses in terms of number of txs:

  count  |                   address
---------+----------------------------------------------
 1659997 | 1Gzp3Wc42fanrbAqknYmSaZLfC567bP3JWprDCiuJ8rbP
 1538521 | 19WzSnmNC1SQ6v7RpFFXhpcMcFSiwM4nKTSdbwgSJfSHy
 1538456 | 14kSoi8pFMn2wMJFZJgGpFSA7GyNMtwWKnsKc1KidZ8Bu
 1538155 | 12jK2jHyyJTJyuRMRya7QJSojgVnb5yh4HVzNNw6BTBDF
  680538 | 1BU6eg4bnn4gFgpSx6U96xqwZxyJHwEdMajCE3wminc3e
  575524 | 134RbnQXBWFCnR4HhmEgFjkotkdmNzNeYqhQ84ZKjHnjN
  575315 | 134dmgh9AZNimp9ZAYpWBTwv3ddouETBTyq6134mBvVk2
  575311 | 12WhBVry3PXzTyCm389eSZMVsDSQZjGh5csv6CPrNfgtz
  575148 | 1DKWCEc9HjhchKoMCMA4J1nbNGNjrAoPFnkcgA1w6h7jh
  346319 | 17R6Ptkz9i1LhiKyMhnitUMkgFygGeeQUFZvRx6GgV8Fc
  111622 | 1EW4DKjCvXWEbipBNFxf1N3UDhdYmqX7NBmjjizvd9j74

mvaivre commented 1 year ago

That would make a lot of sense IMO. However, according to your 1 KB estimate, the largest addresses would need to store more than 1 GB of data (!).
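A quick back-of-the-envelope check of that figure, as a TypeScript sketch: it multiplies a few transaction counts from the table above by the ~1 KB-per-tx estimate, using decimal units (1 GB = 10^9 bytes). The per-tx size is the rough estimate from the original post, not a measured value, and the names are illustrative.

```typescript
// Rough size of one compressed transaction (~1'200 characters), per the estimate above.
const BYTES_PER_TX = 1_000;

// Transaction counts of a few of the largest addresses, copied from the table above.
const txCounts: Record<string, number> = {
  "1Gzp3Wc42fanrbAqknYmSaZLfC567bP3JWprDCiuJ8rbP": 1_659_997,
  "19WzSnmNC1SQ6v7RpFFXhpcMcFSiwM4nKTSdbwgSJfSHy": 1_538_521,
  "1BU6eg4bnn4gFgpSx6U96xqwZxyJHwEdMajCE3wminc3e": 680_538,
};

// Estimated storage in decimal gigabytes for a given number of transactions.
function estimateStorageGB(txCount: number, bytesPerTx: number = BYTES_PER_TX): number {
  return (txCount * bytesPerTx) / 1e9;
}

for (const [address, count] of Object.entries(txCounts)) {
  console.log(`${address}: ~${estimateStorageGB(count).toFixed(2)} GB`);
}
```

Under this estimate the largest address alone comes out at roughly 1.66 GB.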

We won't be able to use localStorage for this, as its upper size limit seems to be around 5 MB. We should look into Electron's safeStorage or other methods.
Loading 1 GB into memory (Redux) will certainly impact performance on less powerful machines. We would need to think about ways to stream data instead of loading everything in one go.

What other alternative proposals are there for implementing feature 1 and 2?

@tdroxler is investigating what endpoints (and new query params) could help us with this! Thomas, if you could update this discussion with your findings, that would be awesome, thanks a lot! 🥇

tdroxler commented 1 year ago

I did a first investigation and I can tell that computing the price chart efficiently won't be trivial, especially with locked coins. I can't propose anything yet :/

mvaivre commented 1 year ago

That's unfortunate, but thanks for investigating @tdroxler. Let's keep working on that, as I think this feature is really useful. Would you be able to estimate how much time we would need to get to a first working solution?

@nop33 I'm working on an alternative design without the chart (Figma) in case we don't get to ship this in 2.0. My design is still WIP for now. Shouldn't be hard to implement, so no need to worry :)

nop33 commented 1 year ago

Thanks for the updates @tdroxler and @mvaivre.

@mvaivre I saw your Figma changes; as always, they look nice :)

However! Looking at the chart designs, I see that the time-range options that you have added are 1d, 1w and 1m.

Since 1 month is the longest range, I don't think it's a big problem to request ALL transactions from the last month, so that we can compute the chart values on the frontend, together with historical prices from CoinGecko. I think problems only start when users want to see ranges longer than 1y. But since this is not an option in the current designs, I think we don't need the backend yet. WDYT?
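A minimal sketch of building the chart on the frontend from one month of transactions. It assumes each transaction has already been reduced to a net ALPH change for the wallet, and that daily historical prices (e.g. from CoinGecko) are available as a map keyed by UTC day. `TxDelta`, `ChartPoint` and `buildChart` are hypothetical names, and amounts are plain `number`s in ALPH for readability (a real implementation would likely use `bigint` atto-ALPH). It walks backwards from the current balance, undoing each day's transactions to get the previous day's closing balance.

```typescript
// Hypothetical shape: net ALPH change of the wallet caused by one transaction.
interface TxDelta {
  timestamp: number; // ms since epoch
  amount: number; // ALPH, positive = received, negative = sent
}

interface ChartPoint {
  day: string; // UTC day, "YYYY-MM-DD"
  balance: number; // closing balance of that day, in ALPH
  value: number | null; // balance * price, or null if no price is known for that day
}

const DAY_MS = 86_400_000;
const utcDay = (ts: number): string => new Date(ts).toISOString().slice(0, 10);

// `now` must be later than every transaction in `txs`.
function buildChart(
  currentBalance: number,
  txs: TxDelta[],
  dailyPrices: Map<string, number>,
  now: number,
  days: number,
): ChartPoint[] {
  // Sum the net change per UTC day.
  const deltaPerDay = new Map<string, number>();
  for (const tx of txs) {
    const day = utcDay(tx.timestamp);
    deltaPerDay.set(day, (deltaPerDay.get(day) ?? 0) + tx.amount);
  }

  const points: ChartPoint[] = [];
  let balance = currentBalance; // closing balance of the day being processed
  for (let i = 0; i < days; i++) {
    const day = utcDay(now - i * DAY_MS);
    const price = dailyPrices.get(day);
    points.push({ day, balance, value: price === undefined ? null : balance * price });
    // Undo this day's transactions to get the previous day's closing balance.
    balance -= deltaPerDay.get(day) ?? 0;
  }
  return points.reverse(); // oldest first
}
```

Anchoring on the current balance means only the transactions inside the chart's range are needed, which is what makes the one-month limit cheap.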

mvaivre commented 1 year ago

This could be fine for most users, indeed! However:

Miners may have a lot of txs in 1 month.
Eventually we should also offer 2 more options IMO: "1 year" and "All" (even if they're not in my designs yet). For this to work, the future endpoint should provide coarser granularity as the time range increases.
In any case, we should probably cache previously fetched data locally (no need to fetch the data more than once a day).
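To illustrate "coarser granularity as the time range increases", here is a minimal sketch that picks the finest bucket size keeping the chart under a maximum number of points. The candidate granularities, the 100-point cap, and approximating a month as 30 days are all assumptions for illustration, not a spec for the future endpoint.

```typescript
const HOUR_MS = 3_600_000;
const DAY_MS = 24 * HOUR_MS;

// Candidate granularities, finest first (hypothetical; "month" is approximated as 30 days).
const GRANULARITIES = [
  { name: "hour", ms: HOUR_MS },
  { name: "day", ms: DAY_MS },
  { name: "week", ms: 7 * DAY_MS },
  { name: "month", ms: 30 * DAY_MS },
] as const;

// Returns the finest granularity that keeps the number of buckets at or below `maxPoints`,
// falling back to the coarsest one for very long ranges.
function granularityFor(rangeMs: number, maxPoints: number = 100) {
  for (const g of GRANULARITIES) {
    if (rangeMs / g.ms <= maxPoints) return g;
  }
  return GRANULARITIES[GRANULARITIES.length - 1];
}
```

With these assumptions, 1d maps to hourly points, 1w and 1m to daily points, 1y to weekly points (~52), and a multi-year "All" range to monthly points.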

tdroxler commented 1 year ago

Yes, I think we'll need a performant endpoint at some point anyway. I really can't give an estimate right now; I'll come back to you soon, hopefully.

tdroxler commented 1 year ago

Some update: for each time interval (currently one per day in my experiment), we compute the amount diff for that interval, a bit like how we show +/-xxxx ALPH in the front-end. Here are the two CSV files for these two addresses:

https://explorer.alephium.org/addresses/12RKPbAg4YSttqFBwZZra2QDbhtqtRBqpS8JZoMtRRAtU
https://explorer.alephium.org/addresses/1Gzp3Wc42fanrbAqknYmSaZLfC567bP3JWprDCiuJ8rbP

The second one is our biggest miner, with millions of txs, and its file is only 16 KB. Generating it takes around 1 min, since we have to go through those millions of txs. But maybe that's exactly the kind of data that could be stored locally: 16 KB for more than a year of data is fine, I guess? Since we know the time interval, each row only needs one time value, no need for both from and to. This reduces the file size quite a lot.

12R.csv
1Gz.csv
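The per-interval diff idea can be sketched as follows. The `TxDelta` input type, the `timestamp,diff` column names and `toIntervalDiffCsv` are hypothetical; the actual layout of 12R.csv and 1Gz.csv may differ. Because every interval has the same known length, its end is implied by its start, which is what lets each row drop the "to" column.

```typescript
// Hypothetical shape: net change of the address caused by one transaction.
interface TxDelta {
  timestamp: number; // ms since epoch
  amount: number;
}

const DAY_MS = 86_400_000;

// Aggregates transactions into one diff per interval and serializes them as CSV.
// Only the interval start is written: the end is always start + intervalMs.
function toIntervalDiffCsv(txs: TxDelta[], intervalMs: number = DAY_MS): string {
  const diffs = new Map<number, number>();
  for (const tx of txs) {
    const start = Math.floor(tx.timestamp / intervalMs) * intervalMs;
    diffs.set(start, (diffs.get(start) ?? 0) + tx.amount);
  }
  const rows = [...diffs.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([start, diff]) => `${start},${diff}`);
  return ["timestamp,diff", ...rows].join("\n");
}
```

Intervals without any transaction produce no row, which keeps the file small for addresses with sparse activity.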

mvaivre commented 1 year ago

Ooh, that's promising! Super light indeed; this could be stored locally with no problem. cc @nop33, what do you think?

nop33 commented 1 year ago

Yes, I agree! Let's see how we can extract the information we need from it now.
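One way to extract a balance history from such a file is a running sum over the per-interval diffs. This sketch assumes a hypothetical `timestamp,diff` CSV, sorted by time and covering the address's whole history, so the running sum can start at 0; if the file only covered a window, the sum would instead need to be anchored on a known balance.

```typescript
interface BalancePoint {
  timestamp: number; // start of the interval, ms since epoch
  balance: number; // balance at the end of that interval
}

// Parses "timestamp,diff" rows (header first) and accumulates the diffs into balances.
function balanceHistoryFromCsv(csv: string): BalancePoint[] {
  const rows = csv.trim().split("\n").slice(1); // skip the header line
  let balance = 0;
  return rows.map((row) => {
    const cols = row.split(",");
    const timestamp = Number(cols[0]);
    balance += Number(cols[1]);
    return { timestamp, balance };
  });
}
```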

mvaivre commented 1 year ago

@tdroxler Little ping 😊 If we're going for the solution explained above, when do you think we could have an endpoint ready for primetime? We could start really simple, with no granularity params (D, M, Y), and show the last year by default, with one delta value per week?

polarker commented 1 year ago

Hey all, late to the party. While I think this is a great UX feature, what would its performance impact on the explorer backend be? Will it scale in the future if we have many users?

Sorry that I don't have the time to dig into the details.

polarker commented 1 year ago

The backend would mainly be limited by the following factors:

A typical SQL server can serve up to thousands of queries per second (very rough estimate).
We have some very heavy API queries (history data is one of them). Some of them can take seconds to minutes.

We will gradually refine our API rate limit to make sure that no single host uses a significant share of the backend service's resources. This will impact UX, as you saw when we introduced the naive rate limit a while ago.
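For context on what a per-host limit can look like, here is a generic token-bucket sketch. It is not the explorer backend's actual rate limiter; the capacity, refill rate and per-request `cost` are assumptions. Giving heavy queries (like history data) a higher cost is one way to stop a single host from using a large share of backend resources without penalizing cheap calls.

```typescript
// Classic token bucket: holds up to `capacity` tokens, refilled continuously at `refillPerSec`.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Refills according to elapsed time, then spends `cost` tokens if enough are available.
  tryConsume(now: number, cost: number): boolean {
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = Math.max(this.lastRefill, now);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// One independent bucket per host (e.g. per client IP).
class PerHostLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // `cost` lets expensive queries consume more of a host's budget than cheap ones.
  allow(host: string, now: number, cost: number = 1): boolean {
    let bucket = this.buckets.get(host);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.refillPerSec, now);
      this.buckets.set(host, bucket);
    }
    return bucket.tryConsume(now, cost);
  }
}
```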

If possible, let's make sure our new features can work well in the long term, so we don't need to refactor/remove features in the future.

mvaivre commented 1 year ago

Thanks @polarker for your comments! I'll let @tdroxler chime in to share a more detailed answer regarding the explorer backend's side.

On our side, we made sure to call the history endpoint as little as possible in order to reduce the load to a minimum. For now, it's only called once per session, when the app is started. If this needs to be reduced further, we could store a timestamp and call the endpoint at most once a day. Also, we don't let the user refresh the history chart manually.
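The "at most once a day" idea could look like this minimal sketch: it stores the last fetch timestamp next to the cached result and only calls the endpoint again once a day has passed. `KeyValueStore` mirrors the `getItem`/`setItem` methods of `localStorage`, so either that or a test double can be passed in; the key names and `getHistoryCached` are hypothetical, and the cached data must be JSON-serializable.

```typescript
// Minimal subset of the Web Storage API (localStorage implements it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const DAY_MS = 86_400_000;
const CACHE_KEY = "history-cache"; // hypothetical key names
const FETCHED_AT_KEY = "history-fetched-at";

// Returns the cached history if it is less than a day old, otherwise fetches and caches it.
async function getHistoryCached<T>(
  store: KeyValueStore,
  fetchHistory: () => Promise<T>,
  now: number = Date.now(),
): Promise<T> {
  const cached = store.getItem(CACHE_KEY);
  const fetchedAt = Number(store.getItem(FETCHED_AT_KEY) ?? 0);
  if (cached !== null && now - fetchedAt < DAY_MS) {
    return JSON.parse(cached) as T;
  }
  const data = await fetchHistory();
  store.setItem(CACHE_KEY, JSON.stringify(data));
  store.setItem(FETCHED_AT_KEY, String(now));
  return data;
}
```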

Do you think that this could still impact the explorer backend's performance?

polarker commented 1 year ago

I will wait for @tdroxler's assessment too before delving into the details. In my assessment of new features for the explorer backend, I usually use the following heuristics:

Check if Etherscan APIs support similar features. Etherscan is a commercial product with funding; if it cannot support a feature, it is better for us to avoid supporting it unless there are compelling reasons.
Check if the feature is used in other wallets, particularly those with limited resources. If our explorer backend does not become a commercial project in the future, we will have limited resources to maintain it. For example, Argent has a very conservative approach to API usage, which sometimes results in poor UX. But it's fun to design UX with limited resources :)

These heuristics are not set in stone and can be challenged if necessary.

tdroxler commented 1 year ago

Looking at the Etherscan API, they have a lot of historical endpoints. I haven't found one for an address's historical value, but they do offer it in their UI, in the Analytics tab of an address page:

[Screenshot (2023-04-25): Etherscan Analytics tab for teacherkiat.eth, address 0x120016b6da9b03164b09d945346a847470bb9bee]

I haven't used many wallets, but I know Ledger offers it. Maybe @mvaivre and @nop33, who have tried many of them, could give feedback on this?

For now, it's only called once per session .... once a day

I'm even wondering whether we could store the data locally. It's super small: 18 KB for our biggest miner, with one entry for each day. A lot of calls are already made when starting a new session; if we only queried the last few days of history, it would make that call a lot lighter.
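If the daily series is stored locally, each session would only need the days since the last stored entry. A minimal sketch, assuming a hypothetical `DailyDelta` record keyed by UTC day (`YYYY-MM-DD`): the last stored day is requested again because it may have been incomplete when it was cached, and freshly fetched days overwrite cached ones.

```typescript
interface DailyDelta {
  day: string; // UTC day, "YYYY-MM-DD"
  diff: number;
}

// First day to request from the backend, or null if nothing is cached (full history needed).
// The last cached day is requested again since it may have been partial when stored.
function nextFetchStart(cached: DailyDelta[]): string | null {
  if (cached.length === 0) return null;
  return cached.reduce((latest, d) => (d.day > latest ? d.day : latest), cached[0].day);
}

// Merges freshly fetched days into the cached series; fresh values win on overlap.
function mergeDailyDeltas(cached: DailyDelta[], fresh: DailyDelta[]): DailyDelta[] {
  const byDay = new Map<string, number>();
  for (const d of cached) byDay.set(d.day, d.diff);
  for (const d of fresh) byDay.set(d.day, d.diff);
  return [...byDay.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // ISO dates sort chronologically as strings
    .map(([day, diff]) => ({ day, diff }));
}
```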

nop33 commented 8 months ago

There have been several discussions within the team, so I will summarise everything here:

@polarker and @tdroxler came to agree with the original idea of this issue: Store already confirmed txs locally.

Additionally, they propose that since everything is derived from transaction data, we can move the computation load from the explorer backend to the wallet client. With the full transaction history, we can calculate address balances, token balances and historical balances: basically everything we currently use the explorer backend for, without needing to query it.
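As an illustration of deriving balances purely from transaction data, here is a minimal sketch over a simplified, hypothetical transaction shape: inputs and outputs carrying an address, an atto-ALPH amount and optional tokens, loosely modeled on explorer data but not its exact API types. In a UTXO model, adding everything an address received in outputs and subtracting everything it spent in inputs over the full history yields its current balance, with fees accounted for implicitly. Time-locked outputs are not treated specially here; the unlock schedule would need separate handling.

```typescript
interface TokenAmount {
  id: string;
  amount: bigint;
}

// Simplified, hypothetical input/output and transaction shapes.
interface TxIO {
  address: string;
  attoAlphAmount: bigint;
  tokens?: TokenAmount[];
}

interface Tx {
  hash: string;
  inputs: TxIO[];
  outputs: TxIO[];
}

interface Balances {
  attoAlph: bigint;
  tokens: Map<string, bigint>;
}

function computeBalances(txs: Iterable<Tx>, walletAddresses: Set<string>): Balances {
  const balances: Balances = { attoAlph: 0n, tokens: new Map<string, bigint>() };

  // Adds (sign = 1n) or subtracts (sign = -1n) one input/output if it belongs to the wallet.
  const apply = (io: TxIO, sign: bigint): void => {
    if (!walletAddresses.has(io.address)) return;
    balances.attoAlph += sign * io.attoAlphAmount;
    for (const token of io.tokens ?? []) {
      balances.tokens.set(token.id, (balances.tokens.get(token.id) ?? 0n) + sign * token.amount);
    }
  };

  for (const tx of txs) {
    for (const input of tx.inputs) apply(input, -1n); // spent by the wallet
    for (const output of tx.outputs) apply(output, 1n); // received by the wallet
  }
  return balances;
}
```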

Concerns

Too much data for the client to store: to address this, we could use a pruning technique. Since blockchain data are immutable, we can compute the state of the app and persist the computed values in local storage. The extension wallet uses this technique. We could extract the core computation functions into the SDK to share them across the wallets. The app state could store the latest_tx_hash that it has taken into account for the computed state. On app launch, it could query the explorer backend to say:

"This is the last tx hash I am aware of, give me all new transactions that took place since then".

The app would re-calculate its state and update.
Mobile devices are not very powerful: consider someone restoring their mining wallet on their Android phone (not advised, but possible). The app will query transactions starting from genesis and process them. There is a high chance that it cannot hold all the data in RAM before processing it, so batch processing will need to be implemented.
...?
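The pruning and batching ideas above could be combined roughly as follows: keep only the computed state plus the latest_tx_hash it covers, pull newer transactions in batches, and persist after each batch so memory stays bounded and an interrupted sync can resume. This is a sketch under assumptions: `fetchTxsSince` stands in for a hypothetical explorer query yielding batches oldest-first, and the state is reduced to a single net balance for brevity.

```typescript
// Simplified, hypothetical transaction: only its hash and net effect on the wallet's balance.
interface Tx {
  hash: string;
  amount: bigint; // net atto-ALPH change for the wallet
}

interface ComputedState {
  latestTxHash: string | null; // the latest_tx_hash already included in this state
  balance: bigint;
  txCount: number;
}

// Hypothetical source: yields batches of transactions newer than `sinceTxHash`, oldest first.
type TxBatchSource = (sinceTxHash: string | null) => AsyncIterable<Tx[]>;

async function syncState(
  state: ComputedState,
  fetchTxsSince: TxBatchSource,
  persist: (state: ComputedState) => void,
): Promise<ComputedState> {
  const next: ComputedState = { ...state };
  for await (const batch of fetchTxsSince(state.latestTxHash)) {
    for (const tx of batch) {
      next.balance += tx.amount;
      next.txCount += 1;
      next.latestTxHash = tx.hash;
    }
    // Only the current batch is held in memory; checkpoint a copy after each one.
    persist({ ...next });
  }
  return next;
}
```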

tdroxler commented 8 months ago

I think things like balances could still be fetched from the explorer backend. For the mobile wallet, for example, this is better because we can show the balance quickly when the wallet opens. The txs could be downloaded in the background, with additional information shown gradually as it becomes available.

nop33 commented 3 months ago

Closing in favor of https://github.com/alephium/alephium-frontend/issues/126

alephium / alephium-frontend

Store already confirmed txs and historical data locally #60
