[Design] List address balances per token on the Explorer

andreabadesso commented 2 years ago

Feature Name: List address balances per token on the Explorer
Start Date: 2022-16-05
RFC PR: (leave this empty)
Hathor Issue: (leave this empty)
Author: André Abadesso andre@hathor.network

Problem and Opportunity

We should be able to display the total number of addresses created and list those addresses with the ability to sort them by holdings. It should be possible to see the richest addresses for both HTR and custom tokens.

This design has been heavily inspired by the Token API design

Solution

We will leverage the Wallet Service's already indexed database to populate the Explorer Service with address and token holdings information, ingesting it on ElasticSearch through LogStash

High-Level Design

address_list (2)

Steps 1 through 3 describe the synchronization process. Every minute, Logstash will get changes from the Wallet Service address_balance table (by using the updated_at column) and send them to ElasticSearch.

Steps 4 through 6 describe how the client (Explorer) will communicate and how this request will be passed to the ElasticSearch.

Wallet Service

On Wallet Service, two new columns will be added to the address_balance table: created_at and updated_at. On migration file, all current address_balance rows will have the current datetime as initial values for both columns. This is necessary for Logstash to know which records to process.

Task Breakdown

Task	Effort (dev-days)
Create migration file on Sequelize to include two new columns	0.2
Total	0.2

Costs

No cost will be added to the Wallet Service

Hathor Explorer Service (Backend)

Hathor Explorer Service will be the place of most of the backend changes. We will re-use the Logstash and ElasticSearch instances that were created for the Token API

Logstash (Already Exists)
ElasticSearch (Already Exists)
Lambda Function - AddressBalanceList Handler (Default lambda timeout of 6 seconds)
New endpoint on API Gateway - GET /address_balance

Note: We will have three indexes on the same ElasticSearch cluster, one for mainnet, one for dev-testnet and another one for the testnet

API (Hathor Explorer Service - API Gateway):

GET /address_balance?from=:from&token_id=:token_id&sort=:sort

From = Start the records from the nth record. TokenId = Filter by token_id Sort = Since we will only allow sorting by value for now (until we add the total supply percentage), only available parameters are ASC or DESC.

GET /tokens

The frontend will use this API to allow the user to search for the token he wants to see information about. It has been described on the Token API Design

Throttling

We should use the same defaults that were set for the Tokens API lambda:

  apiGatewayThrottling:
    maxRequestsPerSecond: 50
    maxConcurrentRequests: 25

Tasks breakdown

Task	Effort (dev-days)
Create AddressBalanceList Handler	2
Define the mapping of the indexes on the ElasticSearch instance	0.5
Configure and test the ElasticSearch and Logstash service	1
Update Explorer Service documentation	0.5
Configure throttling and caching	0.5
Total	4.5

Costs

ElasticSearch

We are currently using only 776MB out of the 30GB available on the current ElasticSearch cluster created for the Token API, since we have monitoring set up for the storage, I suggest we keep the cluster as-is until the data gets large enough to require a cluster upgrade.

Lambda

Considering that the Lambda function will handle 100,000 requests, we will have a total cost of 0.11USD/month.

API Gateway

API Gateway charges $3.70 per million requests. Considering 100,00 requests/month, we will have an estimated total cost of 0.37 USD/month

Logstash

We will also use the same instance created for the Token API, so we should also have no additional costs as the instance is enough to handle both use cases

Hathor Explorer (Frontend)

A new menu item, called "Tokens", will be introduced in the navigation bar. Once the user clicks on it, he can select either the token list or the token balances. After clicking the Tokens Balances menu item, a table of addresses and their balances will be rendered (Initially sorted by balance desc). The default token for this screen is the Hathor Token.
The user can search, by using an auto-complete input form, for different tokens and click on a single option to render its information
After the token is selected from the auto-complete list, the table will be rendered and token information will be displayed
The user can sort the address balances by value
By clicking on any address from the table, we will redirect the user to the address detail screen with the selected token as a filter, here is an example

Screen Shot 2022-05-23 at 14 03 59

On this screenshot, the token is already selected from the search auto-complete

168644595-2436d387-f1ce-4604-b4bf-3d58d4c244a3

On this screenshot, we display the options returned from ElasticSearch, highlighting the searched term

Disclaimer: This is just a mock-up. The final result might not be exactly like this.

Tasks

Task	Effort (dev-days)
Create config for the feature toggle	0.2
Create the UI (Menu link, table, components for pagination, autocomplete)	1.5
Retrieve data from Explorer Service	0.5
Make pagination and sorting	0.5
Total	2.8

Costs

No additional cost will be added to the Explorer, as it already runs on an S3 as a static website.

Tasks Consolidation

Service	Effort (dev-days)
Hathor Wallet Service	0.2
Hathor Explorer Service	4.5
Hathor Explorer	2.8
Total	7.5

Cost Consolidation

Service	Resource	Estimated monthly price
Hathor Explorer Service	AWS API Gateway	0.74 USD (2 stages)
Hathor Explorer Service	AWS Lambda	0.22 USD (2 stages)
Total	-	0.96 USD

Infrastructure

The new infrastructure code will be done in the ops-tools project, on the same deploy structure that already exists for the Tokens API feature

Future Possibilities

Total Supply percentage

Some explorers show on the address_balance table the percentage of the total supply each address has

We currently have a total_supply API on the wallet-service that is being used by the network-monitor, but it is a very expensive call as it will fetch this data from the tx_output table on every request.

We could start storing the updated total supply on a separate table on every transaction the wallet-service receives, so this data can be fetched by the explorer (or ingested through ElasticSearch) to be able to display the percentage of the total supply for a token each address has

Share-able search page

On the new Tokens list page, we allow the user to share his search on a specific page, we could also add this feature to the address list page

lucas3003 commented 2 years ago

Hi Abadesso, my comments:

High Level design

We don’t need to have an external cronjob. In logstash JDBC input plugin we can define the recurrence

Hathor Explorer Service (Backend)

We should include a task for modeling our data in ElasticSearch (Define the mapping of this new index)
We should mention that we will have two indexes (one for testnet and other for mainnet) in the same cluster (As we do today for Tokens)
GET /address_balance?from=:from&token_id=:token_id&sort=:sort
- On Tokens API, we use the search_after param to go to next page. Will the from argument be translated to search_after on ElasticSearch, in this case?
- You mention that Sort = Which field to sort and if ASC or DESC.. How is this structure going to be? Like token:asc?

Hathor Explorer

On Tokens API, we do have a feature to make a shareable link. For example, the user can search for YanToken, sort by id, and then share this link. The person who access the link will have the same search setup. Will we have the same feature here?
Just a comment to let you know that going backward on Elasticsearch is a bit tricky. ElasticSearch does not provide a search_back param, so we store the search_after param for each page on the UI, and then when user wants to go backward, we use the search_after we stored for the previous page and pass that as a param to ElasticSearch. Not sure if this explanation was clear. But feel free to contact me on this issue.

andreabadesso commented 2 years ago

@lucas3003

Thanks for your review

We don’t need to have an external cronjob. In logstash JDBC input plugin we can define the recurrence

Updated the diagram, adding the cronjob inside the Logstash instance

We should include a task for modeling our data in ElasticSearch (Define the mapping of this new index)

Added it on the tasks breakdown

We should mention that we will have two indexes (one for testnet and other for mainnet) in the same cluster (As we do today for Tokens)

Updated the design, just worth noticing that I'm not creating an index for the dev-testnet environment

On Tokens API, we use the search_after param to go to next page. Will the from argument be translated to search_after on ElasticSearch, in this case?

Yes, I think we can do that on the Lambda, just like the tokens API, right?

You mention that Sort = Which field to sort and if ASC or DESC.. How is this structure going to be? Like token:asc?

Actually, we are currently only allowing the user to sort by value, so I updated the design to consider it

On Tokens API, we do have a feature to make a shareable link. For example, the user can search for YanToken, sort by id, and then share this link. The person who access the link will have the same search setup. Will we have the same feature here?

I've added this as a future possibility, I don't know how valuable this would be for now. I will talk with @trondbjoroy and @pedroferreira1 about this

luislhl commented 2 years ago

Steps 1 through 3 describe the synchronization process

The diagram has no step 3. Additionally, I think you could add the name of each service below their image. These AWS images are not so straightforward.

We will not have an index for the dev-testnet environment for now

Wouldn't it be useful for development to have this index? Unless it adds too much work to have it, but I think that's not the case.

Configure throttling and caching

What throttling parameters are we going to use?

Lambda Function - AddressBalanceList Handler (Timeout of 30 seconds)

Is there some reason why we are giving it a high timeout?

Create config for the feature toggle

We will use this feature toggle just to turn on and off the feature, or will we do things like partial rollouts as well?

The new infrastructure code will be done in terraform

It seems to me that there is nothing to create on Terraform, since the Lambda and API Gateway will probably be just added to the explorer-service's serverless, and the other things already exist.

andreabadesso commented 2 years ago

@luislhl

The diagram has no step 3. Additionally, I think you could add the name of each service below their image. These AWS images are not so straightforward.

Updated the diagram

Wouldn't it be useful for development to have this index? Unless it adds too much work to have it, but I think that's not the case.

I guess you're right, I didn't add it because we also don't have it on the tokens API. Updated the design removing this notice

What throttling parameters are we going to use?

I was thinking on setting the same defaults as we have on the wallet-service lambdas and tune it after we learn from usage on the explorer

maxRequestsPerSecond: 500
maxConcurrentRequests: 250

Is there some reason why we are giving it a high timeout?

Not really, I guess we can use the default 6s. Updated the design

We will use this feature toggle just to turn on and off the feature, or will we do things like partial rollouts as well?

For now, those are the features I've created, and they all have the same default strategy (on/off)

It seems to me that there is nothing to create on Terraform, since the Lambda and API Gateway will probably be just added to the explorer-service's serverless, and the other things already exist.

You are right, removed the part about terraform and updated the infrastructure part

luislhl commented 2 years ago

maxRequestsPerSecond: 500

This seems like a lot. Although we know the Lambda will be able to handle this, the same request rate would potentially hit our Elasticsearch, right?

I'm slightly concerned that someone trying to run a DDoS could be successful in taking the Elasticsearch down.

Did @lucas3003 do something about this in the Tokens API? Maybe we could apply similar policies.

But at a mininum I would check if we can grab any info about which rate Elasticsearch is capable of hanlding, and decrease this throttle if necessary. We will hardly reach this level of usage under normal circumstances, at least not in the near future.

What do you think?

lucas3003 commented 2 years ago

Did @lucas3003 do something about this in the Tokens API? Maybe we could apply similar policies.

I applied rate limits on the Explorer Service lambda, but not on the ElasticSearch cluster itself. More info on this PR

andreabadesso commented 2 years ago

@luislhl, @lucas3003

All requests to the Lambda API will hit our elastic search as it is also handling cache

I guess we can set the same limits we have on the tokens API for our address balances Lambda and open an issue to check if our current ElasticSearch instance can handle it and tune it accordingly

What do you guys think?

luislhl commented 2 years ago

I guess we can set the same limits we have on the tokens API for our address balances Lambda and open an issue to check if our current ElasticSearch instance can handle it and tune it accordingly

I think that's good enough for now. So we would have 50 req/s as limit, right?

For me this is :heavy_check_mark:

lucas3003 commented 2 years ago

I also believe this is good enough. Approved for me.

HathorNetwork / hathor-explorer-service