HathorNetwork / hathor-wallet-headless

Headless Hathor Wallet
https://hathor.network/
MIT License
16 stars 23 forks source link

[Design] Improve integration tests performance #173

Open tuliomir opened 2 years ago

tuliomir commented 2 years ago

Summary

This issue describes performance improvements that can be made on the tests for the Wallet Headless, mainly the integration tests.

Motivation

Currently the integration tests take a long time to run and the tests are heavily dependent on each other and on the sequence they are run, so they can only be executed using the --runInBand Jest mode. This increases the complexity to develop and read/understand them.

With improvements on performance, it would be possible to make all tests fully independent, each creating their own wallets to make their transactions and assertions.

In case these improvements are not enough to make every test independent, at least a partial wallet cleanup should be possible, in a pattern that was drafted on the UTXO-Consolidation integration tests.

In this pattern, every test moves funds to a "workbench" wallet, makes its transactions, validates them and then moves the fund away from the workbench, leaving it empty for the next test.

Guide-level explanation

We have identified that the most time-consuming processes on the integration tests are:

1) Wallet initialization 1) Common wallets take about 7-10 seconds each 2) Multisig wallets take up to a minute 2) Transaction websocket update 1) Wallet lib cache takes about a second to update after a successful transaction 2) A manual delay is made on the tests to accomodate for this

We expect that by improving these points, we can reduce at least 20% of the current 1.100 seconds it currently takes the tests to complete and lower the barrier to creating more, independent, test transactions and wallets. More data on current times on a section below.

So, the scope of this issue will be restricted to only those two points. This means the tests will continue to be run serially and more improvements will have to be made, but at least the run time will be more manageable for the developer.

Reference-level Explanation

Wallet initialization

The initialization process is comprised of the following steps, in chronological order: 1) Validating the wallet data received via HTTP request 2) Calculating and encrypting the private keys 3) Opening the event handlers for 1) Connection state change 2) Web socket updates 4) The wallet-lib then calculates each wallet address up to the gap limit 5) A query is made to the fullnode for the transaction history of each address 6) A call is made to the fullnode requesting the API version

The test helper then has to start pooling the headless application, waiting for the wallet to become Ready.

The most time and CPU consuming of them is step 4, where the addresses are calculated. Below are some solutions and proposals for improving these points:

❕ A note about the cpu consumption of step 4: when the wallet address generation code is executed normally or via script, it takes under 500ms. When this same code executes under jest's environment in test conditions, it jumps up to 7 seconds for common wallets and 60 seconds for multisig wallets.

The reasons for that are still unknown, but may have relation to bitcore calculation optimizations that are unavailable under test environments.

Address pre-calculation

A script can be made to generate hundreds of wallet words and pre-calculate their associated addresses. The route /start can receive a new parameter preCalculatedAddresses containing an array of addresses in string format.

Changes would also need to be made on the wallet lib to prioritize these pre-calculated data, on files:

On file src/new/wallet.js, in the constructor:

// Parameters object, line 91
preCalculatedAddresses = null

// Before dealing with the store, line 142
if (preCalculatedAddresses) {
  this.preCalculatedAddresses = preCalculatedAddresses;
}

On file src/wallet.js, function loadAddressHistory

loadAddressHistory(
  startIndex,
  count,
  connection = null,
  store = null,
  preCalculatedAddresses = null // New parameter, line ~510
)
{
  // ...
  // Around line 520
  const address = preCalculatedAddresses && preCalculatedAddresses[i]
    ? preCalculatedAddresses[i]
    : this.generateAddress(i);
}

On the headless test helpers, a proof-of-concept code can be found on commit 0c81d4c.

Address calculation script

To pre-calculate this data, a script should be used to: 1) Generate a few hundred JSON objects containing: 1) Wallet words 2) About 22 addresses for these words ( maybe more for the genesis wallet ) 2) These JSON objects should be inserted on a file.

A proof of concept can be found on #189 , on file generate_wallets.js

At test runtime, a helper module should: 1) Open this JSON file and identify which wallets were already used 2) Initialize unused ones using the preCalculatedAddresses parameter above

❕ Note: The wallet pre-calculation feature was implemented on the Headless via PR 189 and the lib's feature was implemented on PR 354

Fullnode response caches

A new TEST_MODE configuration parameter should be added on the config.js file. When this flag is active, most communications with the fullnode could be cached to reduce the amount of communication delays.

These delays are usually minimal under common conditions, but when a single cluster is having to act as a miner while handling the application requests, times can go up considerably. This adds up when an automated test suite executing various operations in short sequence. There were cases when a simple call to the fullnode's /v1a/version could trigger timeouts.

Examples of response caches from the fullnode:

Version check

This call to /v1a/version could be done only once per application launch and cached for all other wallet initializations.

Address tx history

The call to thin_wallet/address_history on the loadAddressHistory method could be skipped most of the times, since the wallet never contains any transaction (except for the genesis and miner wallets).

This would need to be an explicit flag passed to each wallet start request, but that would only be available when on TEST_MODE.

Transaction handling

Currently the dockerized miner image takes about 0.5 seconds on average to mine each transaction for the tests. After that, the test code has to wait a while for the lib store to be updated via websocket.

Improving the cache wait

There are two main ideas on how to improve the time waiting for the cache update from the fullnode via websocket: implementing an event listener or forcing the cache update.

At the moment, we have set an arbitrary time of 1 second before proceeding with the tests after each successful transaction. Both the improvement ideas are elaborated below.

Event listener

By implementing a test helper that directly listens to the wallet-lib updates, we remove the need to set arbitrary time delays on test code. This can lead to better test times and improved consistency.

This can take the form of a simple EventListener listening for the wallet-lib's existing events new-tx and update-tx. If these events are not available for listeners outside the scope of the lib, an adaptation could be made to allow a test helper to listen to its events.

A note on this approach is that this code should not add too much complexity to the update-waiting process, reducing the chance of creating bugs related to test conditions.

Cache force update

Another possibility is to force-update the cache everytime the lib receives a success callback from the tx-mining-service request.

This would lead to faster transaction handling times, since the websocket would not be necessary. However, inconsistencies could appear should the websocket update from the fullnode diverge from the local cache.

Some further study on how to treat these situations could be necessary.

Reduced mining weight

An alternative approach could be used to mine the transactions (and blocks): instead of firing a cpuminer, a simplified miner could be instantiated ( maybe on a fullnode or tx-mining-service, or even a modified cpuminer ).

This modified miner should receive transactions with only 1 weight and mine blocks with also 1 weight, sleeping between blocks to avoid the blockchain's weight self-regulation mechanism.

This could be called single_mining_mode, to clarify that it would only happen on single-miner blockchains for test purposes.

Experiment with external mining

On a preliminary study, an experiment was made with the miner separated from the fullnode + tx-mining-service docker images. The result was that the transaction mining times went up ten-fold, timing out multiple times. The test times went up considerably.

So, the idea of having an external miner to avoid CPU concurrency is not being considered at the moment.

Task Breakdown

Task Dev days
Address pre-calculating script 1
New preCalculatedAddresses parameter on wallet-lib 1
Cache fullnode version response 0,3
Cache empty tx-history response 0,5
Implement proof-of-concept for cache wait improvement 2
Research an alternative mining process 2
Total 6,8 dev days

The tasks above do not need to be implemented all at once, and may be divided into multiple issues/PR's to keep improving the integration tests gradually. It would be natural that for each of these steps, other ideas could appear and become better ways to advance.

Measurements

Current State

We have, as of v0.12.0 the following numbers on the integration tests:

On these tests,

The common wallets take between 8 and 10 seconds each to start. Multisig wallets take between 55 and 60 seconds each.

The transactions take between 0.07 and 3 seconds to mine on tx-mining-service, with an average of ~0.5s each.

From the application's perspective, the transaction requests take from 0.7s to 3.5s, with an average of 1.4 seconds each.

After that, the precise time when the wallet-lib cache is updated is still unknown, but a delay of 1 second is consistently being enough to ensure it is completed before executing other instructions on the tests.

How to measure

This data was obtained from a variety of sources:

At the moment this is a manual process, but could be automated to provide faster feedback to changes. The steps to retrieve these informations are the following:

Number of test suites, tests and full run time

Get this information directly from Jest's stdout or investigate a way to output this information to a file on jest framework.

Amount of wallets, wallet time statistics

On the ./tmp/wallet-benchmark.log each line represents a wallet instance and contain a parseable JSON object.

Filter only for the objects that contain the wallet property and count its instances for the amount of wallets started. Note that some of those will be repeated initializations of the genesis or miner wallets on multiple suites.

For the initialization times, use the wallet.walletReadyDuration property on each instance for calculating min, max and average times. This is the amount of time spent between requesting its start and receiving its status as ready.

Transactions data

On the ./tmp/tx-benchmark.log each line also represents a transaction instance and contain a parseable JSON object.

Filter only for the objects that contain the tx property for getting the amount of transactions executed on the test run.

For the amount of time spent waiting for the transaction completion since its request on the application perspective, check the tx.duration property on each object. Note that this does not include the amount of time waiting for the cache update via websocket.

A relevant information about the time spent mining each transaction will be available on the docker container logs for the tx-mining-service. The log lines that contain this time will be starting with:

txstratum.manager - INFO - AAAA-MM-DD HH:mm.ss [info     ] TxJob solved

On each of these lines there will be a JSON-like stringified object containint a property total_time that contains the time spent mining this transaction.

Note on benchmark files

The wallet-benchmark and tx-benchmark logs are appended but never cleaned by the test helpers. Since each test suite has no shared memory with the others, the solution found to calculate a sum of events between all of them was to use the filesystem persistence.

On GitHub's CI this causes no issues, since the container responsible for running the application is destroyed as soon as the tests finish. But on developer machines the ./tmp folder may start accumulating leftover data from previous test runs, causing issues.

So, it's good practice to, when executing the full integration tests, clean up the ./tmp folder and destroy the previous fullnode with npm run test_network_down.

External Miner

On an experiment running an external miner connected to a private network's _tx-miningservice running on localhost, the transaction mining time was consistently around 5 seconds, with many instances of timeouts.

The resulting running time was upwards of 1200 seconds.

Precalculated Wallets

On a first experiment with pre-calculated wallet addresses, the common wallets took an average of 2.3 seconds to start, while the multisig took 4.6 seconds. Both with little deviation from this average.

With this change alone, runs are being executed in around 500 seconds, a gain of near 50%.

On making tests more independent

The way to make the tests independent and that would be the easiest to read and maintain would be to initialize one or more wallets for each test independently.

With the current number of 145 tests, many of them requiring two wallets, a number of 200 wallet initializations would be reasonable. Each one of them would need at least one transaction to set its conditions, but many tests require multiple UTXO's. Assuming 3 transactions per test, we would have around 430 transactions.

Considering the wallet precalculation and non-optimized transactions times, this would lead to:

Further improvements could be done to the transaction times which could reduce the times to 1.5s per tx, or around 600s. Considering the same test run processing time of 140 seconds, this would lead to about 1600s ( 27min ) of test times.

Even though the times would be considerably higher than what is possible now, the readability gains and the tests independence could be worth it. Especially considering most tests would be made locally by the developer on a subset of suites related to the changes being made.


Reviewers: @pedroferreira1 , @andreabadesso

pedroferreira1 commented 2 years ago

Before changing the lib, I feel we should add a new parameter to the Config class, which is TEST_MODE.

I think the solution for "Address pre-calculation" and may be useful in other situations but the "Fullnode response caches" must be done only if TEST_MODE=true. Apart from that, when TEST_MODE=true, then we should always return weight=1 for transactions.

Helper event listener

I understand the problem you are trying to solve here but I don't understand the proposed solution. We already have a websocket connection that receives this new transaction and updates the local store. Why do you need to create a new connection? I think a solution might be to manually set in the store the updated transaction after the full node confirms the push, rather than waiting for a ws message but I'm not seeing how creating a new ws conn would help.

Besides that, I feel we should try to do this and, in case we feel the mining is still a problem, we could try to mine transactions without a cpuminer, given that the tx weight will be 1, any python code can easily mine the tx.

Would be also interesting to have some data from the tests you've run before. The data we have now is:

Common wallets take about 7-10 seconds each We expect that by improving these points, we can reduce at least 20% of the current 1.100 seconds it currently takes the tests to complete and lower the barrier to creating more, independent, test transactions and wallets.

How many wallets do we start? How many would you need to start if we decide to make tests independent?

tuliomir commented 2 years ago

Added information about this TEST_MODE.

The part that confused me was only the transaction weight: at our docker-compose.yml file we already instantiate the fullnode with the --test-mode-tx-weight flag. Wasn't this supposed to make all transactions happen at 1 weight? At the moment, all of them are being mined at weight=8.000001 on the tx-mining-service docker logs.

About opening a new websocket connection, I agree that is not the best solution. I kept the event listener and added this idea of manually setting the store data.

I called it "force update cache", since it will represent an alternative input to the cache data. Future developers will have an easier time understanding why their data may be altered without websocket responses.

Also added a new section "Measurements" with the collected data I found relevant and some speculation about wallet numbers on test-independent scenarios.

pedroferreira1 commented 2 years ago

The part that confused me was only the transaction weight: at our docker-compose.yml file we already instantiate the fullnode with the --test-mode-tx-weight flag. Wasn't this supposed to make all transactions happen at 1 weight? At the moment, all of them are being mined at weight=8.000001 on the tx-mining-service docker logs.

With this test mode you are saying to the full node to accept transactions with weight at least 1. However, who decides the weight of the tx to be mined is the js library with our headless wallet, and the library is using the default min weight for the testnet, which is 8. That's why having a TEST_MODE in the library, then you can create transactions with weight 1.

pedroferreira1 commented 2 years ago

These JSON objects should be inserted on a TXT file, one wallet per line ( ❓️any other extension suggestion? )

Why don't we use a json file?

A new TEST_MODE configuration parameter should be added on the config.cfg file.

It's config.ts file.

The call to thin_wallet/address_history on the loadAddressHistory method could be skipped most of the times, since the wallet never contains any transaction (except for the genesis and miner wallets).

I like this but I think we should explicitly say that the wallet being loaded is empty, not just assume that with TEST_MODE.

Current State We have, as of v0.12.0 the following numbers on the integration tests:

13 test suites 145 tests Around 1000-1100 seconds for each run On these tests,

57 wallets 95 transactions The common wallets take between 8 and 10 seconds each to start. Multisig wallets take between 55 and 60 seconds each.

The transactions take between 0.07 and 3 seconds to mine on tx-mining-service, with an average of ~0.5s each.

From the application's perspective, the transaction requests take from 0.7s to 3.5s, with an average of 1.4 seconds each.

After that, the precise time when the wallet-lib cache is updated is still unknown, but a delay of 1 second is consistently being enough to ensure it is completed before executing other instructions on the tests.

Did you run those tests in your local machine, or in the CI?

And how did you run it to get the statistics? Would be good to have the step by step on how to get each of those metrics, so anyone can run it and compare. Even after the improvements are made, it's important to use the same script and get the same metrics from it.

So would be good to add the script here and how did you get each of the time metrics.

tuliomir commented 2 years ago

Added a new section on how to get these metrics and how to improve them. Also fixed some other typos and clarified other sections.

pedroferreira1 commented 2 years ago

andreabadesso commented 2 years ago

The design looks great to me

I would just make it clearer on the Motivation section that this design won't solve the problem of not being able to run multiple tests in parallel