MASQ-Project / MASQ-Node-issues

This repo contains the issues that are used for planning MASQ Node product work. It has no code in it, only GitHub issue tickets
https://masq.ai/
31 stars 12 forks source link

Set `BlockchainBridge` free from blocking caused by RPC calls #744

Open bertllll opened 1 year ago

bertllll commented 1 year ago

How we use the methods for RPC calls from the Web3 library is unfortunate. Instead of treating them as how they are intended to, we call a method on them all, without an exception, whose processing squeezes the whole actor. It is a method that, in case of unanticipated events, prevents the Actor from making progress by finishing the currently started message and picking another from the Actor's mailbox.

These methods return Futures, as it is desirable for code like this to run asynchronously, however, we give up on the benefits and don't drain the potential of it out really. Instead, we call .wait() on the created Futures and therefore we deliberately choose to block the local thread and wait up until whatever data comes back from the remote server.

That opens up space for bad events:

If the blockchain service provider is down for any reason, our call won't resolve until their services recover. It can evidently take an indeterminate amount of time. Even though there is an argument that if the blockchain interface becomes unavailable for one call the connection will probably be useless for any kind of blockchain RPC which we use and therefore the fact it's blocking isn't causing any active damage by preventing something what could have been used otherwise. It cannot.

From the optics of the user's experience all actions the BlockchainBridge can perform on an order of the user will look unresponsive. The first call that has stuck and is causing blocking will takes up all the capacities of this actor. The worse, the ongoing handling of this particular actor message is being prolonged and therefore no other message can reach its turn.

The actor has its own Mailbox. If anybody sends a new message to this actor it will languish in its mailbox without a chance to be picked and played. For the BlockchainBridge, it is, I believe, exclusively the Accountant who supplies it with messages (even if the need arises elsewhere, the Accountant is always the last piece of the chain and the producer of that message). Fortunately, these message are always "just" the scan requests, so the global effect is that scanning has stopped functioning and therefore the given Node is nearing a ban from other Nodes.

Another, but less severe observable phenomenon would come from the Websocket messages. This is because the user is given (under the standard circumstances) the ability to provoke the scans by a command. In this situation, though, these executed commands wouldn't bring fruits back. No immediate scanning starts. All it does is another actor message to ask for a scan put in the BlockchainBridge Mailbox. We also can see that the user's order would necessarily time out soon, because no response turned up, and information about that would be displayed.

We need to get away from this disadvantageous situation. For example, we have a record of a card that aims at making it possible to change the blockchain interface service provider during runtime #738. If feasible, we'd appreciate that functionality in our situation. If it allows to switch from a provider that is out of service to another which works fine. In order to be able to that, we'd definitely have to function without the blocking, which means we would employ true asynchronous mechanism and let the Actor have free hands for proper maintenance of the queue with waiting messages. This might be a bit complicated but not terribly.

Anyway, we believe it's advisable to put this task off and focus on the cards that will make our codebase understand the new async/await concept. (A good start is this one #676) We could otherwise easily wast time on reworking the same area of code twice, now, with the out-fashioned chain-like Futures, and later when making async/await code out of it.

utkarshg6 commented 6 months ago

@Syther007 and I were working on this card for some time and have figured out some issues with the current design.

  1. We're not fetching the gas_price and nonce by performing RPC calls, instead we're relying on storing/calculating these values locally.

  2. In case 5 transactions were supposed to be sent, and if the third one fails, then the current code will not attempt the next two transactions. Although, in the next scan cycle those failed (1 + 2) will be picked but it doesn't follow the ideal approach.

Programming Design Issues:

  1. Unable to return futures properly: The current Actor Model system wasn't capable of returning futures because web3 calls were done using methods that involved a call to self. A solution was to remove all these self calls. We worked on removing these calls by refactoring these methods into pure functions, so that all the nested functions can return futures properly.

  2. Improper testing with Mocks: The Mocks were used to test these methods which uses web3 library using a Transport. The tests were not good enough to test the code properly by actually making the web3 calls. So, we worked on using a real TestServer, and made real HTTP Requests/Responses to test these functions.

  3. Issues with TestServer: While writing the tests, we used a test utility called TestServer. This utility had concurrency issues which could be easily detected on the Mac ecosystem and partially detected in the Linux. It turns out that this utility could not be used for writing tests and instead we should use MockBlockchainClientServer which is a part of the multimode tests. @Syther007 worked on migration of its code from the Multinode Test's utility to the masq_lib, which was a challenge in itself due to the utility's dependency on the docker.

  4. Obsolete versions of Futures and web3: The functionality of these two libraries are limited at the moment. We do need to upgrade them to write the code with the latest features. Also, the way futures are used in the latest version are very different than the way we're using right now using the old versions of the library.