aurora-is-near / rainbow-bridge-client

Monorepo containing Aurora-maintained libraries for using the Rainbow Bridge
https://github.com/near/rainbow-bridge-client/tree/main/packages/client#readme
MIT License

Reduce quantity of data queries to Ethereum/NEAR nodes. #54

Closed paouvrard closed 3 years ago

paouvrard commented 3 years ago

The status of bridge transfers is currently monitored via this checkStatusAll loop which will check the transaction and sync status for every transfer with an in-progress status: https://github.com/aurora-is-near/rainbow-bridge-client/blob/152b0755d550998adf3533e1dc1f4c0e5f3892c2/packages/client/src/index.ts#L178

The loop interval is currently fixed by the @near-eth/client user (the frontend) at around 5 secs. This works well for checking the status of a pending transaction being mined. But when checking the sync status of a multi-hour transfer, checking so frequently is a waste of resources (node, Infura...)

Possible solutions:

  1. Keep the same checkStatusAll loop but timestamp the last checkSync done on the transfer object and skip future calls until a reasonable interval is reached: https://github.com/aurora-is-near/rainbow-bridge-client/blob/152b0755d550998adf3533e1dc1f4c0e5f3892c2/packages/nep141-erc20/src/natural-erc20/sendToNear/index.js#L125

    a) A couple of minutes' interval, the same for each connector? But Ethereum -> NEAR transfers are faster and show a more precise status with the number of confirmations (x/30).

    b) Optionally defined at transfer creation? Each transfer can define its own checkSync interval at transfer creation (with a reasonable default value adapted to the transfer direction, so that it doesn't need to be specified by the user of the library).

  2. Introduce a new mining status to differentiate checkTransactionStatus from checkSync and run 2 loops separately at different intervals. One loop checking the status of pending transactions while the other checking the sync status of transfers.
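A minimal sketch of how solution 1.b) could look, assuming each transfer object carries two extra fields. The names `checkSyncInterval`, `nextCheckSyncTimestamp`, `ThrottledTransfer` and the two-minute default are hypothetical and are not taken from the library:

```typescript
// Hypothetical shape: real transfer objects in @near-eth/client carry more fields.
interface ThrottledTransfer {
  id: string
  status: string
  checkSyncInterval?: number // ms, optionally set at transfer creation (assumed field)
  nextCheckSyncTimestamp?: number // ms since epoch, earliest time of the next checkSync (assumed field)
}

// Assumed default of two minutes; a connector could choose another value per direction.
const DEFAULT_CHECK_SYNC_INTERVAL_MS = 2 * 60 * 1000

// True when the transfer has never been checked or its waiting period is over.
function isCheckSyncDue(transfer: ThrottledTransfer, now: number): boolean {
  return transfer.nextCheckSyncTimestamp === undefined || now >= transfer.nextCheckSyncTimestamp
}

// Returns a copy of the transfer stamped with the time of its next allowed checkSync.
function scheduleNextCheckSync<T extends ThrottledTransfer>(transfer: T, now: number): T {
  const interval = transfer.checkSyncInterval ?? DEFAULT_CHECK_SYNC_INTERVAL_MS
  return { ...transfer, nextCheckSyncTimestamp: now + interval }
}

// Called from the fast status loop: skips the expensive sync check until it is due.
async function throttledCheckSync<T extends ThrottledTransfer>(
  transfer: T,
  checkSync: (t: T) => Promise<T>,
  now: number = Date.now()
): Promise<T> {
  if (!isCheckSyncDue(transfer, now)) return transfer
  const updated = await checkSync(transfer)
  return scheduleNextCheckSync(updated, now)
}
```

With this shape the outer loop can keep its 5 second cadence for pending transactions, while the sync check for a given transfer only reaches the node once per interval.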

I'm currently leaning towards solution 1.b) which gives more freedom to connector libraries and keeps the @near-eth/client more general with the same ConnectorLib interface https://github.com/aurora-is-near/rainbow-bridge-client/blob/152b0755d550998adf3533e1dc1f4c0e5f3892c2/packages/client/src/types.ts#L53

cc @mfornet

paouvrard commented 3 years ago

Currently, during checkSync (called every 5 secs) of Ethereum -> Aurora transfers, findEthProof is called to check whether a transfer was finalized by the event relayer. So a proof is built every 5 secs, and building a proof requires querying all the transaction receipts in a block individually: https://github.com/aurora-is-near/rainbow-bridge-client/blob/39ae822e0d72da8a66a68cce7c65afdc044b759b/packages/utils/src/findProof.ts#L99 This would explain the excess number of eth_getTransactionReceipt queries observed on Infura. For example, if the event relayer is delayed by 1 hour, with 150 tx in the deposit block, that's 150 × 12 × 60 = 108k eth_getTransactionReceipt calls (150 receipts per proof, 12 proofs per minute, 60 minutes) for checking a single user's transfer during 1 hour, while the growth plan of Infura is 5M calls per day for all users.
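The estimate above can be reproduced with a few lines. The function and constant names here are illustrative only; the inputs are the figures quoted in the comment:

```typescript
// Number of eth_getTransactionReceipt calls made while polling one transfer,
// assuming one receipt query per transaction in the block for every proof built.
function receiptCallsForTransfer(
  txsInDepositBlock: number,
  checkIntervalSeconds: number,
  relayerDelayMinutes: number
): number {
  const proofsPerMinute = 60 / checkIntervalSeconds
  return txsInDepositBlock * proofsPerMinute * relayerDelayMinutes
}

const INFURA_DAILY_QUOTA = 5_000_000 // growth plan figure quoted in the comment

const calls = receiptCallsForTransfer(150, 5, 60) // 150 * 12 * 60 = 108000
const percentOfQuota = (calls * 100) / INFURA_DAILY_QUOTA // 2.16

console.log(`${calls} calls, about ${percentOfQuota.toFixed(2)}% of the daily quota`)
```

Under these assumptions, a single delayed transfer uses roughly 2% of the daily quota, so about 46 such transfers stuck for an hour would exhaust it.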

paouvrard commented 3 years ago

This is less of an issue on bridgetonear.org because checkSync will stop running if the event relayer didn't finalize within 10 blocks and fall back to the user's NEAR wallet to finalize the transfer. But on aurora.dev, checkSync will loop forever until the transfer is finalized by the event relayer.

mfornet commented 3 years ago

I like the idea of doing one transfer, then estimating the amount of time until the next expected change, and repeating.

This will work fine both for multi-hour transactions and for fast transactions.

For transactions that depend on the event relayer finalizing them, we can do the same, since the relayer is running in a loop and should be picking up new txs as they appear. However, the relayer can fail for several reasons. To avoid oversaturating our endpoints, we can exponentially increase the time we wait between calls up to a reasonable, but large, amount. Start with 1 second and increase up to 10 minutes?
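A minimal sketch of that capped exponential backoff, using the 1 second start and 10 minute ceiling suggested above. The doubling factor, the function names and the injected `sleep` helper are assumptions made for illustration:

```typescript
const BACKOFF_INITIAL_MS = 1000 // start at 1 second
const BACKOFF_MAX_MS = 10 * 60 * 1000 // never wait longer than 10 minutes

// Delay before the next check: the initial delay multiplied by `factor`
// once per failed attempt, capped at the maximum.
function backoffDelay(attempt: number, factor: number = 2): number {
  return Math.min(BACKOFF_INITIAL_MS * Math.pow(factor, attempt), BACKOFF_MAX_MS)
}

// Polls `isFinalized` with growing delays until it reports true.
// `sleep` is injectable so the loop can be exercised without real waiting.
async function pollUntilFinalized(
  isFinalized: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void> = ms => new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<number> {
  let attempt = 0
  while (!(await isFinalized())) {
    await sleep(backoffDelay(attempt))
    attempt += 1
  }
  return attempt // number of waits that were needed
}
```

With a factor of 2 the delays run 1 s, 2 s, 4 s and so on, reaching the 10 minute ceiling on the eleventh wait, so a relayer outage of several hours costs only a few dozen checks.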

sept-en commented 3 years ago

I'm also leaning towards solution 1.b! I know that the amount is crazy, as I was able to reach those limits quite easily in Alchemy, for example, during some heavy testing of the bridge.

I also like the idea that @mfornet mentioned; this is quite a popular solution to reduce the number of redundant calls. I.e. we estimate that a TX will take up to 10 minutes to complete, then we multiply that time by some factor (e.g. 1.5 * 10 minutes = 15 minutes), and during this time we check at a constant interval, e.g. 30 seconds. When this time is up but the TX is not yet finalized, we double the interval. The interval then keeps increasing up to some limit, e.g. 10 minutes.
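One possible reading of this schedule, sketched with the example numbers from the comment. The `PollingSchedule` type and `nextInterval` function are hypothetical names, and doubling on every check after the initial window is an interpretation rather than something the comment spells out:

```typescript
// Hypothetical configuration for the two-phase schedule described above.
interface PollingSchedule {
  estimatedDurationMs: number // how long the TX is expected to take
  graceFactor: number // multiplier applied to the estimate
  baseIntervalMs: number // constant interval used inside the expected window
  maxIntervalMs: number // upper bound once the interval starts growing
}

const exampleSchedule: PollingSchedule = {
  estimatedDurationMs: 10 * 60 * 1000, // 10 minutes
  graceFactor: 1.5, // window of 15 minutes
  baseIntervalMs: 30 * 1000, // 30 seconds
  maxIntervalMs: 10 * 60 * 1000 // 10 minutes
}

// Interval to wait before the next check, given the time elapsed since the
// transfer started and the interval used for the previous check.
function nextInterval(elapsedMs: number, previousIntervalMs: number, schedule: PollingSchedule): number {
  const constantWindowMs = schedule.estimatedDurationMs * schedule.graceFactor
  if (elapsedMs < constantWindowMs) return schedule.baseIntervalMs
  return Math.min(previousIntervalMs * 2, schedule.maxIntervalMs)
}
```

For the first 15 minutes every check is 30 seconds apart; after that the interval goes 60 s, 120 s, 240 s, 480 s and then stays at 600 s.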