Problem: gravity-bridge orchestrators don't reject unprofitable messages from cronos to ethereum

yihuang commented 3 years ago

Currently, the orchestrator relays all messages blindly, vulnerable to spamming, since the gas fee on the ethereum mainnet is so high. With the current design of the gravity bridge, users can pay a bridge fee to orchestrators with the transferred erc20 tokens. Ideally, the orchestrator needs to:

estimate the gas to be consumed on ethereum
get current ethereum gas price from oracle
get the exchange rate from token to eth from some exchange API
check bridge_fee * exchange_rate > estimatedGas * gasPrice
after received the bridge fee tokens, convert them to eth asap when the price hasn't changed much, could be handled externally.

Difficulties:

How to estimate the gas on ethereum? maybe need to estimate offline, and put the numbers in the config file, could be different for each token, since the erc20 implementations could be different?
fault tolerance, should orchestrator keep working when price data is out of date and the price oracles down?

tomtau commented 3 years ago

can one at least rate-limit? e.g. relay 1 batch per hour

calvinaco commented 3 years ago

One requirement / discussion to add is a timeout for the batch. Say if a batch has been unchanged for say n hours, the relayer should still submit the batch.

yihuang commented 3 years ago

One requirement / discussion to add is a timeout for the batch. Say if a batch has been unchanged for say n hours, the relayer should still submit the batch.

The batch itself has a timeout, after batch expired, txs get back to the unbatched pool, then user have the chance to cancel the tx, and re-submit it with a bigger bridge fee.

thomas-nguy commented 3 years ago

after received the bridge fee tokens, convert them to eth asap when the price hasn't changed much, could be handled externally.

note that it is good to have but not mandatory in the minimum acceptance crtiteria considering short timeline.

I think what calvin mentioned by timeout is the maximum time transaction needs to "wait" before being send to ethereum even if they specify low value or zero fee (in that case orchestrator will cover the cost, but user needs to wait relatively long time) ie rate-limit from tomtau

yihuang commented 3 years ago

I think what calvin mentioned by timeout is the maximum time transaction needs to "wait" before being send to ethereum even if they specify low value or zero fee (in that case orchestrator will cover the cost, but user needs to wait relatively long time) ie rate-limit from tomtau

when batch timeout, it'll create a new batch later with a new batch timeout.
when there's a set of higher valued txs, it'll replace the current pending batch, and reset the batch timeout

I'm not sure how can we add our timeout logic in this.

thomas-nguy commented 3 years ago

timeout is not the correct term.

Basically we want to send the batch always every X hours whatever if it is profitable or not.

In genesis config we can setup batch timeout > rate-limit

Pseudo code will be

// Given BatchGasEstimate a config/env parameter

ethFeeValue = GetEthGasPrice() * BatchGasEstimate 

for (token_type, batch) in possible_batches {

batchValue = batch.totalFee * GetTokenPrice(token_type)

if (batchValue >= ethFeeValue) {
  send_batch()
  continue;
} else if ( current_time() > next_send_batch_time ) {
  send_batch()
} 
// otherwise do nothing
}

if ( current_time() > next_send_batch_time ) {
 next_send_batch_time += batch_send_rate
}

https://github.com/PeggyJV/gravity-bridge/blob/main/orchestrator/relayer/src/batch_relaying.rs#L133

A batch is build every 10 blocks in Cronos, contains max 100 txs and have a timeout that can be set in genesis config.

if someone send a very high fee tx, the batch will be cost efficient with only one tx => it will be send immediately
if someone send a low fee tx, the batch need to gather more tx before being send OR relayer needs to reach next_send_batch_time
Worse case, there are too many low fee tx (more than 100) then he will have to wait batch_send_rate multiple time

yihuang commented 3 years ago

if ( current_time() > next_send_batch_time ) {
next_send_batch_time += batch_send_rate
}

@thomas-nguy Do you mean next_send_batch_time = current_time() + batch_send_rate?

thomas-nguy commented 3 years ago

next_send_batch_time += batch_send_rate if we want to strictly send X batch within a period next_send_batch_time = current_time() + batch_send_rate if we want to have strictly X time elapsed within two batches

I think both are fine

devashishdxt commented 3 years ago

Here's the current logic in orchestrator to make the batches more profitable. Once a MsgSendToEth transaction is out of the pool and in a batch one of three things must happen to the funds.

The batch executes on Ethereum, the transfer is complete, locked tokens are burned on Cosmos.
The batch times out, once it's timeout height in blocks is reached, the locked tokens are then returned to the pool.
A later batch is executed (which is guaranteed to have higher fee), invalidating this one and making it impossible to submit. The locked tokens are then returned to the pool.

What changes to the current logic do we want to do as part of this issue?

One thing that I understand from the conversation is to not send a batch to ethereum if it is unprofitable. If we choose not to send a batch and continue sending future batches with higher nonce, all the batches with lower nonce will automatically get invalidated and all the transactions in batch will return to the pool (which may appear in future batches).

Other thing is to add a timeout for the batch but instead of invalidating the batch on timeout, send the batch to ethereum (even if it is unprofitable). I'm not sure how this can be achieved with current design. Any ideas @yihuang @tomtau @thomas-nguy?

thomas-nguy commented 3 years ago

@devashishdxt currently the relayer will always submit the batch whatever the cost is

https://github.com/PeggyJV/gravity-bridge/blob/main/orchestrator/relayer/src/batch_relaying.rs#L133

We want to add the pricing logic at this level.

If it is not profitable, dont call send_eth_transaction_batch except if we have reach the next_send_batch_time

Also to give more details on your explanation. A batch per token type is created every 10 blocks on cronos (x/gravity), the old one is replaced by a new one which is always more profitable (because it will contain the latest most profitable txs). At any point of time, only one batch per token exists.

If relayer decide not to send the batch, then he will need to wait 10 blocks until a new batch is constructed to see if it is more profitable to be send (unless we reach next_send_batch_time)

devashishdxt commented 3 years ago

@thomas-nguy

So we need to maintain next_send_batch_time for each token type on relayer? Or create some additional logic in cosmos module to store this time?

If we don't want to change the cosmos module, we'll end up making relayer stateful. Then each node's orchestrator may have different timeout for a given token type (I think). How do we make sure that we're not sending too many non-profitable batches?

We'll have to somehow synchronize the states of different orchestrators to make sure every node has same next_send_batch_time for a given token type.

thomas-nguy commented 3 years ago

Yes we dont want to modify anything on the module. All the logic needs to be done on the relayer side.

It is fine for other relayer to have different logic since its an open market.

To avoid to make the relayer stateful, how about instead of using time, using height or nonce? If we knows that batch is constructed every 10 blocks and the average block time, we can compute what is the next batch (height or nonce) that can be eligible to be send independently of the cost.

Otherwise we probably need a DB or some share memory to sync our relayers, if we want to manage multiple instances (that would also prevent them to try to send multiple time the same batch and lose gas fee as only one will be accepted)

devashishdxt commented 3 years ago

To avoid to make the relayer stateful, how about instead of using time, using height or nonce?

I don't think this makes any difference. If we store any value on relayer, we're making it stateful and different nodes can have different values for this unless we synchronize this using a central DB (which I don't think is a good idea for decentralized systems).

Any comments @yihuang @tomtau.

thomas-nguy commented 3 years ago

just to clarify the requirement.

This pricing mechanism will be only unique for "our relayer". It is not absurd to use a central DB to synchronise our relayer instances (if we want to manage multiple to ensure better reliability). At the end you can consider that it is only one unique "scalable" relayer.

Anyone can run their own instance of relayer implementing complete different logic. It is still decentralized

yihuang commented 3 years ago

To avoid to make the relayer stateful, how about instead of using time, using height or nonce?

I don't think this makes any difference. If we store any value on relayer, we're making it stateful and different nodes can have different values for this unless we synchronize this using a central DB (which I don't think is a good idea for decentralized systems).

Any comments @yihuang @tomtau.

Technically we only need at least one orchestrator to deliver the message, we could even only run one real orchestrator/relayer with several dummy ones, to avoid contention at all. So I think it should be ok to not synchronize the state.

devashishdxt commented 3 years ago

To avoid to make the relayer stateful, how about instead of using time, using height or nonce?

I don't think this makes any difference. If we store any value on relayer, we're making it stateful and different nodes can have different values for this unless we synchronize this using a central DB (which I don't think is a good idea for decentralized systems). Any comments @yihuang @tomtau.

Technically we only need at least one orchestrator to deliver the message, we could even only run one real orchestrator/relayer with several dummy ones, to avoid contention at all. So I think it should be ok to not synchronize the state.

Ok

tomtau commented 3 years ago

I think the original problem is pretty complex (besides the difficulties mentioned in the original posting, such as estimating gas costs and resilience to infrastructure faults) -- especially with regards to treasury management of arbitrary tokens (what is "profitable" at the time of submitting the batch may not be profitable a few moments later).

So, I'd suggest closing this for now and opening two issues for slightly more manageable problems:

being able to bind costs of unprofitable messages: have some soft-guarantee time-range when batches are submitted, so one can know how long the orchestrator can operate with existing funds... so the main difficulty would be around gas estimates
have an optional "fast-pass" for selected tokens: have a way to configure that batches would be submitted earlier for chosen tokens (where one assumes the price volatility is manageable), given some simplistic notion of "profitability" (e.g. in that pseudocode https://github.com/crypto-org-chain/cronos/issues/99#issuecomment-926281631 )

The second problem would likely need gas cost estimates too, so perhaps starting with the first one and then building on top of it later?

yihuang commented 3 years ago

The second problem would likely need gas cost estimates too, so perhaps starting with the first one and then building on top of it later?

I've suggested @devashishdxt take the parameters from the config file, possibly:

estimated gas of batch execution, possibly constants for each token
current eth gas price
token eth exchange rate

And leave the task of updating these parameters to external programs.

crypto-org-chain / cronos

Problem: gravity-bridge orchestrators don't reject unprofitable messages from cronos to ethereum #99