autonomys / subspace

Subspace Network reference implementation
https://subspace.network
369 stars 242 forks source link

The operator reward is delayed and may not distribute to the original set of nominators #2415

Open NingLin-P opened 8 months ago

NingLin-P commented 8 months ago

Currently, on both main branch and incoming staking 1.1, the operator reward of a given domain block is delayed until the ER of that block is confirmed before distributing. After the ER is confirmed the reward will be added to the operator's current staking pool, which increases the share price of the pool, hence increase the total amount that the nominator can withdraw.

While the reward is delayed, we are adding the reward to the operator's current staking pool, it may lead to issues:

cc @vedhavyas @dariolina

dariolina commented 8 months ago

Added a note to the domains spec which we should update if we address this issue.

ParthDesai commented 6 months ago

@NingLin-P @dariolina Please correct me if I wrong, but I do not see the inconsistency here.

When a nominator requests to withdraw N amount of shares, its value in terms of SSC is decided by the share price which is not calculated during the withdraw_stake call, but rather when the share price for that epoch for that operator is available and withdraw in shares is converted to balance.

The operator reward is added after the block is confirmed into current_epoch_rewards<key=operator_id>. When the epoch boundary block is confirmed, the reward is taken into account to calculate the share price for that epoch. This would mean that the share price used to convert shares to balance includes the reward.

NingLin-P commented 6 months ago

It is correct but is not the issue here.

Suppose there is an operator with 100 SSC of total stake and a nominator with 50 SSC stake, the ER of the domain block #10 just submitted which contains 10 SSC of execution fee, as expected the nominator should receive 5 SSC as reward (without considering nomination tax) but the execution fee only distribute after the ER is confirmed (in block #10 + challenge period), if before the ER is confirmed:

as a result:

dariolina commented 6 months ago

@ParthDesai it's easy to see if you imagine the nominator withdraws all shares at epoch n. The rewards that count into the share price at epoch n are the rewards for blocks which were confirmed during this epoch, so blocks up to last block of epoch n - 14400. Then in the epoch n+1 this nominator will not receive any rewards for those last 14400 blocks even though he was staking when they were produced.

ParthDesai commented 6 months ago

@NingLin-P I went back and read the code once more. Just for my understanding the first case where the reward gets diluted happens because we calculate the share price for particular epoch when it is finalized and all deposits are treated the same (even if one was active far longer than the other like our example.)

I do have some difficulty in understanding the second case. From my understanding where the nominator does not receive the reward. Is this because during withdraw_stake extrinsic, we remove its share from the deposit for that particular epoch, but even in that case that withdrawed shares will be part of WithdrawalInShare and when epoch is finalized, the share will have some value assigned. This means that nominator will receive some reward.

Is it possible for you to explain the second case with concrete example?

ParthDesai commented 6 months ago

@ParthDesai it's easy to see if you imagine the nominator withdraws all shares at epoch n. The rewards that count into the share price at epoch n are the rewards for blocks which were confirmed during this epoch, so blocks up to last block of epoch n - 14400. Then in the epoch n+1 this nominator will not receive any rewards for those last 14400 blocks even though he was staking when they were produced.

From my understanding, epoch related algorithms are only used in the code for confirmed block and I do not think there is an explanation of how epoch transition affects current blocks. (I am referring to current blocks as head domain block). if possible, can you point me to relevant section in specification? Or briefly explain it to me here?

NingLin-P commented 6 months ago

I think the confusion about epoch comes from the fact that we are using the confirmed domain number to trigger epoch transition: https://github.com/subspace/subspace/blob/81004be3156484fdb75f277d6f96d5a8a5ac68a0/crates/pallet-domains/src/lib.rs#L922-L927

What we want just for the epoch transition to happen once every StakeEpochDuration number of blocks, we are using the confirmed domain block number now but it is also okay to use the latest domain block number.

Suppose the challenge period is 100 blocks and the StakeEpochDuration is 10 blocks, the current block is #105 and there is an operator with 100 SSC of total stake and a nominator with 50 SSC stake:

  1. In block #106 a domain user gives a huge tip resulting in the #106 ER with 10 SSC of execution fee
  2. The nominator withdraws all in block #107
  3. In block #110, epoch transition happens
    • The block #1..#10 is confirmed during this epoch and their reward is added to the staking pool in block #101..#110
    • The share price of this epoch is calculated according to the staking pool
    • This share price is what is used for the deposit/withdraw that happens during this epoch, which includes the nominator's withdrawal
      • Thus the final amount of SSC the nominator can receive is determined, which doesn't contain the reward of block #106
  4. In block #206, the block #106 is confirmed, the reward is added to the staking pool.
  5. In block #210, epoch transition happens ...
ParthDesai commented 6 months ago

@NingLin-P My thoughts on this matter and solution outline is as follows: At the moment, we use block confirmation to trigger epoch transition for non-confirmed blocks. This is confusing and has some issues associated with it.

  1. Transition for 1st epoch for domain has abnormal length of confirmation period + StakeEpochDuration while other epochs has StakeEpochDuration. Ideally all epoch should have same length.
  2. Epoch index corresponds to the confirmed block index (i.e Confirmed block index = StakeEpochDuration * Epoch index) while we would expect the index to corresponds to the current blocks.
  3. Some operations like reward distributions applied to the different set of nominators.

Suggestion: One solution could be to track current epoch index with epoch marked as confirmed. And update/calculate share price for particular epoch after it is confirmed. There would be constant amount of storage required for this data structure but nothing too substantial.

This could solve first two problems described above are no longer problem. This would translate to withdrawal taking longer (I.e challenge period + the epoch duration) which would still be pretty less than unbonding time for polkadot and cosmos (28 days and 21 days respectively). One disadvantage here would be PendingDeposit also need same amount of time to covert to KnownDeposit but I guess if challenge period is 1 day it should be acceptable UX.

Regarding other nominators getting shares, this can also be resolved using this solution since the withdrawal will also be done after share price is determined and deposits will be in effect on correct epoch boundary.

Disadvantage: Apart from the storage cost, it might be possible that the epoch transition confirmation (when the block at epoch boundry is confirmed) could become more expensive. I am still investigating here.

NingLin-P commented 6 months ago

Transition for 1st epoch for domain has abnormal length of confirmation period + StakeEpochDuration while other epochs has StakeEpochDuration. Ideally all epoch should have same length.

This problem can be fixed by using the latest domain block number HeadDomainNumber to trigger epoch transition, also see https://github.com/subspace/subspace/issues/1750#issuecomment-1961476348.

Epoch index corresponds to the confirmed block index (i.e Confirmed block index = StakeEpochDuration * Epoch index) while we would expect the index to corresponds to the current blocks.

This is not really an issue since we don't have any usage to convert from epoch index to block number.

One solution could be to track current epoch index with epoch marked as confirmed. And update/calculate share price for particular epoch after it is confirmed.

This means the share price of a particular epoch is only available after it is confirmed, this is not desired because we may need the share price in the very next epoch for the deposit/withdraw request for the following usage:

Currently, each nominator at most keeps 2 items for all its deposits: known for deposits that have converted into shares based on the share price and pending for deposits that happen in the current epoch where the share price is unknown yet. If the share price is only available after the challenge period then we may need to keep at most StakeEpochDuration/challenge period = 100/14_400 = 144 entries in pending for each nominator. Though it just costs more storage and is not a fundamental issue. cc @vedhavyas

Withdraw takes the amount of share as input, we need to check if the nominator does have such amount of share before handling the withdrawal for it. While the pending deposit is only converted to known share after the challenge period, it means any deposit is only available to withdraw after 1 challenge period (and another challenge period for the unlocking period). This is a UX change cc @jfrank-summit.

ParthDesai commented 6 months ago

Currently, each nominator at most keeps 2 items for all its deposits: known for deposits that have converted into shares based on the share price and pending for deposits that happen in the current epoch where the share price is unknown yet. If the share price is only available after the challenge period then we may need to keep at most StakeEpochDuration/challenge period = 100/14_400 = 144 entries in pending for each nominator. Though it just costs more storage and is not a fundamental issue.

We can limit number of pending deposits to some constant (just like we are limiting it to 1 at the moment).

Withdraw takes the amount of share as input, we need to check if the nominator does have such amount of share before handling the withdrawal for it. While the pending deposit is only converted to known share after the challenge period, it means any deposit is only available to withdraw after 1 challenge period (and another challenge period for the unlocking period). This is a UX change cc @jfrank-summit.

I do think this is acceptable if challenge period is of 1 day. Cosmos and polkadot have unbonding time > 20 days. So, this should not be an issue.

dariolina commented 6 months ago

For the withdrawal no objections here. For mainnet Todd was suggesting unlocking of 7 days even. So this could be 1 day for challenge period + 6 days of waiting.