Closed SeanMcOwen closed 1 month ago
@SeanMcOwen A first cut can be the risk-neutral linear payoff myopic miner, i.e. one who chooses the higher of:
This purely myopic miner doesn't care about the lockup period and acts as if they would receive their reward immediately.
@jshorish When you say spot conversion rate, do you mean the rate on the open market or the one that the system offers?
Let me know if I misunderstood, or if for this second policy option you want to add in the lockup period and the decision making around it. I think what you are describing might be the first policy option that we currently have.
@SeanMcOwen The spot conversion rate is the one the protocol offers (the market making desk). Also just to note that this isn't exactly the same as the ratio of the proposed rewards, under our specification of the conversion rate (see the HackMD section on "Objective" discussing the specification) that was signed off by Quai in our meeting last Monday (29th July).
So the myopic miner's reward flow is the following:
A more realistic miner would:
and so, when deciding which token to receive as reward, the miner will:
This more realistic miner requires more "machinery" to model--we'd likely parameterize and then generate a population distribution across parameters for each miner...
@jshorish Ah, I missed that the actual exchange rate differs by the 1/ln(H) term; that changes things. Then this is good to be queued up and will be done as follows:
Thanks as always for the very detailed answers!
@SeanMcOwen Looks good! And good call also to keep the old version, as from yesterday's (Aug. 8th) discussion that's something Zargham would like to show, i.e. how our controller setup fares against the version they came up with (where the conversion rate is the ratio of the proposed block rewards).
In what follows: $$ c_i = \begin{cases} 1 & \text{if token 1 is chosen}, \\ 0 & \text{if token 2 is chosen}. \end{cases} $$
Miner choices $c_i$ are assumed to be independently distributed such that for a block at height $i$, $$ p_i = \Pr(c_i = 1 \mid r_{i1}, r_{i2}, d_i ) := \frac{1}{1 + \exp(- \pmb{\beta}'\mathbf x_i) }, $$ where $\mathbf x_i$ is a set of features and $\pmb \beta$ their associated weights. It may be that the first such feature is $1$, so that the first weight is an intercept/'bias' term. Note that the linear term $\pmb{\beta}' \mathbf x$ is consistent with an interpretation of the above as coming from a latent variable/random utility model of the miner.
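A minimal sketch of the choice model above, for anyone wiring it into the simulation. The function names and the weights `beta = (-2.0, 0.5)` are hypothetical placeholders chosen for illustration, not values from the spec; the feature vector is assumed to carry a leading `1` for the intercept.

```python
import math
import random


def choice_probability(beta, x):
    """Logistic probability that token 1 is chosen: 1 / (1 + exp(-beta'x))."""
    z = sum(b * xi for b, xi in zip(beta, x))
    # Numerically stable evaluation of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_choice(beta, x, rng):
    """Draw c_i in {0, 1} with Pr(c_i = 1) = choice_probability(beta, x)."""
    return 1 if rng.random() < choice_probability(beta, x) else 0


# Hypothetical weights (intercept, slope) and a feature vector (1, x_i).
beta = (-2.0, 0.5)
x = (1.0, 4.0)

rng = random.Random(0)  # seeded so that runs are reproducible
p = choice_probability(beta, x)
choices = [sample_choice(beta, x, rng) for _ in range(5)]
print(p, choices)
```

With these placeholder weights the linear term is zero at `x = (1.0, 4.0)`, so the miner sits exactly at the indifference probability.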
Given the data set $z_k$, maximum likelihood estimation yields estimates $\hat{\pmb{\beta}}$.
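As a rough illustration of the estimation step, here is a sketch of maximum likelihood for the two-parameter case (intercept plus one scalar feature) via Newton-Raphson. This is one standard way to fit a logistic model, not necessarily how the controller will do it; `fit_logistic` and the toy data are assumptions made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, cs, iters=50, tol=1e-12):
    """Newton-Raphson MLE for Pr(c = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, c in zip(xs, cs):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += c - p
            g1 += (c - p) * x
            # Negative Hessian (Fisher information) entries.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("information matrix is singular; check the data")
        # Solve the 2x2 system H s = g for the Newton step s.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Toy data (hypothetical): at x = 0, 1 of 4 miners chose token 1;
# at x = 1, 3 of 4 did.
xs = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
cs = [1, 0, 0, 0, 1, 1, 1, 0]
beta_hat = fit_logistic(xs, cs)
print(beta_hat)
```

The toy data has a binary feature, so the fitted probabilities must match the observed fractions 1/4 and 3/4. That gives a closed-form check: $\hat\beta_0 = -\ln 3$ and $\hat\beta_1 = 2\ln 3$.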
The controller seeks to stabilize an imputed value of hashpower (difficulty) by adjusting the proposed block rewards so that the miner would have been indifferent between receiving a reward in qi (token 1) or quai (token 2). The interpretation of this is that deviations from indifference reveal that one token is more valuable than the other. In the case that one token (qi) is to reflect the value of hashpower (difficulty), indifference is a reference or focal point from which the value of hashpower may be observed from miner decisions.
Indifference is when $p_i = 0.5$. Given $\hat{\pmb{\beta}}$, it is clear that the invariant surface of features satisfies $$ \hat{\pmb{\beta}}' \mathbf x \equiv 0. $$
Refining this further requires a definition of the features $\mathbf x$.
The simplest example is where $\mathbf x_i = (1, x_i) := (1, d_i/\log_2(d_i))$. In this case the invariant surface above yields a value $d_i = d^\star$ such that $$ \frac{d^\star}{\log_2(d^\star)} = -\frac{\hat{\beta}_0}{\hat{\beta}_1}. $$
This is the difficulty level that would have to obtain in order for a miner to be (on average) indifferent between selecting token 1 and token 2. In this case define $x^\star(\hat{\pmb{\beta}}) = d^\star / \log_2(d^\star)$ (we will sometimes drop the dependence of $x^\star$ upon $\hat{\pmb{\beta}}$ for brevity in what follows, but it is important always to recall that $x^\star$ is derived from the estimation problem the controller performs in finding a miner's indifference point).
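Since $d^\star / \log_2(d^\star) = x^\star$ has no closed-form solution for $d^\star$, it has to be found numerically. Below is a sketch using bisection on the branch $d > e$, where the feature is increasing. The helper names and the example estimates are hypothetical; any root finder would do.

```python
import math


def feature(d):
    """x = d / log2(d), the feature from the simple example (needs d > 1)."""
    return d / math.log2(d)


def indifference_feature(b0, b1):
    """x_star = -b0 / b1, the feature value at which p = 0.5."""
    return -b0 / b1


def solve_d_star(x_star, tol=1e-10):
    """Solve d / log2(d) = x_star for d on the increasing branch d >= e."""
    lo = math.e
    if x_star < feature(lo):
        # feature(e) = e * ln(2) is the minimum of the feature on d > 1.
        raise ValueError("x_star is below the minimum attainable feature value")
    # Expand the upper bracket until it contains the root.
    hi = 2.0 * lo
    while feature(hi) < x_star:
        hi *= 2.0
    # Bisection: the feature is increasing on [e, hi].
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if feature(mid) < x_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical estimates (b0_hat, b1_hat) = (-2.0, 0.5) give x_star = 4.
x_star = indifference_feature(-2.0, 0.5)
d_star = solve_d_star(x_star)
print(x_star, d_star)
```

As a sanity check, $16/\log_2(16) = 4$, so these placeholder estimates correspond to $d^\star = 16$.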
[It is worth noting here that, provided $d_i > e$, $\frac{dx_i}{d(d_i)} > 0$, i.e. increasing difficulty $d_i$ will increase $x_i$ and hence (when $\beta_1 > 0$) increase $p_i$ from the logistic expression above. There is thus a weak restriction on $d_i$ under this approach.]
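For completeness, the monotonicity condition can be checked directly from the definition of the feature in the simple example:

$$ x_i = \frac{d_i}{\log_2(d_i)} = \frac{d_i \ln 2}{\ln d_i}, \qquad \frac{dx_i}{d(d_i)} = \ln 2 \, \frac{\ln d_i - 1}{(\ln d_i)^2}. $$

The derivative is positive exactly when $\ln d_i > 1$, i.e. $d_i > e$. On $1 < d_i < e$ the feature is decreasing, and its minimum over $d_i > 1$ is attained at $d_i = e$, where $x_i = e \ln 2 \approx 1.88$.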
@SeanMcOwen Just to quickly note that for the miner behavior here, when the feature dataset to be estimated is made up of pairs $(1, x_i)$ as in the simple example, the parameter vector $\boldsymbol{\beta} = (\beta_0, \beta_1)$ has the following restrictions that should be incorporated when setting the "truth" for the miner's selection process:
These restrictions are necessary for the estimated parameters to keep a consistent sign, so that $x^\star(\hat{\boldsymbol{\beta}}) > 0$.
(Alternatively the feature dataset could be made up of pairs $(-1, x_i)$ with $\beta_0, \beta_1 > 0$--these are equivalent representations for the indifference point of the distribution.)
Just need to confirm the actual form of it, i.e. whether it is just a stream of decisions computed before the simulation and fed in, or what the rule is.