Build second policy option for behavioral mining

SeanMcOwen commented 2 months ago

Just need to confirm the actual form of it, i.e. whether it is just a stream of decisions computed before the simulation and fed in, or what the decision rule is.

jshorish commented 2 months ago

@SeanMcOwen A first cut can be the risk-neutral linear payoff myopic miner, i.e. one who chooses the higher of:

The proposed block reward in (say) Qi, and
The (proposed block reward in Quai) x (the spot conversion rate from Quai to Qi)

This purely myopic miner doesn't care about the lockup period and acts as if they would receive their reward immediately.
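A minimal Python sketch of this decision rule, assuming the spot Quai-to-Qi rate is passed in as a single number. `BlockRewards`, `myopic_choice` and the tie-breaking rule are hypothetical choices made for illustration, not part of the model code.

```python
from dataclasses import dataclass


@dataclass
class BlockRewards:
    """Proposed rewards for one block (hypothetical container)."""
    qi: float    # proposed block reward in Qi
    quai: float  # proposed block reward in Quai


def myopic_choice(rewards: BlockRewards, quai_to_qi_rate: float) -> str:
    """Risk-neutral, linear-payoff myopic miner.

    Values both proposed rewards in Qi at the current (spot) conversion
    rate, ignoring the lockup period, and picks the larger one.
    """
    qi_value = rewards.qi
    quai_value_in_qi = rewards.quai * quai_to_qi_rate
    # Ties go to Qi here; this tie-breaking rule is an assumption.
    return "qi" if qi_value >= quai_value_in_qi else "quai"
```

For example, with proposed rewards of 10 Qi and 4 Quai, a spot rate of 2 values the Quai reward at 8 Qi (the miner takes Qi), while a rate of 3 values it at 12 Qi (the miner takes Quai).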

SeanMcOwen commented 2 months ago

@jshorish When you say spot conversion rate, do you mean the rate on the open market or the one that the system offers?

If it is the one that the system offers, then I believe these two will be equal, because the spot rate == the ratio of the latest proposed rewards.
If it is the market rate, then this would be what the current first policy option offers.

Let me know if I misunderstood, or if for this second policy option you want to add the lockup period and the decision making around it, since I think what you might be describing is the first policy option we currently have.

jshorish commented 2 months ago

@SeanMcOwen The spot conversion rate is the one the protocol offers (the market making desk). Also note that, under our specification of the conversion rate (see the HackMD section on "Objective" discussing the specification) that was signed off by Quai in our meeting last Monday (29th July), this isn't exactly the same as the ratio of the proposed rewards.

So the myopic miner's reward flow is the following:

The protocol proposes rewards of qi and quai for the successfully mined block;
The miner compares the proposed qi to the value, in terms of qi, of the proposed quai that it can obtain at the conversion rate. This is the myopia: they forecast the future conversion rate to be the current conversion rate, ignore the time value of money when a conversion is locked up, and assume the future secondary market rate will be the same as the future conversion rate;
The miner accepts the token with the higher value as actual reward r, which is minted and then locked for a specified period (according to Quai's supplied lockup table);
After that time the actual reward r is unlocked and given to the miner. The miner may then do whatever it wants with that reward amount.
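The reward flow above (mint the chosen reward, lock it, release it when the lockup ends) can be sketched as a queue keyed on unlock height. This is a minimal Python sketch, assuming lockups are measured in blocks and that each reward's lockup length is looked up elsewhere (Quai's lockup table is not reproduced here). `LockupQueue`, `mint_and_lock` and `release` are hypothetical names.

```python
import heapq
from typing import List, Tuple


class LockupQueue:
    """Holds minted rewards until their lockup expires (hypothetical helper)."""

    def __init__(self) -> None:
        # Min-heap of (unlock_height, seq, token, amount), earliest unlock first.
        self._heap: List[Tuple[int, int, str, float]] = []
        self._seq = 0  # tie-breaker so equal unlock heights pop in insertion order

    def mint_and_lock(self, height: int, token: str, amount: float, lockup: int) -> None:
        """Mint reward `amount` of `token` at `height`, locked for `lockup` blocks."""
        heapq.heappush(self._heap, (height + lockup, self._seq, token, amount))
        self._seq += 1

    def release(self, height: int) -> List[Tuple[str, float]]:
        """Return every reward whose lockup has ended at or before `height`."""
        unlocked = []
        while self._heap and self._heap[0][0] <= height:
            _, _, token, amount = heapq.heappop(self._heap)
            unlocked.append((token, amount))
        return unlocked
```

A reward of 10 qi minted at height 100 with a 50-block lockup is not released at height 120 but is released at height 150; after release the miner is free to do whatever it wants with it.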

A more realistic miner would:

Forecast a conversion rate that will obtain after the block reward lockup ends;
Forecast the secondary market rate that will obtain after the block reward lockup ends;
Forecast the opportunity cost of (after receiving the block reward) converting one token to another, and then having that conversion locked for some duration;

and so, when deciding which token to receive as reward, the miner will:

Compare 1) the forecasted net proceeds from conversion against the forecasted proceeds of a secondary market sale, taking the higher, and then 2) compare that value to the face value of the other token, and 3) make their decision of which token to receive accordingly.
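A minimal sketch of steps 1)-3), assuming everything is valued in Qi, only the Quai reward is evaluated for conversion or sale (the symmetric Qi-to-Quai comparison is omitted), and the opportunity cost of a locked conversion is collapsed into a single multiplicative discount. All names and this discount form are hypothetical; in the population approach each miner would get its own parameter draws.

```python
def realistic_choice(
    qi_reward: float,
    quai_reward: float,
    fcst_conversion_rate: float,
    fcst_market_rate: float,
    conversion_discount: float,
) -> str:
    """Hypothetical 'more realistic' miner, with all values measured in Qi.

    fcst_conversion_rate: forecast protocol Quai->Qi rate after the lockup ends
    fcst_market_rate:     forecast secondary-market Quai->Qi rate after the lockup
    conversion_discount:  factor in (0, 1] standing in for the opportunity cost
                          of having the conversion itself locked up (assumption)
    """
    # 1) Best forecast Qi proceeds from taking the Quai reward.
    conversion_proceeds = quai_reward * fcst_conversion_rate * conversion_discount
    market_proceeds = quai_reward * fcst_market_rate
    best_quai_value = max(conversion_proceeds, market_proceeds)
    # 2) and 3) Compare against the face value of the Qi reward and decide.
    return "quai" if best_quai_value > qi_reward else "qi"
```

With 10 Qi versus 4 Quai, a forecast conversion rate of 3 and a market rate of 2, a discount of 0.9 gives conversion proceeds of 10.8 Qi (take Quai), while a discount of 0.8 gives 9.6 Qi (take Qi).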

This more realistic miner requires more "machinery" to model--we'd likely parameterize and then generate a population distribution across parameters for each miner...

SeanMcOwen commented 2 months ago

@jshorish Ah, I missed that the actual exchange rate is different by the 1/ln(H) term, that changes things. Then this is good to be queued up and will be done as follows:

Do #137, where I will add a policy option for the exchange rate that uses this proposed form (keeping the old version in case we ever want to A/B test it);
Implement this version of the myopic miner (keeping the older version as well for the same A/B testing).

Thanks as always for the very detailed answers!

jshorish commented 2 months ago

@SeanMcOwen Looks good! And good call also to keep the old version, as from yesterday's (Aug. 8th) discussion that's something Zargham would like to show, i.e. how our controller setup fares against the version they came up with (where the conversion rate is the ratio of the proposed block rewards).

SeanMcOwen commented 1 month ago

In what follows: $$ c_i = \begin{cases} 1 & \text{if token 1 is chosen} \\ 0 & \text{if token 2 is chosen}. \end{cases} $$

Miner choices $c_i$ are assumed to be independently distributed such that for a block at height $i$, $$ p_i = \Pr(c_i = 1 \mid r_{i1}, r_{i2}, d_i) := \frac{1}{1 + \exp(- \pmb{\beta}'\mathbf x_i) }, $$ where $\mathbf x_i$ is a set of features and $\pmb \beta$ their associated weights. It may be that the first such feature is $1$, so that the first weight is an intercept/'bias' term. Note that the linear term $\pmb{\beta}' \mathbf x$ is consistent with an interpretation of the above as coming from a latent variable/random utility model of the miner.
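A minimal sketch of this choice model, with $\mathbf x_i$ and $\pmb\beta$ as plain Python lists; because choices are assumed independent, each block's $c_i$ is drawn separately. `choice_probability` and `sample_choice` are hypothetical names, and building the features is left to the caller.

```python
import math
import random
from typing import Sequence


def choice_probability(beta: Sequence[float], x: Sequence[float]) -> float:
    """p_i = 1 / (1 + exp(-beta' x_i)): probability that token 1 is chosen."""
    z = sum(b * xi for b, xi in zip(beta, x))
    # Numerically stable logistic: avoid exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sample_choice(beta: Sequence[float], x: Sequence[float], rng: random.Random) -> int:
    """Draw c_i: 1 if token 1 is chosen, 0 if token 2 is chosen."""
    return 1 if rng.random() < choice_probability(beta, x) else 0
```

With an intercept feature of 1, `choice_probability([b0, b1], [1.0, x_i])` is the logistic expression above for the two-feature case.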

Given the data set $z_k$, maximum likelihood estimation yields estimates $\hat{\pmb{\beta}}$.
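For the two-feature case $\mathbf x_k = (1, x_k)$ used in the simple example below, the maximum likelihood estimate can be found with Newton-Raphson on the logistic log-likelihood. This sketch assumes the data set is given as features $x_k$ and observed choices $c_k \in \{0, 1\}$; `fit_logit_2param` is a hypothetical name, and a production setup would more likely use a statistics library.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit_2param(
    xs: Sequence[float], cs: Sequence[int], iters: int = 50, tol: float = 1e-10
) -> Tuple[float, float]:
    """MLE of (beta0, beta1) in Pr(c = 1) = 1 / (1 + exp(-(beta0 + beta1 * x))).

    Newton-Raphson on the log-likelihood, starting from beta = (0, 0).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0    # gradient of the log-likelihood
        a = b = d = 0.0  # entries of the information matrix [[a, b], [b, d]]
        for x, c in zip(xs, cs):
            p = _sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += c - p
            g1 += (c - p) * x
            a += w
            b += w * x
            d += w * x * x
        det = a * d - b * b
        # Newton step = information^{-1} @ gradient (2x2 inverse written out).
        s0 = (d * g0 - b * g1) / det
        s1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1
```

The MLE does not exist if the choices are perfectly separated by $x$, and the update divides by zero if all $x_k$ are equal; the sketch guards against neither.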

Objective: stability via indifference

The controller seeks to stabilize an imputed value of hashpower (difficulty) by adjusting the proposed block rewards so that the miner would have been indifferent between receiving a reward in qi (token 1) or quai (token 2). The interpretation of this is that deviations from indifference reveal that one token is more valuable than the other. In the case that one token (qi) is to reflect the value of hashpower (difficulty), indifference serves as a reference (focal) point from which the value of hashpower can be inferred from miner decisions.

Indifference is when $p_i = 0.5$. Since the logistic function equals $0.5$ exactly when its argument is zero, given $\hat{\pmb{\beta}}$ the invariant surface of features satisfies $$ \hat{\pmb{\beta}}' \mathbf x \equiv 0. $$

Refining this further requires a definition of the features $\mathbf x$.

A simple example

The simplest example is where $\mathbf x_i = (1, x_i) := (1, d_i/\log_2(d_i))$. In this case the invariant surface above yields a value $d_i = d^\star$ such that $$ \frac{d^\star}{\log_2(d^\star)} = -\frac{\hat{\beta_0}}{\hat{\beta_1}}. $$

This is the difficulty level that would have to obtain in order for a miner to be (on average) indifferent between selecting token 1 and token 2. In this case define $x^\star(\hat{\pmb{\beta}}) = d^\star / \log_2(d^\star)$ (we will sometimes drop the dependence of $x^\star$ upon $\hat{\pmb{\beta}}$ for brevity in what follows, but it is important always to recall that $x^\star$ is derived from the estimation problem the controller performs in finding a miner's indifference point).
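The map $x(d) = d/\log_2(d)$ can only be inverted in closed form via the Lambert W function, so $d^\star$ is easiest to find numerically. A minimal bisection sketch, restricted to the branch $d > e$ where $x$ is increasing; `x_of_d` and `indifference_difficulty` are hypothetical names. Because $x(e) = e \ln 2 \approx 1.88$ is the minimum of $x$, no solution exists when $-\hat\beta_0/\hat\beta_1$ falls below that value.

```python
import math


def x_of_d(d: float) -> float:
    """Feature from the simple example: x = d / log2(d)."""
    return d / math.log2(d)


def indifference_difficulty(beta0: float, beta1: float, tol: float = 1e-9) -> float:
    """Solve d* / log2(d*) = -beta0 / beta1 for d* > e by bisection."""
    target = -beta0 / beta1  # x*(beta_hat)
    lo = math.e
    if target < x_of_d(lo):
        raise ValueError("no solution with d > e: -beta0/beta1 is below e*ln(2)")
    hi = 2.0 * lo
    while x_of_d(hi) < target:  # grow the bracket; x_of_d increases on (e, inf)
        hi *= 2.0
    while hi - lo > tol * hi:  # relative tolerance on d
        mid = 0.5 * (lo + hi)
        if x_of_d(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a check, $1024/\log_2 1024 = 102.4$, so $\hat\beta_0 = -102.4$, $\hat\beta_1 = 1$ should recover $d^\star \approx 1024$.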

[It is worth noting here that provided $d_i > e$, $\frac{dx_i}{d(d_i)} > 0$, i.e. increasing difficulty $d_i$ will increase $x_i$ and hence, when $\hat\beta_1 > 0$, increase $p_i$ from the logistic expression above. There is thus a weak restriction on $d_i$ under this approach.]
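The condition $d_i > e$ comes from differentiating the feature. Writing $x(d) = d/\log_2 d = d \ln 2 / \ln d$,
$$ \frac{dx}{d(d)} = \ln 2 \cdot \frac{\ln d - 1}{(\ln d)^2}, $$
which is positive exactly when $\ln d > 1$, i.e. $d > e$. By the chain rule through the logistic expression,
$$ \frac{\partial p_i}{\partial d_i} = p_i (1 - p_i)\, \beta_1 \, \frac{dx_i}{d(d_i)}, $$
so $p_i$ increases with difficulty only when, in addition, $\beta_1 > 0$.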

jshorish commented 1 month ago

@SeanMcOwen Just to quickly note that for the miner behavior here, when the feature dataset to be estimated is made up of pairs $(1, x_i)$ as in the simple example, the parameter vector $\boldsymbol{\beta} = (\beta_0, \beta_1)$ has the following restrictions that should be incorporated when setting the "truth" for the miner's selection process:

$\beta_0 < 0$;
$\beta_1 > 0$.

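These restrictions are needed for the estimated indifference point $x^\star(\hat{\boldsymbol{\beta}}) = -\hat\beta_0/\hat\beta_1$ to be positive (which requires $\beta_0$ and $\beta_1$ to have opposite signs), with $\beta_1 > 0$ also ensuring that the probability of choosing token 1 increases with difficulty.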

(Alternatively, the feature dataset could be made up of pairs $(-1, x_i)$ with $\beta_0, \beta_1 > 0$; these are equivalent representations of the indifference point of the distribution.)
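When setting the "truth" for simulated miners, these restrictions can be enforced up front. A minimal sketch; `check_truth_params` and `to_negative_intercept_form` are hypothetical helpers, and the second one only illustrates that the $(-1, x_i)$ representation with weight $-\beta_0$ yields the same linear index $\beta_0 + \beta_1 x_i$.

```python
from typing import Tuple


def check_truth_params(beta0: float, beta1: float) -> float:
    """Validate 'true' miner parameters for features (1, x); return x* = -beta0/beta1."""
    if not (beta0 < 0 and beta1 > 0):
        raise ValueError("expected beta0 < 0 and beta1 > 0 for features (1, x)")
    return -beta0 / beta1  # positive indifference point x*


def to_negative_intercept_form(beta0: float, beta1: float) -> Tuple[float, float]:
    """Weights for features (-1, x) giving the same index: (-beta0)(-1) + beta1 x."""
    return -beta0, beta1
```

For example, $(\beta_0, \beta_1) = (-2, 0.5)$ passes the check with $x^\star = 4$, and maps to weights $(2, 0.5)$ on features $(-1, x_i)$.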

dominant-strategies / Quai-Macro-Model

Build second policy option for behavioral mining #122
