ben-eysenbach / mnm

Code to accompany the paper "Mismatched No More: Joint Model-Policy Optimization for Model-Based RL"
20 stars, 2 forks

Lemma 2 Derivation #1

Open: gunnxx opened this issue 8 months ago

gunnxx commented 8 months ago

Hi, thanks for the great paper and the reproducible code! I am trying to follow the derivation and can't quite follow the jump in Lemma B.2. Could you elaborate on the "???" part? Thanks in advance!

[screenshot of the derivation]

ben-eysenbach commented 8 months ago

Thanks for the kind words! We're using $p(h)$ to denote a geometric distribution: $p(h) = (1 - \gamma) \gamma^h$ for $h = 0, 1, 2, \ldots$. So, $\sum_h p(h) r_h = (1 - \gamma) \sum_h \gamma^h r_h$. Dividing both sides by $1 - \gamma$ gives $\frac{1}{1 - \gamma} \sum_h p(h) r_h = \sum_h \gamma^h r_h$, which is the "???" step.
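To sanity-check the identity $\sum_h \gamma^h r_h = \frac{1}{1 - \gamma} \sum_h p(h) r_h$ numerically, here is a minimal sketch. The function names are hypothetical (not from the mnm repo), and it assumes a finite reward sequence whose rewards are zero past its end. It compares the direct discounted sum, the geometric-weighted sum, and a Monte Carlo estimate that samples $h \sim p(h)$.

```python
import random


def discounted_return(rewards, gamma):
    """Direct discounted sum: sum_h gamma^h * r_h."""
    return sum(gamma ** h * r for h, r in enumerate(rewards))


def geometric_weighted_return(rewards, gamma):
    """(1 / (1 - gamma)) * sum_h p(h) * r_h, with p(h) = (1 - gamma) * gamma^h."""
    p = [(1 - gamma) * gamma ** h for h in range(len(rewards))]
    return sum(p_h * r for p_h, r in zip(p, rewards)) / (1 - gamma)


def monte_carlo_return(rewards, gamma, num_samples=50_000, seed=0):
    """Estimate (1 / (1 - gamma)) * E_{h ~ p}[r_h] by sampling h from p(h)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Continue with probability gamma, stop with probability 1 - gamma,
        # so P(h = k) = (1 - gamma) * gamma^k.
        h = 0
        while rng.random() < gamma:
            h += 1
        # Rewards past the end of the list are treated as zero.
        total += rewards[h] if h < len(rewards) else 0.0
    return total / num_samples / (1 - gamma)


if __name__ == "__main__":
    rewards = [float(h % 3) for h in range(60)]
    gamma = 0.9
    print(discounted_return(rewards, gamma))
    print(geometric_weighted_return(rewards, gamma))
    print(monte_carlo_return(rewards, gamma))
```

The first two values agree up to floating-point rounding, since they are the same sum with the $(1 - \gamma)$ factor moved in and out; the Monte Carlo estimate agrees up to sampling noise.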

Quick note: I think the second "=" under "Lemma 2" in the screenshot above should also have a factor of $\frac{1}{1 - \gamma}$ in front.

Hope this helps!