elliotchanesane31 / RIS


Implementation of some expectation is different from paper #2

Open XinyuWang2 opened 2 years ago

XinyuWang2 commented 2 years ago

Hi, I found that some expectations are not calculated the way they are written in the paper. I know some expectations are impossible to compute exactly because there are infinitely many possible values. Did you try a sampling method and compare it with the implemented approach? I am really curious about the difference.

E.g., when calculating A,

elliotchanesane31 commented 2 years ago

Hi, you are right, there is a small difference between the paper derivation and the actual implementation to compute A.

In the paper, the advantage A(sg | s, g) = E_{sg_hat ~ piH(. | s, g)} [C(sg_hat | s, g)] - C(sg | s, g) is the difference of two terms, the first being an expectation with respect to piH(. | s, g). In principle, we should average C(sg_hat | s, g) over multiple subgoals sg_hat sampled from piH(. | s, g) to approximate this expectation, which could be computationally expensive depending on the number of samples we choose.

In practice, we found that using only the mean of piH(. | s, g) for sg_hat was simpler, faster, and more stable than a sampling-based approximation, although I haven’t looked into this in depth.
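
To make the two options concrete, here is a minimal sketch that contrasts the sampling-based estimate of A with the mean-based shortcut. It is not the repository's code. The 1-D `cost` function, the Gaussian form of piH, and the names `advantage_sampled` and `advantage_mean` are all assumptions chosen only for illustration.

```python
import random


def cost(subgoal, state, goal):
    """Toy stand-in for C(sg | s, g) on a 1-D line (hypothetical, not from RIS).

    Returns the longer of the two legs state -> subgoal and subgoal -> goal.
    """
    return max(abs(subgoal - state), abs(goal - subgoal))


def advantage_sampled(sg, state, goal, mean, std, n_samples=1000, seed=0):
    """Monte Carlo estimate: average C over sampled sg_hat, minus C(sg)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        sg_hat = rng.gauss(mean, std)  # assumed Gaussian piH(. | s, g)
        total += cost(sg_hat, state, goal)
    return total / n_samples - cost(sg, state, goal)


def advantage_mean(sg, state, goal, mean):
    """Single-evaluation shortcut: plug the mean of piH in for sg_hat."""
    return cost(mean, state, goal) - cost(sg, state, goal)


if __name__ == "__main__":
    s, g, sg = 0.0, 10.0, 4.0
    mean, std = 5.0, 1.0
    print("sampled:", advantage_sampled(sg, s, g, mean, std, n_samples=20000))
    print("mean   :", advantage_mean(sg, s, g, mean))
```

In this toy setup the cost is convex in sg_hat, so by Jensen's inequality the mean-based value can only underestimate the expectation, and the two agree when std is 0. Whether the gap matters for the real critic is exactly the open question raised above. The sketch only shows how one could measure it.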