Open husha1993 opened 6 years ago
It can be any number; 10 worked well. As we explain in the paper, there's no requirement that the number of actions in the MDP that we learn (and do VI over) match the number of actions in the real MDP.
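To make the point concrete, here is a minimal NumPy sketch (not the repository's Theano code) of a value-iteration module in which `l_q` only sets the number of Q channels in the learned MDP, while a separate output layer maps those channels to the 8 real actions. The helper names `conv2d_same` and `vin_action_logits`, the 3x3 kernels, the random untrained weights, and the 8x8 reward map are all assumptions made for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded, stride-1 3x3 cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3); returns (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Contribution of kernel offset (i, j) to every output pixel.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out

def vin_action_logits(reward, state, l_q, n_actions=8, k=10, seed=0):
    """Hypothetical sketch: run k VI steps with l_q abstract actions,
    then map the l_q Q-values at `state` to n_actions real-action logits."""
    rng = np.random.default_rng(seed)
    # Untrained random weights, for shape illustration only.
    w_q = rng.normal(scale=0.1, size=(l_q, 2, 3, 3))   # inputs: reward, value
    w_out = rng.normal(scale=0.1, size=(l_q, n_actions))
    v = np.zeros_like(reward)
    for _ in range(k):
        q = conv2d_same(np.stack([reward, v]), w_q)    # (l_q, H, W)
        v = q.max(axis=0)                              # max over abstract actions
    q = conv2d_same(np.stack([reward, v]), w_q)
    row, col = state
    return q[:, row, col] @ w_out                      # (n_actions,)

reward = np.zeros((8, 8))
reward[6, 6] = 1.0
for l_q in (4, 8, 10):
    logits = vin_action_logits(reward, (2, 3), l_q)
    print(l_q, logits.shape)
```

Under these assumptions the output always has one entry per real action, whatever value `l_q` takes, because the final layer is the only place where the real action count appears.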
On Mon, Apr 2, 2018 at 12:32 AM, Sha Hu notifications@github.com wrote:
Hi, https://github.com/avivt/VIN/blob/fe11bb1ae8ad9bcb3a02e2cc5e21b9499ccf0db4/vin.py#L30 why does l_q equal 10 rather than 8? Thanks
Hi, https://github.com/avivt/VIN/blob/fe11bb1ae8ad9bcb3a02e2cc5e21b9499ccf0db4/vin.py#L30 why does l_q equal 10 rather than 8? Thanks