aviralkumar2907 / CQL

Code for conservative Q-learning

About the readability #3

Open familyld opened 3 years ago

familyld commented 3 years ago

Hi Aviral,

In the paper, you claim that CQL can be implemented in fewer than 20 lines of code, but it's really difficult to identify these "20 lines" in the current version of the project, which is built on top of other projects. Could you please point out exactly which part of the code corresponds to the core of CQL? I really like the idea behind CQL, both its theory and its simplicity, but at the moment the code is very hard to follow.

Best, Zhi-Hong
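For readers looking for the "fewer than 20 lines" core mentioned above, here is a minimal sketch of a CQL(H)-style loss for a discrete-action Q-function. It assumes the conservative term is the log-sum-exp of Q over all actions minus Q on the dataset action, added with weight `alpha` to an ordinary squared TD error. This is an illustration in NumPy, not the repository's code: `cql_h_loss`, `alpha`, and the argument layout are hypothetical, and in a real agent the same expression would be computed with the framework's autodiff tensors.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_h_loss(q_values, actions, td_targets, alpha=1.0):
    """Illustrative CQL(H) loss for discrete actions (hypothetical helper).

    q_values:   (batch, num_actions) array of Q(s, .) for the batch states
    actions:    (batch,) integer actions taken in the dataset
    td_targets: (batch,) Bellman targets, e.g. r + gamma * Q_target(s', a')
    alpha:      weight of the conservative penalty (assumed hyperparameter)
    """
    batch_idx = np.arange(q_values.shape[0])
    q_data = q_values[batch_idx, actions]  # Q(s, a) for dataset actions

    # Ordinary TD term: 1/2 * mean squared Bellman error on dataset transitions.
    td_loss = 0.5 * np.mean((q_data - td_targets) ** 2)

    # Conservative term: push down a soft maximum of Q over all actions and
    # push up Q on the actions that actually appear in the dataset.
    cql_penalty = np.mean(logsumexp(q_values, axis=1) - q_data)

    return td_loss + alpha * cql_penalty, td_loss, cql_penalty
```

Because the log-sum-exp is never smaller than the maximum Q-value, the penalty is non-negative and reaches its smallest value when the dataset action already dominates the other actions, which is what makes the learned Q-values conservative on out-of-distribution actions.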

familyld commented 3 years ago

For the Atari experiments, lines 221 to 243 of quantile_agent.py seem to be the relevant part.

For the D4RL experiments, lines 233 to 300 of cql.py seem to be the relevant part, although many more lines are modified or added relative to sac.py.
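With continuous actions (the SAC-based D4RL setup), the sum over actions inside the log-sum-exp cannot be enumerated, so it has to be estimated. The CQL paper's appendix describes an importance-sampling estimate that mixes actions drawn uniformly from the action box with actions drawn from the current policy (check the exact form there). The sketch below assumes that estimator and a box of [-1, 1] per dimension, as with tanh-squashed SAC policies. The function and argument names are hypothetical and this is not a transcription of cql.py.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_continuous_penalty(q_rand, q_pi, log_pi, q_data, action_dim,
                           action_low=-1.0, action_high=1.0):
    """Importance-sampled CQL(H) penalty for continuous actions (sketch).

    q_rand:  (batch, n) Q(s, a) for a ~ Uniform([low, high]^action_dim)
    q_pi:    (batch, n) Q(s, a) for a ~ pi(. | s)
    log_pi:  (batch, n) log pi(a | s) for those policy samples
    q_data:  (batch,) Q(s, a) for the dataset actions
    The uniform bounds are assumptions, not values taken from the repo.
    """
    # Log-density of the uniform proposal over the action box.
    log_unif = -action_dim * np.log(action_high - action_low)

    # exp(Q) / proposal(a) in log space is Q - log proposal(a).
    weighted = np.concatenate([q_rand - log_unif, q_pi - log_pi], axis=1)

    # log( 1/(2N) * sum over all 2N samples of exp(Q) / proposal ).
    # The -log(2N) term is a constant, so it does not change gradients.
    n_total = weighted.shape[1]
    lse = logsumexp(weighted, axis=1) - np.log(n_total)

    return np.mean(lse - q_data)
```

Each half of the samples gives an unbiased estimate of the integral of exp(Q) over the action box, so averaging them before taking the log keeps the estimator consistent while the policy samples concentrate effort where Q is large.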

Since the two sets of experiments build on different methods (QR-DQN and SAC) and different libraries (dopamine and rlkit), their implementations differ considerably. It would be great if the author could add more comments that explicitly map the code to the equations in the paper, e.g., Equation (4).
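For reference when mapping code to equations, the CQL(H) objective is usually stated as below. The equation number and notation here are a best-effort reconstruction and should be checked against the paper. The first bracket is the conservative penalty (log-sum-exp over actions minus Q on dataset actions), and the second term is the standard Bellman error that both QR-DQN and SAC already compute in their own form.

```latex
\min_{Q}\;
\alpha\, \mathbb{E}_{s \sim \mathcal{D}}\!\left[
  \log \sum_{a} \exp\big(Q(s,a)\big)
  - \mathbb{E}_{a \sim \hat{\pi}_{\beta}(a \mid s)}\big[Q(s,a)\big]
\right]
+ \frac{1}{2}\,
\mathbb{E}_{s,a,s' \sim \mathcal{D}}\!\left[
  \Big(Q(s,a) - \hat{\mathcal{B}}^{\pi_k} \hat{Q}^{k}(s,a)\Big)^{2}
\right]
```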

aviralkumar2907 commented 3 years ago

Hello @familyld,

Yes, I hope to add more comments within about a month. Sorry for the delay.