CherryPieSexy / learn_to_move

4 stars 0 forks source link

Questions about n-steps in SAC. #1

Closed wayunderfoot closed 4 years ago

wayunderfoot commented 4 years ago

I am quite interested in your algorithm. Could you tell me how entropy is handled in n-step SAC? Thanks.

CherryPieSexy commented 4 years ago

I am very glad you are interested in my solution! In the n-step SAC critic loss, entropy is treated as if it were a regular reward component. For example, suppose that starting from state s the actions (a_0, a_1, ..., a_n) were sampled during exploration, with corresponding log-probabilities (p_0, p_1, ..., p_n). The scalar product R_sum of the log-probs (starting from the second term) with the discounting vector (g, g^2, ..., g^n) is computed, and the target for Q(s, a_0) is R_sum + g^n * Q(s_n, a_n).
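The target described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the discounted n-step reward sum, the entropy temperature `alpha`, and the `gamma^n` discount on the bootstrap value are standard SAC / n-step conventions assumed here rather than spelled out in the comment.

```python
def n_step_sac_target(rewards, log_probs, q_bootstrap, gamma=0.99, alpha=0.2):
    """Hypothetical sketch of an n-step SAC critic target for Q(s_0, a_0).

    rewards:     [r_0, ..., r_{n-1}] collected along the n-step segment
    log_probs:   [log pi(a_1|s_1), ..., log pi(a_n|s_n)], the "second term onward"
    q_bootstrap: Q(s_n, a_n) from the target critic
    """
    n = len(rewards)
    # discounted sum of environment rewards: r_0 + g*r_1 + ... + g^{n-1}*r_{n-1}
    r_sum = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # entropy folded in like a regular reward, discounted by (g, g^2, ..., g^n)
    r_sum -= alpha * sum((gamma ** (i + 1)) * lp for i, lp in enumerate(log_probs))
    # bootstrap with the n-step discount
    return r_sum + (gamma ** n) * q_bootstrap
```

With `n = 1` this reduces to the usual one-step soft Bellman target `r_0 + g * (Q(s_1, a_1) - alpha * log pi(a_1|s_1))`.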

wayunderfoot commented 4 years ago

Hello, I am learning model-free reinforcement learning. A few days ago I implemented it using single-experience tuples, but the results were not very good. I would appreciate it if you could tell me how to implement n-step SAC, specifically how to store experience in the replay buffer and how to sample from it. Also, which variant do you use in your PER: rank-based or proportional?

CherryPieSexy commented 4 years ago

Sorry for the delayed response. The repository will be updated soon, and all code will be available within one or two weeks. We used the same prioritization as in the R2D2 paper, i.e. the proportional variant.
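Proportional prioritization means each transition is sampled with probability proportional to `p_i^alpha` over the sum of all priorities raised to `alpha`. A minimal sketch of that sampling step (a toy illustration with a plain list, not the repository's implementation; real PER buffers use a sum-tree for O(log N) sampling and also compute importance-sampling weights):

```python
import random

def proportional_sample(priorities, batch_size, alpha=0.9):
    """Sample indices with probability proportional to priority^alpha.

    Hypothetical sketch: a linear scan over a plain list; an actual
    prioritized replay buffer would use a sum-tree instead.
    """
    weights = [p ** alpha for p in priorities]
    # random.choices normalizes the weights and samples with replacement
    return random.choices(range(len(priorities)), weights=weights, k=batch_size)
```

High-priority transitions (e.g. those with large TD errors) are drawn more often, while `alpha` interpolates between uniform sampling (`alpha=0`) and fully greedy prioritization.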

wayunderfoot commented 4 years ago

Thanks for your help