PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Chapter07/lib/common.py - definitions for a newbie #48

Open jeffCabrera0321 opened 5 years ago

jeffCabrera0321 commented 5 years ago

Hello all, I am still very much at the beginning of my journey into ML/DL/RL, and I am having trouble with some of the terminology, specifically in Chapter07/lib/common.py. Everything up to this point is very clear. I do not understand the meaning of b_j, tz_j, u (I assume upper), and l (I assume lower). What is the definition of each of these? Can someone point me to a reference that explains what they mean? I know these are just variable names, but I am curious to know.

    for atom in range(n_atoms):
        tz_j = np.minimum(Vmax, np.maximum(Vmin, rewards + (Vmin + atom * delta_z) * gamma))
        b_j = (tz_j - Vmin) / delta_z
        l = np.floor(b_j).astype(np.int64)
        u = np.ceil(b_j).astype(np.int64)
        eq_mask = u == l
        proj_distr[eq_mask, l[eq_mask]] += next_distr[eq_mask, atom]
        ne_mask = u != l
        proj_distr[ne_mask, l[ne_mask]] += next_distr[ne_mask, atom] * (u - b_j)[ne_mask]
        proj_distr[ne_mask, u[ne_mask]] += next_distr[ne_mask, atom] * (b_j - l)[ne_mask]
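
To make the question concrete, here is a minimal, self-contained sketch of what the loop above appears to compute, with each variable annotated. The readings in the comments are my interpretation, assuming the names follow the projection step of the categorical DQN (C51) algorithm: tz_j as the Bellman-updated value of atom j, b_j as its fractional position on the atom grid, and l / u as the lower and upper neighbouring atom indices. The function name `project_distribution` and the toy numbers are hypothetical and chosen only for illustration; this is not the repository's actual function, and it ignores episode-termination handling.

```python
import numpy as np


def project_distribution(next_distr, rewards, Vmin, Vmax, n_atoms, gamma):
    """Hypothetical helper wrapping the loop from the question.

    next_distr: (batch, n_atoms) probabilities over atoms for the next state.
    rewards:    (batch,) immediate rewards.
    """
    batch_size = len(rewards)
    # Spacing between neighbouring atoms on the fixed value grid.
    delta_z = (Vmax - Vmin) / (n_atoms - 1)
    proj_distr = np.zeros((batch_size, n_atoms), dtype=np.float32)

    for atom in range(n_atoms):
        # z_j: the value that atom j stands for on the grid.
        z_j = Vmin + atom * delta_z
        # tz_j: atom value after the Bellman update r + gamma * z_j,
        # clipped so that it stays inside [Vmin, Vmax].
        tz_j = np.clip(rewards + gamma * z_j, Vmin, Vmax)
        # b_j: where tz_j falls on the grid, as a fractional atom index.
        b_j = (tz_j - Vmin) / delta_z
        # l, u: integer indices of the atoms just below and just above b_j.
        l = np.floor(b_j).astype(np.int64)
        u = np.ceil(b_j).astype(np.int64)

        # b_j lands exactly on an atom: that atom receives all the mass.
        eq_mask = u == l
        proj_distr[eq_mask, l[eq_mask]] += next_distr[eq_mask, atom]

        # b_j lands between two atoms: split the mass linearly, so the
        # closer neighbour receives the larger share.
        ne_mask = u != l
        proj_distr[ne_mask, l[ne_mask]] += next_distr[ne_mask, atom] * (u - b_j)[ne_mask]
        proj_distr[ne_mask, u[ne_mask]] += next_distr[ne_mask, atom] * (b_j - l)[ne_mask]

    return proj_distr


# Toy setup (assumed values): 5 atoms at 0, 1, 2, 3, 4.
Vmin, Vmax, n_atoms, gamma = 0.0, 4.0, 5, 0.5
# All probability mass sits on atom 2 (value 2.0) for a single sample.
next_distr = np.array([[0.0, 0.0, 1.0, 0.0, 0.0]], dtype=np.float32)
rewards = np.array([0.5], dtype=np.float32)

proj = project_distribution(next_distr, rewards, Vmin, Vmax, n_atoms, gamma)
print(proj)
```

In this toy case the only atom carrying mass has tz_j = 0.5 + 0.5 * 2.0 = 1.5, so b_j = 1.5, l = 1 and u = 2, and the mass is split evenly between atoms 1 and 2. If the reward were large enough to push tz_j past Vmax, the clipping would give b_j = 4 with l == u, and the whole mass would go to the last atom through the eq_mask branch.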