Hello all, I am still very much at the beginning of my journey into ML/DL/RL, and I am having trouble with some of the terminology, specifically in Chapter 7/common.py. Everything up to this point is very clear. I do not understand the meaning of b_j, tz_j, u (I assume upper), and l (I assume lower). What are the definitions of b_j, tz_j, u, and l? Can someone point me to a reference for what these mean? I know these are just variable names, but I am curious to know.
for atom in range(n_atoms):
    tz_j = np.minimum(Vmax, np.maximum(Vmin, rewards + (Vmin + atom * delta_z) * gamma))
    b_j = (tz_j - Vmin) / delta_z
    l = np.floor(b_j).astype(np.int64)
    u = np.ceil(b_j).astype(np.int64)
    eq_mask = u == l
    proj_distr[eq_mask, l[eq_mask]] += next_distr[eq_mask, atom]
    ne_mask = u != l
    proj_distr[ne_mask, l[ne_mask]] += next_distr[ne_mask, atom] * (u - b_j)[ne_mask]
    proj_distr[ne_mask, u[ne_mask]] += next_distr[ne_mask, atom] * (b_j - l)[ne_mask]
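
In case it helps anyone reproduce this, here is a minimal self-contained sketch that wraps the loop above so it can be run on its own. The constants (Vmin, Vmax, n_atoms, gamma), the function name project, and the toy inputs are assumptions chosen for illustration, not values taken from the book's code. The comments only describe what each line computes; they do not settle where the names come from.

```python
import numpy as np

# Assumed values, for illustration only.
Vmin, Vmax, n_atoms, gamma = -10.0, 10.0, 51, 0.99
delta_z = (Vmax - Vmin) / (n_atoms - 1)  # spacing between neighbouring atoms


def project(next_distr, rewards):
    """Hypothetical wrapper around the loop from the question."""
    proj_distr = np.zeros((len(rewards), n_atoms), dtype=np.float64)
    for atom in range(n_atoms):
        # Value of this atom after adding the reward and discounting,
        # clipped to the range [Vmin, Vmax].
        tz_j = np.minimum(Vmax, np.maximum(Vmin, rewards + (Vmin + atom * delta_z) * gamma))
        # The same value expressed as a (possibly fractional) atom index.
        b_j = (tz_j - Vmin) / delta_z
        # Integer atom indices just below and just above b_j.
        l = np.floor(b_j).astype(np.int64)
        u = np.ceil(b_j).astype(np.int64)
        # b_j lands exactly on an atom: all probability goes to that atom.
        eq_mask = u == l
        proj_distr[eq_mask, l[eq_mask]] += next_distr[eq_mask, atom]
        # Otherwise the probability is split between l and u,
        # weighted by how close b_j is to each of them.
        ne_mask = u != l
        proj_distr[ne_mask, l[ne_mask]] += next_distr[ne_mask, atom] * (u - b_j)[ne_mask]
        proj_distr[ne_mask, u[ne_mask]] += next_distr[ne_mask, atom] * (b_j - l)[ne_mask]
    return proj_distr


# Toy batch of two transitions with a uniform next-state distribution.
next_distr = np.full((2, n_atoms), 1.0 / n_atoms)
rewards = np.array([0.0, 1.0])
proj = project(next_distr, rewards)
print(proj.sum(axis=1))  # each row should still sum to approximately 1
```

Since the two weights (u - b_j) and (b_j - l) add up to 1 whenever u != l, every atom's probability is fully redistributed, so each row of the result remains a probability distribution.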