PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Chapter 7: What is the use of eq_mask? #21

Closed gowtham1997 closed 5 years ago

gowtham1997 commented 5 years ago

Hello, thanks for the book. I'm having fun reading and implementing different DQN models.

The snippet below is slightly confusing to me.

https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/2e171a12b347bdefc72a77b11753975198b7c8d1/Chapter07/lib/common.py#L174-L178

1) Am I correct in assuming we are trying to get the indices of values which fall directly on an atom (l == u) and have their dones set to True?

2) Can you also please explain line 176 (the Boolean tensor taking a mask of itself is slightly confusing)?

Thanks

Shmuma commented 5 years ago

Hi!

1) Am I correct in assuming we are trying to get the indices of values which fall directly on an atom (l == u) and have their dones set to True?

Yes, you're right. During projection, we calculate the positions of the shifted atoms on the old support. If a shifted atom falls somewhere between two atoms of the old support, we need to distribute its weight between those two neighbours. If a projection lands exactly on an atom of the old support, there is nothing to distribute: we just copy the weight to that position.
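
To make the two cases concrete, here is a minimal sketch for a single shifted atom. It is not the code from the repository: `split_weight` is a hypothetical helper, and `b_j` is assumed to be the fractional index of the shifted atom on the old support.

```python
import math


def split_weight(b_j, weight):
    """Return {atom_index: weight_share} for one shifted atom.

    b_j    -- fractional index of the shifted atom on the old support
    weight -- probability mass carried by that atom
    """
    l = math.floor(b_j)  # nearest old atom at or below b_j
    u = math.ceil(b_j)   # nearest old atom at or above b_j
    if l == u:
        # Landed exactly on an old atom: copy the weight, nothing to split.
        return {l: weight}
    # Landed between two old atoms: the closer atom gets the larger share.
    return {l: weight * (u - b_j), u: weight * (b_j - l)}


exact = split_weight(3.0, 0.5)     # all mass goes to atom 3
between = split_weight(3.25, 0.5)  # mass is shared by atoms 3 and 4
```

The `l == u` branch is the situation the equality mask selects.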

The piece of code you've quoted corresponds to the branch where we handle the batch samples with done=True. For those samples, the probability distribution is just 1 at the support atom which corresponds to the obtained reward.
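
A small sketch of the target distribution for one finished episode, under stated assumptions: `terminal_distribution`, `v_min`, `v_max` and `n_atoms` are illustrative names, the reward is clipped to the support range, and a reward that falls between two atoms is assumed to be split linearly between them in the same way as in the non-terminal case.

```python
import math


def terminal_distribution(reward, v_min, v_max, n_atoms):
    """Project a terminal reward onto a fixed support of n_atoms atoms."""
    delta_z = (v_max - v_min) / (n_atoms - 1)  # spacing between atoms
    tz = min(v_max, max(v_min, reward))        # clip reward to the support
    b = (tz - v_min) / delta_z                 # fractional atom index
    l, u = math.floor(b), math.ceil(b)
    distr = [0.0] * n_atoms
    if l == u:
        # Reward sits exactly on an atom: all probability goes there.
        distr[l] = 1.0
    else:
        # Reward sits between two atoms: split by distance.
        distr[l] = u - b
        distr[u] = b - l
    return distr


# Support with atoms at -10, -5, 0, 5, 10
on_atom = terminal_distribution(5.0, -10.0, 10.0, 5)
between = terminal_distribution(2.5, -10.0, 10.0, 5)
clipped = terminal_distribution(100.0, -10.0, 10.0, 5)
```

With this support, a reward of 5.0 lands exactly on the fourth atom, 2.5 lands halfway between the third and fourth, and 100.0 is clipped to the last atom.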

2) Can you also please explain line 176 (the Boolean tensor taking a mask of itself is slightly confusing)?

This expression is a somewhat silly way to get True only where eq_mask == True && dones == True. It could probably be written better, but it already took me some time to obtain the result I expected :).
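
The pattern can be reproduced in isolation with NumPy. This is a sketch under one assumption that is not visible in this thread: the equality mask is computed only for the done samples, so it is shorter than `dones` and a plain elementwise AND would not line up. The array values below are made up for illustration.

```python
import numpy as np

# Full-batch flags: samples 1, 3 and 4 ended their episodes.
dones = np.array([False, True, False, True, True])

# Assumed to be computed on the done samples only, so it has 3 entries
# (one per True in `dones`), not 5.
eq_mask = np.array([True, False, True])

# Start from a copy of `dones`, then overwrite its True positions with
# the values of eq_mask, in order. Rows that were False stay False.
eq_dones = dones.copy()
eq_dones[dones] = eq_mask

# An equivalent, perhaps more readable way to get the same batch rows.
rows = np.flatnonzero(dones)[eq_mask]
```

Here `eq_dones` is True only for batch rows 1 and 4, which are the done samples whose reward landed exactly on an atom, and `rows` holds the same two indices.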

To visualize the effect of this distribution projection function, you can check my ad hoc tool: Chapter07/adhoc/distr_test.py. I suggest stepping through it in a debugger to check the states of all masks and the resulting distributions.