ghliu / pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch
Apache License 2.0

Error in computing target Q values #6

Closed: hejia-zhang closed this issue 6 years ago

hejia-zhang commented 6 years ago
        target_q_batch = to_tensor(reward_batch) + \
            self.discount*to_tensor(terminal_batch.astype(np.float))*next_q_values

I think it should be

        target_q_batch = to_tensor(reward_batch) + \
            self.discount*to_tensor(1.0 - terminal_batch.astype(np.float))*next_q_values
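
For reference, the textbook DDPG target is y = r + γ * (1 - done) * Q'(s', μ'(s')), with done equal to 1 for terminal transitions. Below is a minimal numpy sketch of that masking, using hypothetical toy values and independent of how this repo's replay buffer actually stores the flag:

    import numpy as np

    # Hypothetical toy batch: first transition is terminal, second is not.
    reward_batch = np.array([1.0, 1.0])
    next_q_values = np.array([5.0, 5.0])
    done_batch = np.array([1.0, 0.0])  # 1.0 = terminal, 0.0 = non-terminal
    discount = 0.99

    # Mask out the bootstrap term wherever the episode ended.
    target_q_batch = reward_batch + discount * (1.0 - done_batch) * next_q_values
    # terminal:     1.0
    # non-terminal: 1.0 + 0.99 * 5.0 = 5.95
    print(target_q_batch)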
abalakrishna123 commented 6 years ago

Yeah, that does seem like it would be the case.

songanz commented 6 years ago

I think in memory.py he has:

    for e in experiences:
        state0_batch.append(e.state0)
        state1_batch.append(e.state1)
        reward_batch.append(e.reward)
        action_batch.append(e.action)
        terminal1_batch.append(0. if e.terminal1 else 1.)  # stores 1.0 for non-terminal, 0.0 for terminal
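
So if the buffer really appends 0. for terminal and 1. for non-terminal transitions, the flag is already inverted by the time it is sampled, and the original `self.discount*to_tensor(terminal_batch.astype(np.float))*next_q_values` already zeroes the bootstrap term at episode ends. A small sketch under that assumption, again with hypothetical toy values:

    import numpy as np

    # memory.py convention (as quoted above): 1.0 = non-terminal, 0.0 = terminal.
    terminal_batch = np.array([0.0, 1.0])  # first transition is terminal
    reward_batch = np.array([1.0, 1.0])
    next_q_values = np.array([5.0, 5.0])
    discount = 0.99

    # Same shape as the original ddpg.py line: no (1.0 - ...) inversion needed.
    target_q_batch = reward_batch + discount * terminal_batch * next_q_values
    # terminal:     1.0
    # non-terminal: 1.0 + 0.99 * 5.0 = 5.95
    print(target_q_batch)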