Bigpig4396 / PyTorch-Generative-Adversarial-Imitation-Learning-GAIL


Can you help explain line 117~120 in GAIL_OppositeV4.py? #1

Closed. alanyuwenche closed this issue 3 years ago.

alanyuwenche commented 3 years ago

Thanks for sharing. Can you help explain line 117~120 in GAIL_OppositeV4.py?

    a1_loss = torch.FloatTensor([0.0])
    for t in range(T):
        a1_loss = a1_loss + fake_reward * log_pi_a1_list[t]
    a1_loss = -a1_loss / T

I tried your code on MountainCar-v0, which is why I am asking this question. The two environments should be very similar: both have a Box observation space and discrete actions. I have struggled for more than two weeks, but the car still cannot reach the goal position within 200 iterations. Thanks in advance.

Bigpig4396 commented 3 years ago

Say the expert behavior is 'exp' and the learnt behavior is 'real'. If 'real' is very close to 'exp', we consider the imitation good. When you feed 'real' into the discriminator, it outputs a value near 1 if the imitation is good and near 0 if it is bad. That output is the fake_reward: fake_reward = disc('real'). So fake_reward acts as a weight on the actor loss. When the imitation is good, the loss is roughly log_pi and the 'real' behavior is reinforced; when fake_reward is near 0, the current 'real' behavior is not reinforced, because it is bad.
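For reference, here is a minimal, self-contained sketch of that weighting idea (the values of T, fake_reward, and log_pi_a1_list are made up for illustration; in GAIL_OppositeV4.py they come from the discriminator and the actor):

    import torch

    T = 5                                                          # hypothetical trajectory length
    log_pi_a1_list = [torch.log(torch.rand(1)) for _ in range(T)]  # stand-in for log pi(a_t | s_t)
    fake_reward = torch.tensor([0.8])                              # stand-in for disc('real'), in [0, 1]

    # Accumulate the discriminator-weighted log-probabilities over the trajectory.
    a1_loss = torch.zeros(1)
    for t in range(T):
        a1_loss = a1_loss + fake_reward * log_pi_a1_list[t]

    # Negate and average: minimizing this loss pushes log pi up when the
    # discriminator thinks the imitation is good (fake_reward near 1), and
    # barely changes the policy when fake_reward is near 0.
    a1_loss = -a1_loss / T
    print(a1_loss)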

Bigpig4396 commented 3 years ago

GAIL is not plain reinforcement learning. You should first run some other RL algorithm with a discrete action space on your environment to solve the task. After that, you will have expert trajectories of states and actions. GAIL is then used to imitate those trajectories, not to learn from scratch.
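A rough sketch of that workflow, assuming the classic gym step/reset API and a placeholder expert_policy standing in for whatever RL agent you trained first (both are hypothetical names, not part of this repository):

    import gym
    import torch

    env = gym.make('MountainCar-v0')

    def expert_policy(state):
        # Placeholder for a policy learned beforehand by an ordinary RL
        # algorithm (e.g. DQN); a random action is used here only so the
        # sketch runs.
        return env.action_space.sample()

    # Roll out the trained expert once to collect (state, action) pairs.
    expert_states, expert_actions = [], []
    state = env.reset()
    done = False
    while not done:
        action = expert_policy(state)
        expert_states.append(torch.as_tensor(state, dtype=torch.float32))
        expert_actions.append(action)
        state, reward, done, info = env.step(action)

    # GAIL's discriminator is then trained to distinguish these expert
    # (state, action) pairs from the ones produced by the imitating policy.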

alanyuwenche commented 3 years ago

Thanks for your reply. I will check the code again.