BY571 / IQN-and-Extensions

PyTorch Implementation of Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning with additional extensions like PER, Noisy layer, N-step bootstrapping, Dueling architecture and parallel env support.
MIT License

Some questions on #4

Closed 844015539 closed 4 years ago

844015539 commented 4 years ago

Hello Dittert, I'm sorry to disturb you. I want to ask you some questions about your code. 1. What does "Munchausen RL" mean? I searched the net but couldn't find anything about it. If I just start a normal experiment with noisy_iqn_per, should I take the "if not self.munchausen:" branch? 2. I can't open your notebook "IQN-DQN.ipynb". Is something wrong with it? I like Germany very much, but my German is not good. Thank you very much! Best regards, Lin Yuan

BY571 commented 4 years ago

hey @844015539 ,

thank you for your feedback! First, Munchausen RL is a concept that was published only recently. I wrote a Medium article about it (Article), or you can check out the original paper. When you start a normal experiment, just leave munchausen set to 0.
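For intuition, the core Munchausen idea is to add a scaled, clipped log-policy bonus to the reward and use a soft (entropy-regularized) next-state value. A minimal sketch, assuming plain (batch, n_actions) Q-value tensors rather than IQN's quantile outputs, with the paper's default constants; names and shapes here are illustrative, not this repo's exact code:

```python
import torch
import torch.nn.functional as F

def munchausen_target(rewards, next_q, q_k, actions, gamma=0.99,
                      alpha=0.9, tau=0.03, clip=-1.0):
    """Sketch of the Munchausen-DQN target.

    rewards: (B, 1), next_q and q_k: (B, n_actions), actions: (B, 1) long.
    In the real algorithm q_k / next_q would come from the target network.
    """
    # softmax policy implied by the current Q-values, temperature tau
    log_pi_k = F.log_softmax(q_k / tau, dim=1)
    # Munchausen bonus: scaled log-policy of the taken action, clipped at l0
    m_bonus = alpha * torch.clamp(tau * log_pi_k.gather(1, actions),
                                  min=clip, max=0)
    # soft value of the next state: E_pi[q - tau * log pi]
    log_pi_next = F.log_softmax(next_q / tau, dim=1)
    pi_next = log_pi_next.exp()
    soft_v = (pi_next * (next_q - tau * log_pi_next)).sum(dim=1, keepdim=True)
    return rewards + m_bonus + gamma * soft_v
```

With munchausen set to 0 the bonus and the entropy term drop out and the target reduces to the ordinary Bellman target.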

IQN-DQN.ipynb is just a simple notebook for the base algorithm. You probably don't need it, but if you want to open it you need to run Jupyter Notebook.

844015539 commented 4 years ago

Hello Dittert, I'm sorry to disturb you. I want to ask some questions about the code in "agent". My doubts mainly concern "def learn_per(self, experiences):".

1. In

    Q_targets_next = Q_targets_next.gather(2, action_indx.unsqueeze(-1).expand(self.BATCH_SIZE, self.N, 1)).transpose(1,2)
    Q_expected = Q_expected.gather(2, actions.unsqueeze(-1).expand(self.BATCH_SIZE, self.N, 1))

is "action_indx" wrong? I didn't see any definition of it in the preceding code. Moreover, for "Q_expected" it becomes "actions".

2. In "assert td_error.shape == (BATCH_SIZE, self.N, self.N), "wrong td error shape"", why does "self.N" appear twice?

I am looking forward to your reply.
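The two gather calls and the (BATCH_SIZE, N, N) assertion can be checked with a small shape sketch. The sizes below are illustrative (batch of 2, N = 8 quantiles, 3 actions), not the repo's defaults; the point is that the quantile Huber loss compares every target quantile against every predicted quantile, which is why N appears twice:

```python
import torch

# hypothetical sizes mirroring the question
BATCH_SIZE, N, n_actions = 2, 8, 3

q_next = torch.randn(BATCH_SIZE, N, n_actions)  # target-net quantile values
# greedy action per sample, taken from the mean over quantiles -> (B, 1)
action_indx = q_next.mean(dim=1).argmax(dim=1, keepdim=True)

# expand the action index so the same action is picked for every quantile
q_targets_next = q_next.gather(
    2, action_indx.unsqueeze(-1).expand(BATCH_SIZE, N, 1)
).transpose(1, 2)                               # -> (B, 1, N)

q_expected = torch.randn(BATCH_SIZE, N, 1)      # current net, action already selected
# broadcasting (B, 1, N) against (B, N, 1) gives all pairwise differences
td_error = q_targets_next - q_expected          # -> (B, N, N)
assert td_error.shape == (BATCH_SIZE, N, N)
```

Note the two gather calls differ only in where the action index comes from: `action_indx` is computed greedily from the target network, while `actions` are the actions actually stored in the replay buffer.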

844015539 commented 4 years ago

Oh, I'm sorry, it's my fault. I was too careless and missed the definition of action_indx.

844015539 commented 4 years ago

Hello Dittert, I want to know what "self.layer_size = layer_size" means. Does it mean the size of the convolution layer's output? I saw some code in "Deep Reinforcement Learning Hands-On"; does 'layer_size' mean 'conv_out_size'? I think the following definition would be more prudent and clear. I hope you can answer my question. Thank you!

```python
import numpy as np
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):  # class name and wrapper added for context
    def __init__(self, input_shape, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU()
        )

        conv_out_size = self._get_conv_out(input_shape)
        self.fc_adv = nn.Sequential(
            nn.Linear(conv_out_size, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions)
        )
        self.fc_val = nn.Sequential(
            nn.Linear(conv_out_size, 256),
            nn.ReLU(),
            nn.Linear(256, 1)
        )

    def _get_conv_out(self, shape):
        # dummy forward pass to infer the flattened conv output size
        o = self.conv(torch.zeros(1, *shape))
        return int(np.prod(o.size()))
```
BY571 commented 4 years ago

hey @844015539,

layer_size is the number of neurons in a hidden linear layer. You can read about it in the README as well: Size of the hidden layer, default=512
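In other words, layer_size is a width, not the conv output size. A minimal sketch of how such a hyperparameter is typically used (names and sizes here are illustrative, not the repo's exact network):

```python
import torch
import torch.nn as nn

# layer_size only sets the width of the hidden linear layers
layer_size, state_size, n_actions = 512, 8, 4

head = nn.Sequential(
    nn.Linear(state_size, layer_size),  # input -> hidden of width layer_size
    nn.ReLU(),
    nn.Linear(layer_size, n_actions),   # hidden -> one Q-value per action
)
out = head(torch.randn(1, state_size))  # out has shape (1, n_actions)
```

By contrast, conv_out_size in the book's snippet is computed from the input shape at construction time and is only needed when a convolutional torso feeds the linear head.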