Hi, I'm trying to reimplement this architecture in PyTorch.
The uniform initialization in your code looks like Kaiming uniform, but I'm not sure. I also can't tell which initialization you use for each layer type (e.g., Linear, CNN, GRU).
Could you tell me which initialization you recommend? Depending on the one I use, the agent either can or cannot recover from a bad start.
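To make the question concrete, this is the kind of per-layer scheme I mean. It is only a sketch: `init_weights` and `TinyNet` are hypothetical names, and the choices (Kaiming uniform for Linear/Conv, Xavier plus orthogonal for the GRU, zero biases) are my own guesses, not something taken from your code.

```python
import math

import torch
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """Hypothetical per-layer init; every choice here is an assumption."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        # Kaiming uniform with a=sqrt(5) gives weights in
        # [-1/sqrt(fan_in), 1/sqrt(fan_in)], which matches the PyTorch
        # default for these layers.
        nn.init.kaiming_uniform_(module.weight, a=math.sqrt(5))
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # assumption: zero biases
    elif isinstance(module, nn.GRU):
        for name, param in module.named_parameters():
            if "weight_ih" in name:
                nn.init.xavier_uniform_(param)  # input-to-hidden weights
            elif "weight_hh" in name:
                nn.init.orthogonal_(param)  # hidden-to-hidden weights
            elif "bias" in name:
                nn.init.zeros_(param)


class TinyNet(nn.Module):
    """Toy container with one layer of each type, just to exercise the init."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.fc = nn.Linear(8, 16)
        self.gru = nn.GRU(16, 32, batch_first=True)


model = TinyNet()
model.apply(init_weights)  # apply() visits every submodule recursively
```

If your scheme differs from this for any layer type, that difference is exactly what I would like to know.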
Also, the article says you use SiLU, but your code contains the Mish activation function. Did you try it and later abandon it?
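For reference, these are the two definitions I am comparing, written in plain Python rather than with `nn.SiLU` / `nn.Mish`. This is a minimal sketch for a handful of scalar inputs and is not numerically hardened for large values.

```python
import math


def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def mish(x: float) -> float:
    # Mish: x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))
    softplus = math.log1p(math.exp(x))
    return x * math.tanh(softplus)


# Print both activations side by side on a few sample points.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  silu={silu(x):+.4f}  mish={mish(x):+.4f}")
```

The two curves have a similar shape, so I can't tell from the outputs alone which one the published results rely on.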