Hi, note that for the line you linked:

```python
self.forward = hk.nets.MLP(self.layer_sizes + (self.output_size,), activation=activation)
```

`self.layer_sizes` is set to `()`, an empty tuple, in this experiment. Hence this mapping is just a linear map in practice. We found that using an MLP with more layers (i.e. a non-empty `layer_sizes`) actually led to worse performance, rather counter-intuitively. We suspect this is an optimization/initialization issue, but didn't get round to resolving it. If you can find a way to make it work better with more layers, that would be a nice contribution! Hope that helps :)
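In case it's useful, here is a minimal sketch (not the repo's code; the latent and output sizes are illustrative) showing why an empty `layer_sizes` makes `hk.nets.MLP` collapse to a single linear layer, so the latent-to-modulation map is linear:

```python
import haiku as hk
import jax
import jax.numpy as jnp

layer_sizes = ()     # empty tuple, as in the experiment config
output_size = 512    # hypothetical number of modulations

def forward(latent):
    # MLP over () + (output_size,) is a single hk.Linear; activate_final
    # defaults to False, so no nonlinearity is applied -> a linear map.
    mlp = hk.nets.MLP(layer_sizes + (output_size,), activation=jax.nn.relu)
    return mlp(latent)

forward = hk.transform(forward)
latent = jnp.zeros((1, 64))  # hypothetical latent_dim = 64
params = forward.init(jax.random.PRNGKey(0), latent)
modulations = forward.apply(params, None, latent)  # shape (1, 512)
```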
Thank you very much for the explanation! That makes sense!
Hi,
I have a question about latent modulation. The paper says that the shift modulation at each layer comes from a linear map on the latent vector. However, the code seems to map the latent vector through a ReLU MLP and extract the intermediate layer outputs as the shift modulations for the original network.
Is this a typo in the paper? Or is it a new type of modulation that works better?
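For reference, my reading of the paper's description is something like the sketch below: each layer's shift modulation is a linear map of the latent vector, added to that layer's pre-activation. The widths, the ReLU nonlinearity, and the names here are illustrative assumptions, not the repo's actual code.

```python
import haiku as hk
import jax
import jax.numpy as jnp

def modulated_layer(x, latent, width=256):
    # shift modulation: a linear map on the latent vector (one per layer)
    shift = hk.Linear(width, name="latent_to_shift")(latent)
    # add the shift to the pre-activation, then apply the nonlinearity
    return jax.nn.relu(hk.Linear(width)(x) + shift)

def net(coords, latent):
    h = modulated_layer(coords, latent)
    h = modulated_layer(h, latent)
    return hk.Linear(3)(h)  # e.g. an RGB output

net = hk.transform(net)
coords = jnp.zeros((1, 2))   # e.g. a single (x, y) coordinate
latent = jnp.zeros((1, 64))  # hypothetical latent_dim = 64
params = net.init(jax.random.PRNGKey(0), coords, latent)
rgb = net.apply(params, None, coords, latent)
```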