google-deepmind / functa


Latent Modulation Implementation #14

Closed: alvinsunyixiao closed this issue 1 year ago

alvinsunyixiao commented 1 year ago

Hi,

I have a question about latent modulation. The paper says that the shift modulation at each layer comes from a linear map of the latent vector. However, the code seems to map the latent vector through a ReLU MLP and extract the intermediate layer output as the shift modulation for the original network.

Is this a typo in the paper? Or is it a new type of modulation that works better?
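
For concreteness, this is roughly the structure I have in mind for how a shift modulation enters each layer. It is just a sketch with made-up names (`modulated_siren_layer_fn`, `width`, `w0`, and the exact placement of the `w0` scaling), not the actual functa code:

```python
import haiku as hk
import jax
import jax.numpy as jnp

# A SIREN-style layer with a per-layer shift modulation added before the sine
# nonlinearity. Names and the placement of w0 are illustrative only.
def modulated_siren_layer_fn(x, shift, width=256, w0=30.0):
  h = hk.Linear(width)(x)          # usual SIREN affine map
  return jnp.sin(w0 * h + shift)   # shift comes from the latent vector

layer = hk.without_apply_rng(hk.transform(modulated_siren_layer_fn))
params = layer.init(jax.random.PRNGKey(0), jnp.zeros(2), jnp.zeros(256))
out = layer.apply(params, jnp.zeros(2), jnp.zeros(256))  # shape (256,)
```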

hyunjik11 commented 1 year ago

Hi, note that for the line you linked, `self.forward = hk.nets.MLP(self.layer_sizes + (self.output_size,), activation=activation)`, `self.layer_sizes` is set to `()` (an empty tuple) in the experiment here. Hence this mapping is just a linear map in practice. We found that using an MLP with more layers (i.e. a non-empty `layer_sizes`) actually led to worse performance, rather counter-intuitively. We suspect this is an optimization/initialization issue, but didn't get round to resolving it. If you can find a way to make it work better with more layers, that would be a nice contribution! Hope that helps :)
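
To illustrate the point above, here is a minimal sketch (the names `latent_to_shifts_fn`, `width`, `num_layers`, and the latent size of 64 are illustrative, not the repo's actual identifiers) showing that with `layer_sizes=()` the `hk.nets.MLP` contains a single `hk.Linear` and no ReLU, so the latent-to-modulation map is just a linear (affine) map:

```python
import haiku as hk
import jax
import jax.numpy as jnp

def latent_to_shifts_fn(latent, layer_sizes=(), width=256, num_layers=5):
  # With layer_sizes == (), the MLP is a single Linear layer with no
  # intermediate ReLU, i.e. a linear map from latent to modulations.
  output_size = width * num_layers  # one shift per hidden unit per layer
  mlp = hk.nets.MLP(tuple(layer_sizes) + (output_size,), activation=jax.nn.relu)
  shifts = mlp(latent)
  # Split the flat output into per-layer shift modulations.
  return jnp.split(shifts, num_layers, axis=-1)

latent_to_shifts = hk.without_apply_rng(hk.transform(latent_to_shifts_fn))
params = latent_to_shifts.init(jax.random.PRNGKey(0), jnp.zeros(64))
per_layer_shifts = latent_to_shifts.apply(params, jnp.zeros(64))
print([s.shape for s in per_layer_shifts])  # 5 arrays of shape (256,)
```

Passing a non-empty `layer_sizes` (e.g. `(256,)`) in this sketch is what would turn the mapping into a ReLU MLP, which is the variant reported above to perform worse.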

alvinsunyixiao commented 1 year ago

Thank you very much for the explanation! That makes sense!