Closed bfs18 closed 3 years ago
Thanks for catching this. Technically, we shouldn't have two ELUs right after each other. Unfortunately, fixing this would require retraining all our checkpoints with the updated network. We will try to fix it in a future major revision.
Thanks a lot for your reply.
Hi @arash-vahdat, the decoder sampler cell is nn.ELU followed by a Conv2d https://github.com/NVlabs/NVAE/blob/38eb9977aa6859c6ee037af370071f104c592695/model.py#L274 but the encoder sampler is a Conv2d layer only https://github.com/NVlabs/NVAE/blob/38eb9977aa6859c6ee037af370071f104c592695/model.py#L263 The outputs of encoder_tower and decoder_tower are not squashed by an activation function; only encoder0 has an output activation. https://github.com/NVlabs/NVAE/blob/38eb9977aa6859c6ee037af370071f104c592695/model.py#L253
I am curious about the difference between the encoder and decoder samplers. Is this a deliberate design choice?
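For reference, the asymmetry between the two samplers can be sketched with a stdlib-only toy model. The scalar `elu` re-implements F.elu with the default alpha=1, and the linear map `w * x + b` is an assumed stand-in for the 1x1 convolution; this is not the actual NVAE code, just an illustration of the two paths:

```python
import math

def elu(x, alpha=1.0):
    # ELU: identity for x >= 0, alpha * (exp(x) - 1) for x < 0
    return x if x >= 0.0 else alpha * (math.exp(x) - 1.0)

def decoder_sampler(x, w, b):
    # decoder sampler (model.py#L274): ELU, then conv
    # (w * x + b is a toy scalar stand-in for the conv)
    return w * elu(x) + b

def encoder_sampler(x, w, b):
    # encoder sampler (model.py#L263): conv only, no activation
    return w * x + b
```

With w=1, b=0 the two agree for non-negative inputs, but the decoder path squashes negative inputs through the ELU while the encoder path passes them through unchanged.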
encoder0 is an outlier (it is a legacy branch from an old idea that didn't work out; we will remove it in the future). The fact that the encoder sampler doesn't apply any activation function is by design. I experimented at some point with adding an activation and normalization there and didn't see any improvement.
I see. Thanks a lot.
The last layer of ARInvertedResidual is nn.ELU() https://github.com/NVlabs/NVAE/blob/38eb9977aa6859c6ee037af370071f104c592695/neural_ar_operations.py#L151 and the first function in ELUConv.forward is F.elu https://github.com/NVlabs/NVAE/blob/38eb9977aa6859c6ee037af370071f104c592695/neural_ar_operations.py#L136 These two modules are called in sequence in CellAR, so ELU is applied twice in a row. Is this correct?
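The effect of the back-to-back ELU can be checked with a minimal stdlib sketch. The scalar `elu` below is an assumed re-implementation of F.elu with the default alpha=1, standing in for the PyTorch call:

```python
import math

def elu(x, alpha=1.0):
    # ELU: identity for x >= 0, alpha * (exp(x) - 1) for x < 0
    return x if x >= 0.0 else alpha * (math.exp(x) - 1.0)

# ARInvertedResidual ends in nn.ELU and ELUConv.forward begins with F.elu,
# so CellAR effectively computes elu(elu(x)).
# For x >= 0 the second ELU is a no-op (identity on non-negative inputs),
# but for x < 0 it squashes the already-activated value a second time.
x = -2.0
once = elu(x)         # about -0.865
twice = elu(elu(x))   # closer to zero than `once`
```

So the double activation is not a pure no-op: it changes the negative part of the response, which is why the fix would require retraining the released checkpoints.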