Hello, I found two problems, described below:

1. `scale` in `Attention` is set to 8 (= `sqrt(dim_head)`), not `1/sqrt(dim_head)` as is normally used. Is this a special design or a bug? (See the first sketch after this list.)
2. `NLayerDiscriminator3D` uses the output of `leaky_relu` (or of the following `sigmoid`) as its logits, which is not consistent with `NLayerDiscriminator2D`. Is this OK? Besides, the `self.n_layers + 2` loop bound in `forward` is no longer correct when `use_sigmoid` is set. (See the second sketch below.)
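For clarity on point 1, here is a minimal sketch of what I mean, with assumed names (`dim`, `dim_head`, `cosine_sim`) since I am paraphrasing rather than copying your code: standard scaled dot-product attention multiplies the q·k logits by `dim_head ** -0.5` (i.e. 1/8 for `dim_head = 64`), whereas a fixed multiplier like 8 would only make sense in a variant that first L2-normalizes q and k (cosine-similarity-style attention).

```python
import torch.nn.functional as F
from torch import nn, einsum

class AttentionSketch(nn.Module):
    """Single-head sketch of the two scale conventions (placeholder names,
    not the repo's actual Attention class)."""
    def __init__(self, dim, dim_head=64, cosine_sim=False):
        super().__init__()
        self.cosine_sim = cosine_sim
        # Standard attention: divide logits by sqrt(dim_head).
        self.scale = dim_head ** -0.5      # = 1/8 for dim_head = 64
        # A fixed temperature (e.g. 8) only makes sense with normalized q, k.
        self.cosine_scale = 8.0
        self.to_qkv = nn.Linear(dim, dim_head * 3, bias=False)
        self.to_out = nn.Linear(dim_head, dim)

    def forward(self, x):
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        if self.cosine_sim:
            q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
            sim = einsum('b i d, b j d -> b i j', q, k) * self.cosine_scale
        else:
            sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
        attn = sim.softmax(dim=-1)
        out = einsum('b i j, b j d -> b i d', attn, v)
        return self.to_out(out)
```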
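And for point 2, here is a sketch of the layout I would expect, assuming the pix2pixHD-style PatchGAN structure that `NLayerDiscriminator2D` typically follows (again, names and hyperparameters are placeholders, not your exact code): the logits should come from the final bare 1-channel conv, and the optional `Sigmoid` adds one extra block, which is why a hard-coded `self.n_layers + 2` bound in `forward` misses it.

```python
from torch import nn

class NLayerDiscriminator3DSketch(nn.Module):
    """PatchGAN sketch in the pix2pixHD style (assumed layout)."""
    def __init__(self, input_nc=3, ndf=64, n_layers=3, use_sigmoid=False):
        super().__init__()
        self.n_layers = n_layers
        kw, padw = 4, 1
        blocks = [[nn.Conv3d(input_nc, ndf, kw, 2, padw), nn.LeakyReLU(0.2, True)]]
        nf = ndf
        for _ in range(1, n_layers):
            nf_prev, nf = nf, min(nf * 2, ndf * 8)
            blocks += [[nn.Conv3d(nf_prev, nf, kw, 2, padw),
                        nn.BatchNorm3d(nf), nn.LeakyReLU(0.2, True)]]
        nf_prev, nf = nf, min(nf * 2, ndf * 8)
        blocks += [[nn.Conv3d(nf_prev, nf, kw, 1, padw),
                    nn.BatchNorm3d(nf), nn.LeakyReLU(0.2, True)]]
        # Logits come from this bare 1-channel conv (matching the 2D convention).
        blocks += [[nn.Conv3d(nf, 1, kw, 1, padw)]]
        if use_sigmoid:
            blocks += [[nn.Sigmoid()]]      # adds one extra block
        self.num_blocks = len(blocks)       # n_layers + 2, or + 3 with sigmoid
        for i, b in enumerate(blocks):
            setattr(self, f'model{i}', nn.Sequential(*b))

    def forward(self, x):
        # A hard-coded `range(self.n_layers + 2)` would skip the Sigmoid block
        # when use_sigmoid=True; using the stored count avoids that.
        for i in range(self.num_blocks):
            x = getattr(self, f'model{i}')(x)
        return x
```

With `use_sigmoid=True` the block count becomes `self.n_layers + 3`, so either the loop bound in `forward` needs adjusting or the stored length should be used, as above.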
Thanks.