There seems to be a typo at line 318 of attention.py.
It should be "self.proj_out = zero_module(nn.Linear(inner_dim, in_channels))" instead of "self.proj_out = zero_module(nn.Linear(in_channels, inner_dim))".
Fortunately, it causes no actual problems in the current implementation, since "in_channels" is always equal to "inner_dim". But it is indeed a coding error.
Exactly, this line needs to be fixed as described above. It happens to work when model_channels is divisible by num_head_channels but fails in other cases; e.g., with num_head_channels = 64, model_channels can be 320 or 384 but not 416.
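To illustrate the point above, here is a minimal sketch of the shape arithmetic. It assumes the common convention num_heads = in_channels // num_head_channels and inner_dim = num_heads * num_head_channels (the exact inference rule in attention.py may differ); the buggy argument order only "works" when the two dimensions happen to coincide:

```python
def inner_dim(in_channels: int, num_head_channels: int) -> int:
    # Assumed convention: derive the head count by floor division,
    # then rebuild the attention width from whole heads.
    num_heads = in_channels // num_head_channels
    return num_heads * num_head_channels

for channels in (320, 384, 416):
    d = inner_dim(channels, num_head_channels=64)
    # proj_out must map inner_dim -> in_channels; the reversed order
    # (in_channels -> inner_dim) only matches when d == channels.
    print(channels, d, d == channels)
```

With num_head_channels = 64, both 320 and 384 are multiples of 64, so inner_dim equals in_channels and the swapped arguments go unnoticed; 416 yields inner_dim = 384, and the wrongly ordered Linear would raise a shape mismatch at the output projection.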