jettjaniak closed 4 months ago
Some sizes have intermediate_size = 2 * hidden_size. What's that about? The ratio should be closer to 8/3.
ValueError: hidden_size must be divisible by num_heads (got `hidden_size`: 54 and `num_heads`: 4)
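Both complaints in this issue can be checked up front, before a model is built. Below is a minimal sketch, not code from this repo: `check_sizes`, its `target_ratio` default of 8/3, and the `tol` tolerance are hypothetical names and values chosen for illustration.

```python
# Hypothetical helper (not part of this repo): flags the two problems
# raised in this issue for a single size configuration.
def check_sizes(hidden_size, num_heads, intermediate_size,
                target_ratio=8 / 3, tol=0.25):
    """Return a list of human-readable problems; empty if the config looks fine."""
    problems = []

    # Attention splits hidden_size evenly across heads, so it must divide cleanly.
    if hidden_size % num_heads != 0:
        problems.append(
            f"hidden_size {hidden_size} is not divisible by num_heads {num_heads}"
        )

    # The issue expects intermediate_size / hidden_size to be close to 8/3.
    # The tolerance is an assumption; pick whatever slack suits the size table.
    ratio = intermediate_size / hidden_size
    if abs(ratio - target_ratio) > tol:
        problems.append(
            f"intermediate_size / hidden_size = {ratio:.2f}, "
            f"expected about {target_ratio:.2f}"
        )

    return problems


if __name__ == "__main__":
    # The failing case from the traceback: 54 is not divisible by 4.
    for problem in check_sizes(hidden_size=54, num_heads=4, intermediate_size=144):
        print(problem)
```

Running a check like this over every entry in the size table would surface all offending configurations at once instead of one `ValueError` at a time.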