delphi-suite / delphi

Small language model training made easy
Apache License 2.0

fix problem with attention_size #144

Closed: jettjaniak closed this 4 months ago

jettjaniak commented 4 months ago

Some model sizes have `intermediate_size = 2 * hidden_size`. What's the reason for that? The ratio should be closer to 8/3.
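Where the 8/3 comes from, as a sketch assuming a LLaMA-style gated (SwiGLU) MLP with gate, up and down projections and no biases: matching the weight count of a plain MLP with a 4x expansion gives 3·h·i = 2·h·4h, so i = 8h/3. `swiglu_intermediate_size`, `mlp_params` and the `multiple_of` rounding are illustrative assumptions, not delphi's code.

```python
def swiglu_intermediate_size(hidden_size: int, multiple_of: int = 8) -> int:
    # Target ratio 8/3: a gated MLP has three hidden x intermediate matrices
    # (gate, up, down) instead of two, so 3*h*i == 2*h*(4*h) gives i = 8h/3.
    # Integer ceil of 8h / (3 * multiple_of), then scale back up, so the result
    # is the smallest multiple of `multiple_of` that is >= 8h/3.
    # `multiple_of` is a hypothetical rounding knob, not a delphi setting.
    denom = 3 * multiple_of
    return (8 * hidden_size + denom - 1) // denom * multiple_of


def mlp_params(hidden_size: int, intermediate_size: int, gated: bool) -> int:
    # Weight count only (biases ignored): 3 matrices if gated, 2 if not.
    n_matrices = 3 if gated else 2
    return n_matrices * hidden_size * intermediate_size


if __name__ == "__main__":
    h = 48
    print(swiglu_intermediate_size(h))                              # 128
    print(mlp_params(h, 4 * h, gated=False))                        # 18432
    print(mlp_params(h, swiglu_intermediate_size(h), gated=True))   # 18432
    print(mlp_params(h, 2 * h, gated=True))                         # 13824
```

Under these assumptions, `intermediate_size = 2 * hidden_size` gives the gated MLP 6h² weights instead of 8h², i.e. 25% fewer than the 8/3 setting.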

jettjaniak commented 4 months ago

ValueError: hidden_size must be divisible by num_heads (got `hidden_size`: 54 and `num_heads`: 4)
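The error is raised because attention splits `hidden_size` into `num_heads` equal slices, and 54 / 4 = 13.5. A minimal sketch of that check plus a way to pick the nearest valid sizes; `head_dim` and `nearest_valid_hidden_sizes` are hypothetical helpers that only restate the condition in the message, not functions from delphi or transformers.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    # Each head gets an equal slice of the hidden dimension.
    if hidden_size % num_heads != 0:
        raise ValueError(
            "hidden_size must be divisible by num_heads "
            f"(got `hidden_size`: {hidden_size} and `num_heads`: {num_heads})"
        )
    return hidden_size // num_heads


def nearest_valid_hidden_sizes(hidden_size: int, num_heads: int) -> tuple[int, int]:
    # Closest multiples of num_heads at or below, and at or above, hidden_size.
    lower = (hidden_size // num_heads) * num_heads
    upper = lower if lower == hidden_size else lower + num_heads
    return lower, upper


if __name__ == "__main__":
    print(nearest_valid_hidden_sizes(54, 4))  # (52, 56)
    print(head_dim(56, 4))                    # 14
```

For the failing config, either 52 or 56 would satisfy the divisibility check with 4 heads.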