erickrf opened 1 year ago
What specific models were you using? Could it be that they have different parameter sizes, vocab sizes or something like that?
I loaded them with the `transformers` library. For Czech it was `Helsinki-NLP/opus-mt-de-cs` and for English `Helsinki-NLP/opus-mt-de-en`.
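For reference, this is roughly how I load and run them — a minimal sketch, not my exact script; the test sentence, device, and batch size are just illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative sketch: load both checkpoints the same way and compare their reported sizes
names = ["Helsinki-NLP/opus-mt-de-cs", "Helsinki-NLP/opus-mt-de-en"]
for name in names:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name).to("cuda").eval()

    # The configs report roughly the same parameter count; only the vocab differs slightly
    print(name, "params:", model.num_parameters(), "vocab:", model.config.vocab_size)

    # Translate a dummy batch the same way for both models
    batch = tokenizer(["Ein kurzer Testsatz."], return_tensors="pt", padding=True).to("cuda")
    out = model.generate(**batch)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))
```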
I have been using the `de-en` and `de-cs` models on the same dataset (a few hundred thousand texts), and noticed that the English model needs a lot more memory than the Czech one. I'm running on an A100 GPU (40 GB memory). In practice, I ended up with an English batch size less than half of the Czech one, even though the model configs say they are roughly the same size; the only difference is that the `de-cs` vocabulary is slightly larger.

On top of that, the English model produces the repeating-nonsense-subsequence issue a lot more often. I approximated that by flagging outputs with a type-to-token ratio below 0.15, which flags 20 texts for Czech and around 70k for English. I don't see how this might relate to memory consumption, but maybe there's something there.
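To make the repetition check concrete, here is a minimal sketch of the type-to-token filter I mean — naive whitespace tokenization, the 0.15 threshold from above; the function names are just for illustration:

```python
def type_token_ratio(text: str) -> float:
    """Ratio of distinct tokens to total tokens, using naive whitespace splitting."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def looks_degenerate(translation: str, threshold: float = 0.15) -> bool:
    # Heavily repeated subsequences push the ratio toward zero
    return type_token_ratio(translation) < threshold

# A repeating output gets flagged, a normal one does not
print(looks_degenerate("the the the the the the the the the the"))  # True (ratio 0.1)
print(looks_degenerate("a normal translated sentence"))             # False (ratio 1.0)
```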