erickrf opened 1 year ago
What specific models were you using? Could it be that they have different parameter sizes, vocab sizes or something like that?
I loaded them with the `transformers` library. For Czech it was `Helsinki-NLP/opus-mt-de-cs` and for English `Helsinki-NLP/opus-mt-de-en`.
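For reference, this is roughly how I load and run them — a minimal sketch, not my exact script; the test sentence, device, and batch size are just illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative sketch: load both checkpoints the same way and compare their reported sizes
names = ["Helsinki-NLP/opus-mt-de-cs", "Helsinki-NLP/opus-mt-de-en"]
for name in names:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name).to("cuda").eval()

    # The configs report roughly the same parameter count; only the vocab differs slightly
    print(name, "params:", model.num_parameters(), "vocab:", model.config.vocab_size)

    # Translate a dummy batch the same way for both models
    batch = tokenizer(["Ein kurzer Testsatz."], return_tensors="pt", padding=True).to("cuda")
    out = model.generate(**batch)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))
```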
I have been using the `de-en` and `de-cs` models on the same dataset (a few hundred thousand texts), and noticed that the English model needs a lot more memory than the Czech one. I'm running on an A100 GPU (40 GB memory). In practice, I ended up with an English batch size less than half of the Czech one, even though the model configs say they are roughly the same size; the only difference is that the `de-cs` vocabulary is slightly larger.

On top of that, the English model produces the repeating-nonsense-subsequence issue a lot more often. I approximated that by flagging outputs with a type-to-token ratio below 0.15, which flags 20 texts for Czech and around 70k for English. I don't see how this might relate to memory consumption, but maybe there's something there.
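To make the repetition check concrete, here is a minimal sketch of the type-to-token filter I mean — naive whitespace tokenization, the 0.15 threshold from above; the function names are just for illustration:

```python
def type_token_ratio(text: str) -> float:
    """Ratio of distinct tokens to total tokens, using naive whitespace splitting."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def looks_degenerate(translation: str, threshold: float = 0.15) -> bool:
    # Heavily repeated subsequences push the ratio toward zero
    return type_token_ratio(translation) < threshold

# A repeating output gets flagged, a normal one does not
print(looks_degenerate("the the the the the the the the the the"))  # True (ratio 0.1)
print(looks_degenerate("a normal translated sentence"))             # False (ratio 1.0)
```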