ficstamas / charmen-electra


Pretraining hyperparameters #1

Open stefan-it opened 1 year ago

stefan-it commented 1 year ago

Hi @ficstamas ,

many thanks for open-sourcing this very interesting implementation!

I would like to train my own models with this implementation (as additional models to my ByT5 project on historic texts), so I was wondering if you could share the hyperparameters that were used for pretraining this Hungarian model :thinking:

I would also be interested in the number of GPUs used for pretraining and the total pretraining time for this model.

Many thanks in advance!

ficstamas commented 1 year ago

Hey,

Here's a rough list; let's hope I don't forget anything important:

We have a publication about it, but sadly it is in Hungarian. If you need to know anything else, feel free to ask.

ficstamas commented 1 year ago

We probably trained them on 2 x NVIDIA RTX 2080 Ti; I'm not entirely sure, but I'm going to check.

Yep, we used 2 x NVIDIA RTX 2080 Ti at that time.