a-antoniades / Neuroformer

MIT License
30 stars, 3 forks

compute scale #5

Status: Open. zhixingheyi1102 opened this issue 2 months ago.

zhixingheyi1102 commented 2 months ago

Hello, I would like to know what computational scale is required to train this model.

a-antoniades commented 1 month ago

Hello. In practice, a single 40 GB A100 is enough to run most of the experiments in this paper, but we used 2 to 4 A100s to speed up training.
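As a rough way to check whether a given model size fits on one 40 GB card, here is a minimal sketch, assuming plain fp32 training with AdamW (4 bytes of weights, 4 bytes of gradients and 8 bytes of optimizer moments per parameter). The helper names, the parameter counts and the 50% activation headroom are hypothetical illustrations, not measurements from this repository; activation memory, which depends on batch size and sequence length, is not modeled.

```python
# Hypothetical back-of-the-envelope estimate, not part of the Neuroformer codebase.
# Assumes fp32 weights + fp32 gradients + two fp32 AdamW moments per parameter.
BYTES_PER_PARAM_FP32_ADAMW = 4 + 4 + 8


def model_state_gib(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_FP32_ADAMW) -> float:
    """Memory (GiB) for weights, gradients and AdamW state; excludes activations."""
    return num_params * bytes_per_param / 2**30


def fits_on_gpu(num_params: int, gpu_gib: float = 40.0, activation_headroom: float = 0.5) -> bool:
    """True if model state leaves `activation_headroom` of the card free for activations.

    The 40 GB A100 is treated as 40 GiB here, which is only approximate.
    """
    return model_state_gib(num_params) <= gpu_gib * (1.0 - activation_headroom)


if __name__ == "__main__":
    for n in (10_000_000, 100_000_000, 1_000_000_000, 2_000_000_000):
        print(f"{n:>13,d} params -> {model_state_gib(n):6.2f} GiB model state, "
              f"fits with 50% headroom: {fits_on_gpu(n)}")
```

Multiple GPUs mainly shorten wall-clock time via data parallelism; with plain data parallelism each card still holds a full copy of the model state estimated above.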

You can always use fewer parameters (fewer layers/heads). How large the model should be depends on the data you want to train on: following scaling laws, the more data you have, the more you benefit from a larger model.
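To see how depth and width drive the parameter count, here is a minimal sketch for a generic GPT-style decoder stack, assuming biased linear layers, a 4x MLP expansion and two LayerNorms per block; it is not Neuroformer's exact architecture and it ignores embeddings and any modality-specific encoders. The function names are hypothetical. Under the assumption of standard multi-head attention with a fixed embedding width, the head count only splits that width into heads, so in this sketch reducing `n_layer` or `n_embd` is what shrinks the model.

```python
# Hypothetical parameter count for a generic GPT-style decoder stack.
def block_params(n_embd: int) -> int:
    d = n_embd
    attn = (3 * d * d + 3 * d) + (d * d + d)      # fused QKV projection + output projection, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # 4x expansion MLP, with biases
    norms = 2 * (2 * d)                           # two LayerNorms, each with weight and bias
    return attn + mlp + norms                     # equals 12*d^2 + 13*d


def stack_params(n_layer: int, n_embd: int, n_head: int) -> int:
    # n_head only sets head_dim = n_embd // n_head; it adds no weights here.
    assert n_embd % n_head == 0, "n_embd must be divisible by n_head"
    return n_layer * block_params(n_embd)


if __name__ == "__main__":
    for n_layer, n_embd, n_head in [(12, 768, 12), (12, 768, 8), (6, 768, 12), (6, 256, 8)]:
        total = stack_params(n_layer, n_embd, n_head)
        print(f"n_layer={n_layer:2d} n_embd={n_embd:4d} n_head={n_head:2d} -> {total:,} params")
```

Halving `n_layer` halves the count, while halving `n_embd` cuts it roughly by four, since the dominant term grows with the square of the width.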