JonasGeiping / cramming

Cramming the training of a (BERT-type) language model into limited compute.

Pretraining on a single RTX 3060 #26

Closed. TahaBinhuraib closed this issue 1 year ago.

TahaBinhuraib commented 1 year ago

Hello, I've been using this repository on a cloud cluster of A100 GPUs. Unfortunately, my credits have run out, and I'm planning to buy a PC to continue running experiments. The RTX 3060 has 12 GB of VRAM, which is 1 GB more than the 2080 used in the paper. Do you think it would be possible to pretrain a BERT model with the RTX 3060? It would be great if you could advise me on this before I go ahead and buy the PC. Thank you very much!
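(Not part of the original thread: before committing to the hardware, a quick way to sanity-check whether a BERT-base-sized masked-LM step fits in 12 GB is a short PyTorch/Transformers smoke test. This is a minimal, hypothetical sketch, not the cramming repo's own training code; the batch size, sequence length, and mixed-precision settings are assumptions, and the paper's crammed architectures differ in detail.)

```python
# Hypothetical VRAM smoke test (not from the cramming repo): report the GPU's
# total memory and measure peak usage for one forward/backward MLM step.
import torch
from transformers import BertConfig, BertForMaskedLM

device = torch.device("cuda")
props = torch.cuda.get_device_properties(device)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")

# BERT-base-like configuration as a rough stand-in for a crammed model.
config = BertConfig(hidden_size=768, num_hidden_layers=12,
                    num_attention_heads=12, intermediate_size=3072)
model = BertForMaskedLM(config).to(device)

# One micro-batch of 32 sequences of length 128; larger effective batch sizes
# would be reached through gradient accumulation.
input_ids = torch.randint(0, config.vocab_size, (32, 128), device=device)
labels = input_ids.clone()

with torch.autocast("cuda", dtype=torch.float16):
    loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()

print(f"peak memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```

If the reported peak stays comfortably under 12 GiB, a single-GPU pretraining run along the lines of the paper should fit, with the micro-batch size tuned down if necessary.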

JonasGeiping commented 1 year ago

I mean, yeah, this is what this whole repository is about, right?

TahaBinhuraib commented 1 year ago

Yeah, I guess so. Thank you very much.