codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

what’s your data set? #1

Open FFMRyan opened 5 years ago

codertimo commented 5 years ago

@iOSGeekerOfChina I haven't decided yet, I just started this project an hour ago haha. Do you think using the dataset referred to in the paper is a good idea? Or do you have another suggestion? thanx 👍

crazyofapple commented 5 years ago

Maybe you can try some multilingual corpora, not just English, hah

codertimo commented 5 years ago

@crazyofapple Totally agree haha. Right now I'm trying to train this model on a Korean corpus with 2x 1080 Ti. But seriously, the model is too big for an individual researcher... we need some NASA-scale GPU power.

MrRace commented 5 years ago

Just using the same dataset as the original paper would be better, I think.

codertimo commented 5 years ago

@MrRace I'd love to do it, if I had enough 2080 Tis. https://twitter.com/Tim_Dettmers/status/1050787783004942336

Regarding compute for BERT: Uses 256 TPU-days similar to the OpenAI model. Lots of TPUs parallelize about 25% better than GPUs. RTX 2080 Ti and V100 should be ~70% and ~90% matmul perf vs TPU if you use 16-bit (important!). BERT ~= 375 RTX 2080 Ti days or 275 V100 days.
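
For a quick sanity check of those figures, here is a minimal back-of-envelope sketch; the 256 TPU-day total and the relative matmul throughputs are taken from the tweet above, and the small gap to the quoted ~375/~275 figures is just rounding and extra assumptions on Dettmers' side:

```python
# Back-of-envelope check of the quoted estimate (numbers from the tweet, not measured).
TPU_DAYS = 256  # total TPU compute quoted for BERT pre-training
REL_MATMUL_PERF = {"RTX 2080 Ti": 0.70, "V100": 0.90}  # 16-bit matmul perf vs TPU

for gpu, perf in REL_MATMUL_PERF.items():
    print(f"{gpu}: ~{TPU_DAYS / perf:.0f} GPU-days")
# RTX 2080 Ti: ~366 GPU-days  (tweet quotes ~375)
# V100:        ~284 GPU-days  (tweet quotes ~275)
```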

MrRace commented 5 years ago

@codertimo http://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert/

codertimo commented 5 years ago

@MrRace

On a standard, affordable GPU machine with 4 GPUs one can expect to train BERT for about 99 days using 16-bit or about 21 days using 8-bit.

Haha 99 days LoL.
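
(For the curious, the ~99-day figure is roughly the single-GPU estimate from the tweet split across 4 cards; the scaling-efficiency factor below is only an illustrative assumption, not a number from the blog post.)

```python
# Hypothetical estimate: pre-training time on a 4x RTX 2080 Ti machine with 16-bit.
SINGLE_GPU_DAYS = 375       # ~375 RTX 2080 Ti days, from the tweet quoted earlier
NUM_GPUS = 4
SCALING_EFFICIENCY = 0.95   # assumed multi-GPU scaling efficiency (illustrative)

days = SINGLE_GPU_DAYS / (NUM_GPUS * SCALING_EFFICIENCY)
print(f"~{days:.0f} days")  # ~99 days, in line with the blog post's 16-bit estimate
```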