bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

Can int8 be used for pre-training large models? #521

Open zhoumengbo opened 1 year ago

zhoumengbo commented 1 year ago

Hello guys! I would like to know if you have experimented with int8 precision in the pre-training of your large models. Can int8 replace fp16 and fp32 to achieve faster training speeds? Are there any relevant case studies or experiments?
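
For context on what "int8 training" usually means in practice, here is a minimal sketch in plain PyTorch (not a LightSeq API; `fake_quant_int8` and `QuantLinear` are hypothetical names for illustration): quantization-aware training, where weights and activations are fake-quantized to int8 in the forward pass while gradients flow through a straight-through estimator.

```python
# Minimal sketch of int8 fake quantization (quantization-aware training).
# This simulates int8 precision; real speedups additionally require
# int8 GEMM kernels, which this sketch does not provide.
import torch

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor scale mapping the max magnitude to 127.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127)
    # Straight-through estimator: forward uses the quantized value,
    # backward treats the round/clamp as identity.
    return x + (q * scale - x).detach()

class QuantLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(
            fake_quant_int8(x), fake_quant_int8(self.weight), self.bias
        )

layer = QuantLinear(16, 16)
out = layer(torch.randn(4, 16))
out.sum().backward()  # gradients still flow in full precision
```

Note the design point this sketch highlights: even when the forward pass is int8, gradients and optimizer states are typically kept in fp16/fp32, which is one reason fully replacing fp16/fp32 in pre-training is harder than int8 inference.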