facebookresearch / metaseq

Repo for external large-scale work
MIT License

--arch transformer_lm and --tensor-init-on-gpu are incompatible #175

Open stephenroller opened 2 years ago

stephenroller commented 2 years ago

🐛 Bug

--arch transformer_lm and --tensor-init-on-gpu are incompatible (at least under FSDP).

Using both flags together throws an exception about mixing fp32 and fp16.

KUNAL1612 commented 2 years ago

Is the expectation of this task to have a check that prevents setting both these flags? Or is it to explore the underlying issue that causes it?

stephenroller commented 2 years ago

I think we want to fix init on GPU for transformer_lm. We just missed a few cases.
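
One possible way to track down the missed cases is to list parameters by dtype and see which ones ended up in a different precision from the rest. The following is a minimal sketch, not metaseq code: it assumes only that each parameter exposes a `.dtype` attribute, as PyTorch tensors do, so the output of `model.named_parameters()` could be passed in directly. The helper names (`group_params_by_dtype`, `find_mixed_dtypes`), the `FakeParam` stand-in, and the parameter names in the example are all hypothetical.

```python
from collections import defaultdict, namedtuple


def group_params_by_dtype(named_params):
    """Group parameter names by the string form of their dtype.

    `named_params` is any iterable of (name, param) pairs where `param`
    has a `.dtype` attribute, e.g. `model.named_parameters()` in PyTorch.
    """
    groups = defaultdict(list)
    for name, param in named_params:
        groups[str(param.dtype)].append(name)
    return dict(groups)


def find_mixed_dtypes(named_params):
    """Return the dtype groups if more than one dtype is present, else {}."""
    groups = group_params_by_dtype(named_params)
    return groups if len(groups) > 1 else {}


# Stand-in for real parameters so the sketch runs without PyTorch.
FakeParam = namedtuple("FakeParam", ["dtype"])

# Hypothetical parameter names; the last entry plays the "missed case".
example = [
    ("embed_tokens.weight", FakeParam("float16")),
    ("layers.0.fc1.weight", FakeParam("float16")),
    ("layers.0.final_norm.weight", FakeParam("float32")),
]

mixed = find_mixed_dtypes(example)
for dtype, names in mixed.items():
    print(dtype, names)
```

Run against a real model right after construction, and before wrapping it in FSDP, a check like this would show which modules were initialized in a different precision from the rest.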