NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Precision Problem between nemo model and hugging face model #9137

Closed ChencongZJU closed 5 months ago

ChencongZJU commented 6 months ago

Describe the bug

We are using NeMo to train our large vision-language model. When converting models from NeMo format to Hugging Face format, we found that, given the same inputs and weights, we get different outputs.

We verified that even layer normalization produces different outputs for the same weights and inputs. NeMo uses Transformer Engine and the following code to calculate it:

[screenshot: Transformer Engine layer-norm implementation]

while Hugging Face uses native PyTorch:

[screenshot: PyTorch layer-norm implementation]

I also found small precision gaps in rotary positional embedding, attention, and the FFN.
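To illustrate why two correct layer-norm implementations can disagree, here is a minimal numpy sketch (not the actual Transformer Engine or PyTorch kernels, which are compiled code): the "two-pass" and "one-pass" variance formulas are mathematically identical, but floating-point summation is not associative, so in float32 they differ at the last few bits.

```python
import numpy as np

def layernorm_two_pass(x, gamma, beta, eps=1e-5):
    # Variance computed from centered values (mean first, then E[(x-mu)^2]).
    mu = x.mean(axis=-1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def layernorm_one_pass(x, gamma, beta, eps=1e-5):
    # Same quantity via E[x^2] - E[x]^2, the kind of rearrangement a
    # fused single-pass kernel might use. Equivalent in exact arithmetic,
    # not bit-identical in float32.
    mu = x.mean(axis=-1, keepdims=True)
    var = (x ** 2).mean(axis=-1, keepdims=True) - mu ** 2
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1024)).astype(np.float32)
gamma = np.ones(1024, dtype=np.float32)
beta = np.zeros(1024, dtype=np.float32)

a = layernorm_two_pass(x, gamma, beta)
b = layernorm_one_pass(x, gamma, beta)
# Tiny (last-bit level) difference, even though both are "correct".
print(np.max(np.abs(a - b)))
```

The same effect applies to attention and FFN kernels: any reordering of reductions or fusing of operations changes rounding, which is why small gaps appear throughout the converted model.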

Expected behavior

Is the precision gap caused by different calculation operators? How can I fix it?

Thank you!

yaoyu-33 commented 6 months ago

Hi, we are aware that some TE implementations won't generate identical results to those of HF (which uses native PyTorch). We use our fused version of operations to speed up training. It seems you are using the Llama model as a foundation model. NeMo thoroughly tests Llama models to ensure that even though the results are not bit-wise matching, the overall performance (benchmarks) is on par.

If you have more concerns about the behavior, please provide us with more details. What model are you converting, what command are you using, and how large is the gap? We can check whether the gap is reasonable.
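To report "how large the gap is" in a comparable way, a hypothetical helper like the one below (the function name and metrics are my own choice, not a NeMo API) can summarize the difference between two model outputs with the usual metrics: max absolute difference, relative L2 error, and cosine similarity.

```python
import numpy as np

def report_gap(a, b):
    """Summarize the numerical gap between two output tensors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    max_abs = float(np.max(np.abs(a - b)))
    rel_l2 = float(np.linalg.norm(a - b) / (np.linalg.norm(a) + 1e-12))
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return {"max_abs": max_abs, "rel_l2": rel_l2, "cosine": cosine}

# Toy example: identical logits perturbed by small (kernel-rounding scale) noise.
rng = np.random.default_rng(0)
nemo_out = rng.standard_normal(4096).astype(np.float32)
hf_out = (nemo_out + rng.normal(scale=1e-3, size=4096)).astype(np.float32)
print(report_gap(nemo_out, hf_out))
```

Relative error in the 1e-3 range with cosine similarity near 1 is typically within kernel-rounding noise for bf16/fp16 models; much larger gaps suggest a genuine conversion bug rather than operator differences.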

ChencongZJU commented 5 months ago


Thanks for your patient reply. We also verified that the precision gap doesn't affect performance.