Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

significant difference between median and global averaged loss #77

Closed. ZhenYangIACAS closed this issue 8 months ago.

ZhenYangIACAS commented 11 months ago

Hi, I am getting very strange training loss:

Averaged stats: lr: 0.000015 closs: 2.2352 (9.5134) grad_norm: 1.9342 (1.2562)
Averaged stats: lr: 0.000030 closs: 0.2981 (9.2237) grad_norm: 0.5920 (1.0241)

I would like to know why there is such a large difference between the median loss and the globally averaged loss.

ChrisLiu6 commented 11 months ago

Well, one possibility is that your training loss has indeed undergone a steep change, so the high values from earlier steps still dominate the global average. Another possibility is that your training data covers a wide range of difficulty: a small proportion of items may have very large loss values that pull up the average, but because the proportion is small, they are not reflected in the median. There could also be other reasons.
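For illustration, here is a minimal sketch of why the two numbers can diverge. It assumes the logger follows the common SmoothedValue pattern used in many PyTorch training loops, where the first number is the median over a short recent window and the parenthesized number is the mean over all values since the start of training; the class, the `window_size` parameter, and the toy loss streams below are illustrative assumptions, not the repository's actual code.

```python
import random
import statistics
from collections import deque


class SmoothedValue:
    """Track a stream of values; report a windowed median and a global mean."""

    def __init__(self, window_size: int = 20):
        self.window = deque(maxlen=window_size)  # only the most recent values
        self.total = 0.0                         # running sum over every step
        self.count = 0                           # number of steps seen so far

    def update(self, value: float) -> None:
        self.window.append(value)
        self.total += value
        self.count += 1

    @property
    def median(self) -> float:
        return statistics.median(self.window)

    @property
    def global_avg(self) -> float:
        return self.total / self.count


if __name__ == "__main__":
    # Possibility 1: a steep drop in the loss curve. The window only sees the
    # recent small values, but the global mean still carries the early large ones.
    closs = SmoothedValue(window_size=20)
    for v in [10.0] * 50 + [0.3] * 50:
        closs.update(v)
    print(f"steep drop:    closs: {closs.median:.4f} ({closs.global_avg:.4f})")

    # Possibility 2: a small fraction of very hard samples. 5% of items have a
    # loss near 50; the windowed median ignores them, the global mean does not.
    closs = SmoothedValue(window_size=20)
    losses = [0.3] * 95 + [50.0] * 5
    random.seed(0)
    random.shuffle(losses)
    for v in losses:
        closs.update(v)
    print(f"hard outliers: closs: {closs.median:.4f} ({closs.global_avg:.4f})")
```

In this toy setup, the first scenario reports a median near 0.3 against a global average around 5.15, and the second a median near 0.3 against a global average around 2.8, i.e. the same kind of gap as in the log above.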