We cannot reproduce this issue; maybe you can turn off the "fp16" precision setting in your DeepSpeed config. Here is a similar issue in the DeepSpeed framework.
Are you using LLaMA2? Changing the DeepSpeed config does not work, since they override the config in code (main_new.py). The weights of LLaMA2 seem to have problems with the FP16 conversion used in this scenario. I changed the code to use bfloat16 and everything worked fine.
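As a minimal sketch of the failure mode (an illustration, not code from this repo): LLaMA2 can contain magnitudes that fit comfortably in bfloat16's wide exponent range but exceed FP16's maximum finite value of roughly 65504, so the cast produces inf:

import torch

# bfloat16 and FP16 are both 16-bit, but bfloat16 keeps float32's
# 8-bit exponent, while FP16 overflows to inf above ~65504.
x = torch.tensor([1e5], dtype=torch.bfloat16)
print(x.half())  # tensor([inf], dtype=torch.float16)

Once an inf or nan appears in the forward/backward pass, DeepSpeed's FP16 loss scaler reports an overflow and skips the step.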
You should find something similar to the following code in main_new.py, except with the True and False being swapped.
from transformers import LlamaConfig, LlamaForCausalLM, LlamaTokenizer

config = LlamaConfig.from_pretrained(args.model_name_or_path)
tokenizer = LlamaTokenizer.from_pretrained(args.model_name_or_path)
model = LlamaForCausalLM.from_pretrained(args.model_name_or_path).half()
# Enable bfloat16 and disable fp16; these in-code assignments override
# whatever the DeepSpeed JSON config file says.
deepspeed_config["bfloat16"]["enabled"] = True
deepspeed_config["fp16"]["enabled"] = False
Hello, I am running experiments on the emotion impact prediction task and the main task. The setup is a V100 32 GB, batch_size=1, history_window=12. How can I solve the following problems?
(1) OVERFLOW! Rank 0 Skipping step.
(2) Current loss scale already at minimum - cannot decrease scale anymore.
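One caveat worth checking before applying the bfloat16 switch above on this hardware: the V100 is a Volta-generation GPU without native bfloat16 support (that arrived with Ampere, e.g. the A100). A quick check, assuming a PyTorch version that provides torch.cuda.is_bf16_supported:

import torch

# Typically False on a V100 (Volta); native bf16 requires Ampere or newer.
print(torch.cuda.is_bf16_supported())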