Open GondorFu opened 6 months ago
Screenshot of the error? fp16 and bf16 can be converted to each other at any time, so this should not be a data type problem.
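There is no error; the inference results are simply wrong. Without merging, the results are correct, but after merging, every inference result is [][][][][][]...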
training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=partial(create_dataset_function, image_processor, text_processor), handle_metrics_function=handle_metrics_function, collate_fn=data_collator, forward_step_eval=forward_step_eval)

if args.use_lora:
    model.get_mixin("lora").merge_lora()
    model.get_mixin("eva").vit_model.get_mixin("lora").merge_lora()
    args.use_lora = False
training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=partial(create_dataset_function, image_processor, text_processor), handle_metrics_function=handle_metrics_function, collate_fn=data_collator, forward_step_eval=forward_step_eval)
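Both versions run and produce output normally, but the first one (above, without merging) gives correct results while the second one (below, after merging) gives wrong results. What could be the cause?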
Screenshot of the error? fp16 and bf16 can be converted to each other at any time, so it should not be a data type problem.
AFAIK, it can be a problem, because bf16 has a wider range but lower precision than fp16.
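To make the range/precision trade-off concrete, here is a minimal sketch using only the Python standard library. `to_fp16` and `to_bf16` are hypothetical helper names (not part of SAT or CogVLM), and bf16 is emulated by rounding a float32 to its top 16 bits. It only illustrates the two number formats; it does not establish that this is the cause of the merge problem in this thread.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision (fp16) value.

    Raises OverflowError if x is outside the fp16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16: keep the top 16 bits of the float32 encoding,
    using round-to-nearest-even on the 16 bits that are dropped."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Precision: a small update (0.001) added to a weight of 1.0.
fp16_sum = to_fp16(1.0 + 0.001)  # 1.0009765625, the update mostly survives
bf16_sum = to_bf16(1.0 + 0.001)  # 1.0, the update is rounded away

# Range: a value above the fp16 maximum (65504).
bf16_big = to_bf16(70000.0)      # 70144.0, representable but coarse
try:
    to_fp16(70000.0)
    fp16_overflows = False
except OverflowError:
    fp16_overflows = True        # fp16 cannot represent this value

print(fp16_sum, bf16_sum, bf16_big, fp16_overflows)
```

So a conversion between the two formats always succeeds mechanically for in-range values, but small differences such as a low-magnitude weight update can be lost in bf16, and large values can overflow in fp16.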
Was this ever solved? Also running into this error when trying to just reproduce the CogAgent finetuning results from the official example scripts.
During fine-tuning (finetune_cogagent_demo.py), the predictions are correct, but the merged model has wrong predictions that are completely off during evaluation (merge_model.py and evaluate_cogagent_demo.py).
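One way to narrow this down is to check, on a single layer, that the merged weight reproduces the unmerged forward pass. Below is a generic LoRA sketch in plain Python, assuming the usual formulation W' = W + scale * B A. It is not SAT's `merge_lora` implementation, and the helper names, shapes, and values are all hypothetical toy choices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy layer: W is (out=2, in=3), LoRA rank r=1. All values are made up.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
B = [[0.1], [-0.2]]         # (out, r)
A = [[1.0, 2.0, -1.0]]      # (r, in)
scale = 2.0                 # stands in for alpha / r
x = [[1.0], [0.5], [-2.0]]  # one input column, (in, 1)

# Unmerged forward pass: W x + scale * B (A x)
unmerged = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged forward pass: (W + scale * B A) x
W_merged = add_scaled(W, matmul(B, A), scale)
merged = matmul(W_merged, x)

diff = max_abs_diff(unmerged, merged)
print(diff)  # should be at floating-point rounding level
```

If the same comparison on real weights shows a large difference, things worth checking include the scale factor, the dtype used during the merge, and whether every LoRA layer was actually merged. These are assumptions to test, not a confirmed diagnosis.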
System Info
Versions and hardware were set up as instructed.
Who can help?
@1049451037
Information
Reproduction
Expected behavior
I suspect there is a bug in the merge process for models trained with fp16. Could you help locate the problem?