HelloWorld19930113 commented 1 month ago

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

Running full-parameter fine-tuning on MiniCPM-V 2.5.

Expected Behavior

Has anyone else run into this?

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

None

Zmeo commented 1 month ago

Could the model name be misspelled?

HelloWorld19930113 commented 1 month ago

Where is the name set? I simply changed it to the local path where the model is stored.

qyc-98 commented 1 month ago

Could you post your run script so we can take a look?

HelloWorld19930113 commented 1 month ago

```bash
#!/bin/bash

GPUS_PER_NODE=4
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="../../opensoure/minicpm2.5" # or openbmb/MiniCPM-V-2
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="./trainging_data.json"
EVAL_DATA="./test_data.json"
LLM_TYPE="llama3" # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 true \
    --bf16_full_eval true \
    --fp16 false \
    --fp16_full_eval false \
    --do_train \
    --do_eval \
    --tune_vision true \
    --tune_llm true \
    --model_max_length 2048 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output_minicpmv2 \
    --logging_dir output/output_minicpmv2 \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero2.json \
    --report_to "tensorboard"
```

Cuiunbo commented 1 month ago

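Hi, the likely cause is that `from_pretrained` does not support a period (decimal point) in the model directory name. Removing the period from the name may fix it.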
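To illustrate why a period can cause this, here is a minimal sketch. It assumes that, for a local model loaded with custom code, the directory name is used as a submodule name under `transformers_modules`, so Python's import system treats each period as a package separator. The directory name `minicpm-2.5` and both helper functions are hypothetical, chosen only to match the reported error; they are not part of Transformers.

```python
import os
from typing import List


def dynamic_module_parts(model_dir: str) -> List[str]:
    """Split the dotted import path that a local model directory would map to.

    Assumption: the directory's base name is appended to
    'transformers_modules.' and then imported as a dotted module path.
    """
    name = os.path.basename(os.path.normpath(model_dir))
    return ("transformers_modules." + name).split(".")


def dot_free_name(model_dir: str, replacement: str = "_") -> str:
    """Suggest a directory name without periods (hypothetical helper)."""
    name = os.path.basename(os.path.normpath(model_dir))
    return name.replace(".", replacement)


# Hypothetical local path; the period splits the name into two packages.
parts = dynamic_module_parts("models/minicpm-2.5")
print(parts)
print(dot_free_name("models/minicpm-2.5"))
```

Under this assumption, the import system would look for a package named `minicpm-2` with a submodule `5`, which is consistent with the `No module named 'transformers_modules.minicpm-2'` error. Renaming the directory (or pointing `MODEL` at a symlink whose name has no period) would avoid the split.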

OpenBMB / MiniCPM-V

Fine-tuning raises ModuleNotFoundError: No module named 'transformers_modules.minicpm-2'. Not sure what is going wrong. #182
