moon-fall closed this issue 3 months ago
toy_finetune_data.jsonl is just toy data, used only to show the training data format. You need to replace it with your own data.
I get the same result when using my own data, so I think something is wrong.
I fine-tuned bge-reranker-v2-m3 directly on the official sample data https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/llm_reranker/toy_finetune_data.jsonl with the following command:
torchrun --nproc_per_node 1 \
  -m FlagEmbedding.llm_reranker.finetune_for_instruction.run \
  --output_dir /home/lf/models/bge-reranker-v2-m3-finetune \
  --model_name_or_path /home/lf/models/bge-reranker-v2-m3 \
  --train_data /home/lf/data/toy_finetune_data.jsonl \
  --learning_rate 2e-4 \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --dataloader_drop_last True \
  --query_max_len 512 \
  --passage_max_len 512 \
  --train_group_size 16 \
  --logging_steps 1 \
  --save_steps 2000 \
  --save_total_limit 50 \
  --ddp_find_unused_parameters False \
  --gradient_checkpointing \
  --deepspeed stage1.json \
  --warmup_ratio 0.1 \
  --bf16 \
  --use_lora False \
  --use_flash_attn False
Log:
06/04/2024 14:38:03 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/04/2024 14:38:04 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 0
06/04/2024 14:38:04 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
[2024-06-04 14:38:05,548] [WARNING] [lr_schedules.py:759:init] total_num_steps 1 is less than warmup_num_steps 1
0%| | 0/1 [00:00<?, ?it/s]
/home/narwal/.local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2717: UserWarning: max_length is ignored when padding=True and there is no truncation strategy. To pad to max length, use padding='max_length'.
warnings.warn(
/home/narwal/.local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2717: UserWarning: max_length is ignored when padding=True and there is no truncation strategy. To pad to max length, use padding='max_length'.
warnings.warn(
100%|████████████████████| 1/1 [00:01<00:00, 1.43s/it]
tried to get lr value before scheduler/optimizer started stepping, returning lr=0
{'loss': 2.3477, 'learning_rate': 0, 'epoch': 1.0}
{'train_runtime': 1.4829, 'train_samples_per_second': 6.744, 'train_steps_per_second': 0.674, 'train_loss': 2.34765625, 'epoch': 1.0}
100%|████████████████████| 1/1 [00:01<00:00, 1.48s/it]

I then computed ranking accuracy on toy_finetune_data.jsonl. Before fine-tuning the ranking accuracy is 100%; after fine-tuning it is only 10%.
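The log above reports total_num_steps 1 and warmup_num_steps 1. A rough sketch of where those numbers could come from, using only values from the command and the log, is shown here. The dataset size is not stated anywhere in the thread, so it is inferred from throughput times runtime, and the step formulas (floor division by the accumulation steps with a minimum of one step, warmup rounded up) are assumptions about how the trainer counts steps, not something confirmed here.

```python
import math

# Values copied from the command line and the training log.
train_runtime_s = 1.4829
samples_per_second = 6.744
per_device_batch_size = 1
grad_accum_steps = 16
num_processes = 1
num_epochs = 1
warmup_ratio = 0.1

# Assumption: dataset size inferred from throughput * runtime (not stated in the log).
num_samples = round(samples_per_second * train_runtime_s)

# Batches seen per epoch on this single process.
batches_per_epoch = num_samples // (per_device_batch_size * num_processes)

# Assumption: at least one optimizer step is taken per epoch, even when
# there are fewer batches than gradient accumulation steps.
steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
total_steps = steps_per_epoch * num_epochs

# Assumption: warmup steps are the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(num_samples, total_steps, warmup_steps)
```

Under these assumptions the whole run is a single optimizer step that falls entirely inside warmup, which is consistent with the total_num_steps warning and the logged learning_rate of 0.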
Sample scores before fine-tuning:
[['A man pulls two women down a city street in a rickshaw.', 'A man is in a city.'],
 ['A man pulls two women down a city street in a rickshaw.', 'A man is a pilot of an airplane.'],
 ['A man pulls two women down a city street in a rickshaw.', 'It is boring and mundane.'],
 ['A man pulls two women down a city street in a rickshaw.', 'The morning sunlight was shining brightly and it was warm. '],
 ['A man pulls two women down a city street in a rickshaw.', 'Two people jumped off the dock.'],
 ['A man pulls two women down a city street in a rickshaw.', 'People watching a spaceship launch.'],
 ['A man pulls two women down a city street in a rickshaw.', 'Mother Teresa is an easy choice.'],
 ['A man pulls two women down a city street in a rickshaw.', "It's worth being able to go at a pace you prefer."]]
tensor([ 5.1926, -10.7530, -11.0055, -9.5263, -10.3903, -11.0282, -11.0232, -9.5248], device='cuda:0', grad_fn=)
After fine-tuning, the sample scores all become small values that barely differ from one another:
[['A man pulls two women down a city street in a rickshaw.', 'A man is in a city.'],
 ['A man pulls two women down a city street in a rickshaw.', 'A man is a pilot of an airplane.'],
 ['A man pulls two women down a city street in a rickshaw.', 'It is boring and mundane.'],
 ['A man pulls two women down a city street in a rickshaw.', 'The morning sunlight was shining brightly and it was warm. '],
 ['A man pulls two women down a city street in a rickshaw.', 'Two people jumped off the dock.'],
 ['A man pulls two women down a city street in a rickshaw.', 'People watching a spaceship launch.'],
 ['A man pulls two women down a city street in a rickshaw.', 'Mother Teresa is an easy choice.'],
 ['A man pulls two women down a city street in a rickshaw.', "It's worth being able to go at a pace you prefer."]]
tensor([0.3033, 0.3032, 0.3118, 0.2768, 0.2530, 0.3259, 0.2976, 0.2699], device='cuda:0', grad_fn=)
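The thread does not say how ranking accuracy was computed. One plausible reading, sketched here on the two score vectors above, is top-1 accuracy: a query counts as correct when its positive passage gets the highest score. The function names are hypothetical, and the sketch assumes the positive passage is the first entry of each group (here 'A man is in a city.').

```python
def top1_is_positive(scores, positive_index=0):
    """Return True if the positive passage has the highest score in the group."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best == positive_index


def ranking_accuracy(groups, positive_index=0):
    """Fraction of score groups whose top-scored passage is the positive one."""
    hits = sum(top1_is_positive(s, positive_index) for s in groups)
    return hits / len(groups)


def positive_margin(scores, positive_index=0):
    """Positive score minus the best negative score (negative means a miss)."""
    negatives = [s for i, s in enumerate(scores) if i != positive_index]
    return scores[positive_index] - max(negatives)


# Scores copied from the two tensors in the post.
before = [5.1926, -10.7530, -11.0055, -9.5263, -10.3903, -11.0282, -11.0232, -9.5248]
after = [0.3033, 0.3032, 0.3118, 0.2768, 0.2530, 0.3259, 0.2976, 0.2699]

print(ranking_accuracy([before]), positive_margin(before))
print(ranking_accuracy([after]), positive_margin(after))
```

On this one query the positive passage is ranked first before fine-tuning and is no longer first afterwards. The margin makes the collapse visible: the gap between the positive and the best negative shrinks from roughly 14.7 to slightly below zero.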