Closed hhhhzzzzz closed 3 weeks ago
Q: What does <image> mean? A: The <image> token represents the position of the image within the input for vision models. While it can theoretically be placed at any position, it is typically located at either the beginning or the end. During data processing, the <image> token is uniformly placed at the start, and the original <image> token is removed (see code_for_removing_image_token and code_for_adding_image_token). Thus, it is not necessary to manually add this <image> token.
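The behavior described in this answer (drop any user-supplied <image> token, then put a single one at the start) can be sketched as follows. This is only an illustration, not the repository's actual code: the function name `normalize_image_token` and the newline separator after the token are assumptions.

```python
IMAGE_TOKEN = "<image>"


def normalize_image_token(text: str) -> str:
    """Remove every <image> token from `text` and prepend exactly one.

    Hypothetical helper: the name and the newline separator are assumptions.
    """
    # Drop any <image> tokens already present, wherever they appear.
    stripped = text.replace(IMAGE_TOKEN, "").strip()
    # Place a single <image> token at the start of the prompt.
    return f"{IMAGE_TOKEN}\n{stripped}"


if __name__ == "__main__":
    print(normalize_image_token("Describe the picture. <image>"))
    print(normalize_image_token("What is shown here?"))
```

With a step like this in the data pipeline, a prompt with or without an <image> token ends up in the same form, which is why adding the token by hand is unnecessary.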
Q: Is the first answer in the value the chosen sample, and the second answer the rejected sample? A: Yes, the first answer is preferred over the second.
Q: Could we change "from": "gpt" to "from": "human" if the answer is generated by a human? A: You may proceed with this, as the "from" value is not used in data processing.
Please let us know if you have any other questions.
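To make the ordering convention concrete, here is a minimal sketch of reading a chosen/rejected pair from one answer turn. Only the "from" and "value" keys come from the discussion above; the sample contents and the helper name `split_preference` are hypothetical, and the real dataset may contain additional fields.

```python
from typing import Dict, List, Tuple, Union

Turn = Dict[str, Union[str, List[str]]]


def split_preference(turn: Turn) -> Tuple[str, str]:
    """Return (chosen, rejected) from a turn whose "value" holds two answers.

    Hypothetical helper: the first answer is treated as preferred,
    the second as rejected. The "from" field is deliberately ignored.
    """
    answers = turn["value"]
    if not isinstance(answers, list) or len(answers) < 2:
        raise ValueError("expected at least two candidate answers in 'value'")
    return answers[0], answers[1]


if __name__ == "__main__":
    # Made-up sample for illustration only.
    turn = {
        "from": "gpt",
        "value": ["A dog runs along the beach.", "A cat sleeps on a sofa."],
    }
    chosen, rejected = split_preference(turn)
    print("chosen:", chosen)
    print("rejected:", rejected)
```

Because the helper never reads "from", switching it between "gpt" and "human" leaves the extracted pair unchanged, which matches the answer to the third question.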
Hi, the answer to the third question is the same as the second.
Also, if I want to use liuhaotian/llava-v1.6-mistral-7b for training on my own data, how should I set up run_rm_training.sh?
VISION_MODEL=base_models/vision_encoder/clip-vit-large-patch14
LLM=base_models/llama-3-8b-Instruct
FROM_CHECKPOINT=models/sft_test/epoch-3
TEMPLATE=llama_3
IMAGE_FOLDER=data/coco_2017/
EPOCH=3
ZERO_STAGE=2
lr=3e-5
DATA_PATH=data/reward_samples.json
EVAL_DATA_PATH=data/reward_samples_test.json
CANDIDATE_NUM=2
DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"
OUTPUT=models/reward_test
Thanks!
The prior response has been updated; please review it. Below is a training script I developed according to your specifications. However, liuhaotian/llava-v1.6-mistral-7b has not been extensively tested. Please inform us if you encounter any issues.
#!/bin/bash
CUR_DIR=`pwd`
ROOT=${CUR_DIR}
export PYTHONPATH=${ROOT}:${PYTHONPATH}
# LLM and VISION_MODEL are passed to the command below, so they must be defined
LLM=none
VISION_MODEL=none
# you can replace this path with your path to ``liuhaotian/llava-v1.6-mistral-7b``
FROM_CHECKPOINT=llava-1.5-7b-hf
MODEL_ARCHITECTURE="llava_next"
# !!! Note: We haven't tested llava-v1.6-mistral-7b yet, when training a reward model
# so you may need to check this template to see if any code fixes are necessary.
TEMPLATE=llava
EPOCH=1
ZERO_STAGE=3
lr=1e-6
# if you do not train a reward based on a pre-trained reward model,
# this parameter does not need to be set
TRAINED_REWARD_MODEL=none
OUTPUT=your_output_path
DATA_PATH=your_data_path_for_training_reward_model
EVAL_DATA_PATH=your_data_path_for_test_reward_model
IMAGE_FOLDER=your_image_folder
CANDIDATE_NUM=2
DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
mkdir -p $OUTPUT
cp $0 $OUTPUT
# we assume the batch size is 128, which means Num_GPU * per_device_train_batch_size * gradient_accumulation_steps
nohup deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port 12335 training/reward_model_training/rm_training_main.py \
--max_seq_len 2048 --image_folder ${IMAGE_FOLDER} --template ${TEMPLATE} \
--data_path ${DATA_PATH} --eval_data_path ${EVAL_DATA_PATH} \
--dataset_names ${DATA} --dataset_samples ${DATA_SAMPLE} --dataset_concatenate_samples ${IMAGE_PER_SAMPLE} --max_num_image_per_sample 8 \
--lm_reward_model_name_or_path ${LLM} \
--vision_reward_model_name_or_path ${VISION_MODEL} \
--gradient_checkpointing --vis_proj baseline \
--gradient_accumulation_steps 1 --zero_stage $ZERO_STAGE --learning_rate $lr --num_warmup_steps 0.1 \
--per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 \
--deepspeed --output_dir $OUTPUT --num_train_epochs ${EPOCH} \
--lang_decoder_update --enable_mmca_attention --model_architecture ${MODEL_ARCHITECTURE} \
--trained_reward_model $TRAINED_REWARD_MODEL --save_step 9900 \
--precision bf16 --ranked_candidate_num $CANDIDATE_NUM --from_checkpoint ${FROM_CHECKPOINT}
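The batch-size comment in the script gives the formula Num_GPU * per_device_train_batch_size * gradient_accumulation_steps. As a quick check of what the flags in this particular script produce, here is a small sketch. The helper names are hypothetical; the numbers are taken from the `--include`, `--per_device_train_batch_size`, and `--gradient_accumulation_steps` flags above.

```python
def count_gpus(include: str) -> int:
    """Count GPU ids in a single-node deepspeed --include value such as 'localhost:0,1,2'."""
    _, gpu_ids = include.split(":")
    return len(gpu_ids.split(","))


def effective_batch_size(num_gpus: int, per_device: int, grad_accum: int) -> int:
    """Global batch size per optimizer step."""
    return num_gpus * per_device * grad_accum


if __name__ == "__main__":
    n = count_gpus("localhost:0,1,2,3,4,5,6,7")
    # Values as written in the script above: 8 GPUs, 8 per device, 1 accumulation step.
    print(effective_batch_size(n, per_device=8, grad_accum=1))  # 64
    # Reaching the 128 mentioned in the comment would take 2 accumulation steps.
    print(effective_batch_size(n, per_device=8, grad_accum=2))  # 128
```

So with the flags as written the effective batch size is 64, not 128. If 128 is the intended value, one of the three factors has to change.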
OK. Thanks a lot! I will try this script to develop reward models with liuhaotian/llava-v1.6-mistral-7b. I'll update the results!
We'll be testing this model over the next few days as well. You can also keep checking back for updates.
I'm very sorry that I can't release the model since it will be trained on private data.
OK. Please let us know if you have any questions during training with the model.
OK. Thanks a lot!
Hi,
I found some bugs in your code.
I changed it to
from .third_party_model.hf_model.modeling_llava import LlavaForConditionalGeneration
from .third_party_model.hf_model.configuration_llava import LlavaConfig
#from .third_party_model.hf_model.modeling_llava_next import LlavaNextForConditionalGeneration
from transformers import LlavaNextForConditionalGeneration, LlavaNextConfig
#from .third_party_model.hf_model.configuration_llava_next import LlavaNextConfig
I didn't find the parameters for args.reward_model_architecture and args.reward_base_model.
I changed it to
if args.model_architecture == "default" and is_reward:
    vis_llm, reward_image_processor, reward_tokenizer = create_dsvl_model_and_transforms(
        text_tokenizer=text_tokenizer,
        ds_config=ds_config,
        args=args)
elif is_reward:
    vis_llm, reward_image_processor, reward_tokenizer = build_model(
        text_tokenizer=text_tokenizer,
        ds_config=ds_config,
        model_architecture=args.model_architecture,
        from_checkpoint=args.from_checkpoint,
        args=args)
else:
    vis_llm, reward_image_processor, reward_tokenizer = build_model(
        text_tokenizer=text_tokenizer,
        ds_config=ds_config,
        args=args)

# load parameters from `from_checkpoint`
if training_reward_stage and args.model_architecture == 'default':
    # we have the deepspeed checkpoint so it is a resumed job
    print(f"load checkpoint from {args.from_checkpoint}")
    vis_llm.load_state_dict(
        torch.load(os.path.join(args.from_checkpoint, 'pytorch_model.bin'), map_location='cpu'),
        strict=False)

if is_reward and (args.model_architecture == "llava" or args.model_architecture == "llava_next"):
    vis_reward_model = ViRewardModel(vis_llm=vis_llm,
                                     tokenizer=reward_tokenizer,
                                     is_reward=is_reward,
                                     vis_architecture="llava")
else:
    vis_reward_model = ViRewardModel(vis_llm=vis_llm,
                                     tokenizer=reward_tokenizer,
                                     is_reward=is_reward,
                                     vis_architecture=args.model_architecture)

return vis_reward_model, reward_image_processor, reward_tokenizer
Do you think my modifications are reasonable?
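The last branch of the modification treats both "llava" and "llava_next" as "llava" when wrapping the model in ViRewardModel. Pulled out on its own, that rule might look like the sketch below. The function name `resolve_vis_architecture` is hypothetical and not part of the repository; it only restates the condition from the code above.

```python
def resolve_vis_architecture(model_architecture: str, is_reward: bool) -> str:
    """Pick the vis_architecture value passed to the reward-model wrapper.

    Hypothetical helper mirroring the final if/else of the modified code:
    reward models built on "llava" or "llava_next" share the "llava" path,
    everything else keeps its own architecture name.
    """
    if is_reward and model_architecture in ("llava", "llava_next"):
        return "llava"
    return model_architecture


if __name__ == "__main__":
    for arch in ("default", "llava", "llava_next"):
        print(arch, "->", resolve_vis_architecture(arch, is_reward=True))
```

Keeping the rule in one place makes it easier to see which architectures share the reward-model code path.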
Then I can run the command:
#!/bin/bash
CUR_DIR=`pwd`
ROOT=${CUR_DIR}
export PYTHONPATH=${ROOT}:${PYTHONPATH}
VISION_MODEL=base_models/vision_encoder/clip-vit-large-patch14
LLM=base_models/mistral-7b-instruct-v0.2
# you can replace this path with your path to ``liuhaotian/llava-v1.6-mistral-7b``
FROM_CHECKPOINT=llava-hf/llava-v1.6-mistral-7b-hf
MODEL_ARCHITECTURE="llava_next"
# !!! Note: We haven't tested llava-v1.6-mistral-7b yet, when training a reward model
# so you may need to check this template to see if any code fixes are necessary.
TEMPLATE=llava
EPOCH=1
ZERO_STAGE=3
lr=1e-6
# if you do not train a reward based on a pre-trained reward model,
# this parameter does not need to be set
TRAINED_REWARD_MODEL=none
OUTPUT=output
DATA_PATH=./data/rm_dataset.json
# EVAL_DATA_PATH=none
IMAGE_FOLDER=image_dataset
CANDIDATE_NUM=2
DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
mkdir -p $OUTPUT
cp $0 $OUTPUT
# we assume the batch size is 128, which means Num_GPU * per_device_train_batch_size * gradient_accumulation_steps
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port 12335 training/reward_model_training/rm_training_main.py \
--max_seq_len 2048 --image_folder ${IMAGE_FOLDER} --template ${TEMPLATE} \
--data_path ${DATA_PATH} \
--dataset_names ${DATA} --dataset_samples ${DATA_SAMPLE} --dataset_concatenate_samples ${IMAGE_PER_SAMPLE} --max_num_image_per_sample 8 \
--lm_reward_model_name_or_path ${LLM} \
--vision_reward_model_name_or_path ${VISION_MODEL} \
--gradient_checkpointing --vis_proj baseline \
--gradient_accumulation_steps 2 --zero_stage $ZERO_STAGE --learning_rate $lr --num_warmup_steps 0.1 \
--per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 \
--deepspeed --output_dir $OUTPUT --num_train_epochs ${EPOCH} \
--lang_decoder_update --enable_mmca_attention --model_architecture ${MODEL_ARCHITECTURE} \
--trained_reward_model $TRAINED_REWARD_MODEL --save_step 9900 \
--precision bf16 --ranked_candidate_num $CANDIDATE_NUM --from_checkpoint ${FROM_CHECKPOINT}
I don't know how VISION_MODEL=base_models/vision_encoder/clip-vit-large-patch14 and LLM=base_models/mistral-7b-instruct-v0.2 work. Are they useless parameters?
Thanks!
We have completed the test on the liuhaotian/llava-v1.6-mistral-7b model.
You need to update the code (run git pull).
This is the script we used:
#!/bin/bash
CUR_DIR=`pwd`
ROOT=${CUR_DIR}
export PYTHONPATH=${ROOT}:${PYTHONPATH}
LLM=none
VISION_MODEL=none
# you can replace this path with your path to ``liuhaotian/llava-v1.6-mistral-7b``
FROM_CHECKPOINT=base_models/llava-v1.6-mistral-7b-hf
MODEL_ARCHITECTURE="llava_next"
# !!! Note: We haven't tested llava-v1.6-mistral-7b yet, when training a reward model
# so you may need to check this template to see if any code fixes are necessary.
TEMPLATE=llava_next
EPOCH=1
ZERO_STAGE=3
lr=1e-6
# if you do not train a reward based on a pre-trained reward model,
# this parameter does not need to be set
TRAINED_REWARD_MODEL=none
OUTPUT=models/test
DATA_PATH=data/RLAIF-V-Dataset/rlaif_v_dataset_test.json
EVAL_DATA_PATH=data/RLAIF-V-Dataset/rlaif_v_dataset_test.json
IMAGE_FOLDER=data/RLAIF-V-Dataset/images
CANDIDATE_NUM=2
DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"
if [ "$ZERO_STAGE" == "" ]; then
ZERO_STAGE=0
fi
mkdir -p $OUTPUT
cp $0 $OUTPUT
# we assume the batch size is 128, which means Num_GPU * per_device_train_batch_size * gradient_accumulation_steps
nohup deepspeed --include localhost:2,3,4,5,6,7 --master_port 12335 training/reward_model_training/rm_training_main.py \
--max_seq_len 2048 --image_folder ${IMAGE_FOLDER} --template ${TEMPLATE} \
--data_path ${DATA_PATH} --eval_data_path ${EVAL_DATA_PATH} \
--dataset_names ${DATA} --dataset_samples ${DATA_SAMPLE} --dataset_concatenate_samples ${IMAGE_PER_SAMPLE} --max_num_image_per_sample 8 \
--lm_reward_model_name_or_path ${LLM} \
--vision_reward_model_name_or_path ${VISION_MODEL} \
--gradient_checkpointing --vis_proj baseline \
--gradient_accumulation_steps 1 --zero_stage $ZERO_STAGE --learning_rate $lr --num_warmup_steps 0.1 \
--per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 \
--deepspeed --output_dir $OUTPUT --num_train_epochs ${EPOCH} \
--lang_decoder_update --enable_mmca_attention --model_architecture ${MODEL_ARCHITECTURE} \
--trained_reward_model $TRAINED_REWARD_MODEL --save_step 9900 \
--precision bf16 --ranked_candidate_num $CANDIDATE_NUM --from_checkpoint ${FROM_CHECKPOINT} > $OUTPUT/training.log &
log:
[2024-08-30 15:25:38,682] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:39,949] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-08-30 15:25:39,949] [INFO] [runner.py:568:main] cmd = /localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12335 --enable_each_rank_log=None training/reward_model_training/rm_training_main.py --max_seq_len 2048 --image_folder data/RLAIF-V-Dataset/images --template llava_next --data_path data/RLAIF-V-Dataset/rlaif_v_dataset_test.json --eval_data_path data/RLAIF-V-Dataset/rlaif_v_dataset_test.json --dataset_names llava_reward --dataset_samples all --dataset_concatenate_samples 1 --max_num_image_per_sample 8 --lm_reward_model_name_or_path none --vision_reward_model_name_or_path none --gradient_checkpointing --vis_proj baseline --gradient_accumulation_steps 1 --zero_stage 3 --learning_rate 1e-6 --num_warmup_steps 0.1 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 --deepspeed --output_dir models/test --num_train_epochs 1 --lang_decoder_update --enable_mmca_attention --model_architecture llava_next --trained_reward_model none --save_step 9900 --precision bf16 --ranked_candidate_num 2 --from_checkpoint /localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf
[2024-08-30 15:25:42,279] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:42,789] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [2, 3, 4, 5, 6, 7]}
[2024-08-30 15:25:42,789] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=6, node_rank=0
[2024-08-30 15:25:42,790] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5]})
[2024-08-30 15:25:42,790] [INFO] [launch.py:163:main] dist_world_size=6
[2024-08-30 15:25:42,790] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=2,3,4,5,6,7
[2024-08-30 15:25:42,790] [INFO] [launch.py:253:main] process 1874730 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=0', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,791] [INFO] [launch.py:253:main] process 1874731 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=1', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,792] [INFO] [launch.py:253:main] process 1874732 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=2', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,792] [INFO] [launch.py:253:main] process 1874733 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=3', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,793] [INFO] [launch.py:253:main] process 1874734 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=4', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,793] [INFO] [launch.py:253:main] process 1874735 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=5', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:45,490] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,492] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,506] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,531] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,597] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,624] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-08-30 15:25:46,077] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,077] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-08-30 15:25:46,265] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,291] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,297] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,307] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,328] [INFO] [comm.py:637:init_distributed] cdb=None
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:19, 6.59s/it]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:20, 6.77s/it]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:21, 7.12s/it]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:21, 7.21s/it]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:21, 7.24s/it]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:22, 7.47s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.77s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.90s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:10<00:09, 4.83s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:10<00:09, 4.84s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:10<00:09, 4.86s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:10<00:09, 4.93s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:10<00:03, 3.20s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:11<00:03, 3.23s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00, 2.03s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00, 2.80s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00, 2.04s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00, 2.81s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
ViRewardModel(
(v_head): Linear(in_features=4096, out_features=1, bias=False)
(rwtranrsformer): LlavaNextForConditionalGeneration(
(vision_tower): CLIPVisionModel(
(vision_model): CLIPVisionTransformer(
(embeddings): CLIPVisionEmbeddings(
(patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
(position_embedding): Embedding(577, 1024)
)
(pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder): CLIPEncoder(
(layers): ModuleList(
(0-23): 24 x CLIPEncoderLayer(
(self_attn): CLIPAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(mlp): CLIPMLP(
(activation_fn): QuickGELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
)
(layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
(post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(multi_modal_projector): LlavaNextMultiModalProjector(
(linear_1): Linear(in_features=1024, out_features=4096, bias=True)
(act): GELUActivation()
(linear_2): Linear(in_features=4096, out_features=4096, bias=True)
)
(language_model): MistralForCausalLM(
(model): MistralModel(
(embed_tokens): Embedding(32064, 4096)
(layers): ModuleList(
(0-31): 32 x MistralDecoderLayer(
(self_attn): MistralSdpaAttention(
(q_proj): Linear(in_features=4096, out_features=4096, bias=False)
(k_proj): Linear(in_features=4096, out_features=1024, bias=False)
(v_proj): Linear(in_features=4096, out_features=1024, bias=False)
(o_proj): Linear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): MistralRotaryEmbedding()
)
(mlp): MistralMLP(
(gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
(up_proj): Linear(in_features=4096, out_features=14336, bias=False)
(down_proj): Linear(in_features=14336, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): MistralRMSNorm()
(post_attention_layernorm): MistralRMSNorm()
)
)
(norm): MistralRMSNorm()
)
(lm_head): Linear(in_features=4096, out_features=32064, bias=False)
)
)
)
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
[DATA] Built dataset llava_reward with all 1000 samples.
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
[2024-08-30 15:26:00,304] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown
[2024-08-30 15:26:00,304] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:13<00:03, 3.91s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:13<00:03, 3.91s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:13<00:03, 3.94s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 2.44s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 3.35s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 2.46s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 3.35s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:13<00:03, 3.96s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 2.45s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 3.36s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 2.46s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 3.40s/it]
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
[2024-08-30 15:26:35,358] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-08-30 15:26:35,360] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-08-30 15:26:35,360] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-08-30 15:26:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-08-30 15:26:35,377] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'transformers.optimization.AdamW'>
[2024-08-30 15:26:35,377] [WARNING] [engine.py:1188:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-08-30 15:26:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2024-08-30 15:26:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
[2024-08-30 15:26:35,477] [INFO] [utils.py:800:see_memory_usage] Stage 3 initialize beginning
[2024-08-30 15:26:35,478] [INFO] [utils.py:801:see_memory_usage] MA 14.09 GB Max_MA 14.09 GB CA 14.24 GB Max_CA 14 GB
[2024-08-30 15:26:35,478] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.12 GB, percent = 7.9%
[2024-08-30 15:26:35,480] [INFO] [stage3.py:130:__init__] Reduce bucket size 500,000,000
[2024-08-30 15:26:35,480] [INFO] [stage3.py:131:__init__] Prefetch bucket size 0
[2024-08-30 15:26:35,550] [INFO] [utils.py:800:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-08-30 15:26:35,551] [INFO] [utils.py:801:see_memory_usage] MA 14.09 GB Max_MA 14.09 GB CA 14.24 GB Max_CA 14 GB
[2024-08-30 15:26:35,551] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.13 GB, percent = 7.9%
Parameter Offload: Total persistent parameters: 607232 in 314 params
[2024-08-30 15:26:36,311] [INFO] [utils.py:800:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-08-30 15:26:36,312] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB Max_MA 14.09 GB CA 16.66 GB Max_CA 17 GB
[2024-08-30 15:26:36,312] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.35 GB, percent = 7.9%
[2024-08-30 15:26:37,650] [INFO] [utils.py:800:see_memory_usage] Before creating fp16 partitions
[2024-08-30 15:26:37,651] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB Max_MA 2.35 GB CA 16.66 GB Max_CA 17 GB
[2024-08-30 15:26:37,651] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.59 GB, percent = 7.9%
[2024-08-30 15:26:40,150] [INFO] [utils.py:800:see_memory_usage] After creating fp16 partitions: 4
[2024-08-30 15:26:40,151] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB Max_MA 2.35 GB CA 3.26 GB Max_CA 17 GB
[2024-08-30 15:26:40,151] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 83.13 GB, percent = 8.3%
[2024-08-30 15:26:40,228] [INFO] [utils.py:800:see_memory_usage] Before creating fp32 partitions
[2024-08-30 15:26:40,228] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB Max_MA 2.35 GB CA 3.26 GB Max_CA 3 GB
[2024-08-30 15:26:40,228] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 81.76 GB, percent = 8.1%
[2024-08-30 15:26:40,329] [INFO] [utils.py:800:see_memory_usage] After creating fp32 partitions
[2024-08-30 15:26:40,329] [INFO] [utils.py:801:see_memory_usage] MA 6.86 GB Max_MA 7.95 GB CA 8.86 GB Max_CA 9 GB
[2024-08-30 15:26:40,329] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.8 GB, percent = 7.9%
[2024-08-30 15:26:40,439] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-08-30 15:26:40,440] [INFO] [utils.py:801:see_memory_usage] MA 6.86 GB Max_MA 6.86 GB CA 8.86 GB Max_CA 9 GB
[2024-08-30 15:26:40,440] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.57 GB, percent = 7.9%
[2024-08-30 15:26:40,520] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-08-30 15:26:40,520] [INFO] [utils.py:801:see_memory_usage] MA 6.86 GB Max_MA 10.59 GB CA 12.59 GB Max_CA 13 GB
[2024-08-30 15:26:40,521] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.57 GB, percent = 7.9%
[2024-08-30 15:26:40,521] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized
0%| | 0/21 [00:00<?, ?it/s]
0%| | 0/21 [00:00<?, ?it/s]
0%| | 0/21 [00:00<?, ?it/s]
0%| | 0/21 [00:00<?, ?it/s]
0%| | 0/21 [00:00<?, ?it/s][2024-08-30 15:26:42,158] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-08-30 15:26:42,158] [INFO] [utils.py:801:see_memory_usage] MA 10.05 GB Max_MA 10.54 GB CA 15.21 GB Max_CA 15 GB
[2024-08-30 15:26:42,159] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 79.62 GB, percent = 7.9%
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x1553fca66c80>
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-08-30 15:26:42,160] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print] amp_enabled .................. False
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print] amp_params ................... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] bfloat16_enabled ............. True
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x1553fca67280>
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] communication_data_type ...... None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] dataloader_drop_last ......... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] disable_allgather ............ False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] dump_state ................... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] elasticity_enabled ........... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] fp16_auto_cast ............... None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] fp16_enabled ................. False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] global_rank .................. 0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] grad_accum_dtype ............. None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 1
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] gradient_clipping ............ 1.0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] graph_harvesting ............. False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 1
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] load_universal_checkpoint .... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] loss_scale ................... 1.0
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] memory_breakdown ............. False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] mics_shard_size .............. -1
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] optimizer_name ............... None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] optimizer_params ............. None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] pld_enabled .................. False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] pld_params ................... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] prescale_gradients ........... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] scheduler_name ............... None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] scheduler_params ............. None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] sparse_attention ............. None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] steps_per_print .............. 10
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] train_batch_size ............. 48
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 8
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] use_node_local_storage ....... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] weight_quantization_config ... None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] world_size ................... 6
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] zero_allow_untested_optimizer True
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=0 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] zero_enabled ................. True
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print] zero_optimization_stage ...... 3
[2024-08-30 15:26:42,162] [INFO] [config.py:986:print_user_config] json = {
"train_batch_size": 48,
"train_micro_batch_size_per_gpu": 8,
"steps_per_print": 10,
"zero_optimization": {
"stage": 3,
"offload_param": {
"device": "none"
},
"offload_optimizer": {
"device": "none"
},
"stage3_param_persistence_threshold": 1.000000e+04,
"stage3_max_live_parameters": 3.000000e+07,
"stage3_prefetch_bucket_size": 0,
"memory_efficient_linear": false
},
"zero_allow_untested_optimizer": true,
"zero_force_ds_cpu_optimizer": false,
"fp16": {
"enabled": false,
"loss_scale_window": 100
},
"bf16": {
"enabled": true
},
"gradient_clipping": 1.0,
"prescale_gradients": false,
"wall_clock_breakdown": false,
"hybrid_engine": {
"enabled": false,
"max_out_tokens": 512,
"inference_tp_size": 1,
"release_inference_cache": false,
"pin_parameters": true,
"tp_gather_partition_size": 8
}
}
***** Before training *****
***** Evaluation Begin *****
0%| | 0/21 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
5%|▍ | 1/21 [00:09<03:14, 9.74s/it]
5%|▍ | 1/21 [00:09<03:16, 9.83s/it]
5%|▍ | 1/21 [00:09<03:16, 9.83s/it]
5%|▍ | 1/21 [00:09<03:16, 9.83s/it]
5%|▍ | 1/21 [00:09<03:16, 9.83s/it]
5%|▍ | 1/21 [00:09<03:16, 9.84s/it]
10%|▉ | 2/21 [00:18<02:54, 9.17s/it]
10%|▉ | 2/21 [00:18<02:54, 9.21s/it]
10%|▉ | 2/21 [00:18<02:54, 9.21s/it]
10%|▉ | 2/21 [00:18<02:54, 9.21s/it]
10%|▉ | 2/21 [00:18<02:54, 9.21s/it]
10%|▉ | 2/21 [00:18<02:54, 9.21s/it]
14%|█▍ | 3/21 [00:26<02:37, 8.74s/it]
14%|█▍ | 3/21 [00:26<02:36, 8.72s/it]
14%|█▍ | 3/21 [00:26<02:37, 8.75s/it]
14%|█▍ | 3/21 [00:26<02:37, 8.76s/it]
14%|█▍ | 3/21 [00:26<02:37, 8.77s/it]
14%|█▍ | 3/21 [00:27<02:40, 8.94s/it]
19%|█▉ | 4/21 [00:35<02:26, 8.64s/it]
19%|█▉ | 4/21 [00:35<02:26, 8.64s/it]
19%|█▉ | 4/21 [00:35<02:27, 8.66s/it]
19%|█▉ | 4/21 [00:35<02:27, 8.67s/it]
19%|█▉ | 4/21 [00:35<02:26, 8.63s/it]
19%|█▉ | 4/21 [00:35<02:27, 8.68s/it]
24%|██▍ | 5/21 [00:42<02:12, 8.27s/it]
24%|██▍ | 5/21 [00:42<02:12, 8.29s/it]
24%|██▍ | 5/21 [00:42<02:12, 8.28s/it]
24%|██▍ | 5/21 [00:42<02:12, 8.30s/it]
24%|██▍ | 5/21 [00:43<02:12, 8.28s/it]
24%|██▍ | 5/21 [00:43<02:15, 8.48s/it]
29%|██▊ | 6/21 [00:51<02:06, 8.42s/it]
29%|██▊ | 6/21 [00:51<02:06, 8.43s/it]
29%|██▊ | 6/21 [00:51<02:05, 8.40s/it]
29%|██▊ | 6/21 [00:51<02:06, 8.45s/it]
29%|██▊ | 6/21 [00:51<02:07, 8.48s/it]
Please let us know if you have other questions.
However, we have not verified this template. You will need to adapt it to your own settings in the file 'DST.py'.
My experiment runs now, so let's see the results. Thanks!
I get around 80% accuracy on my internal data. One question: could you use the Bradley-Terry model for the loss function?
When each sample is a single comparison pair (one chosen and one rejected answer), the Plackett-Luce model is equivalent to the Bradley-Terry model. For the derivation, see the background section of this paper.
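To make the equivalence concrete, here is a minimal sketch in plain Python. It is not the repository's loss implementation; the function names `plackett_luce_nll` and `bradley_terry_nll` and the example scores are hypothetical, chosen only for illustration. It computes the Plackett-Luce negative log-likelihood for a ranked list of reward scores and compares it with the Bradley-Terry loss, -log sigmoid(chosen - rejected), for a two-candidate pair.

```python
import math


def plackett_luce_nll(scores):
    """Negative log-likelihood of a ranking under Plackett-Luce.

    `scores` are reward scores ordered from most to least preferred.
    (Hypothetical helper for illustration only.)
    """
    nll = 0.0
    for i in range(len(scores) - 1):
        # Stable log-sum-exp over the candidates not yet ranked at step i.
        tail = scores[i:]
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        nll += log_z - scores[i]
    return nll


def bradley_terry_nll(chosen, rejected):
    """-log sigmoid(chosen - rejected), written as a stable softplus."""
    diff = chosen - rejected
    # softplus(-diff) = log(1 + exp(-diff))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Example scores (made up): the chosen answer scores higher than the rejected one.
chosen, rejected = 1.3, -0.4
pl = plackett_luce_nll([chosen, rejected])
bt = bradley_terry_nll(chosen, rejected)
print(pl, bt)
```

With two candidates the Plackett-Luce sum has a single term, log(exp(chosen) + exp(rejected)) - chosen, which simplifies to log(1 + exp(rejected - chosen)), i.e. the Bradley-Terry loss. The two printed values should therefore agree up to floating-point error.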
I'll close this issue. If you run into any problems, please reopen it.