Luoyang144 opened 10 months ago
Describe the issue
Issue: the dataloader in the training code may be broken.
Command:
```shell
deepspeed train.py \
    --deepspeed scripts/zero2.json \
    --model_name_or_path "LLaVA-VL/vicuna-7b-v1.3" \
    --pretrain_mm_mlp_adapter "LLaVA-VL/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin" \
    --version v1 \
    --data_path data/toy/aug_toy.json,data/toy/merge_toy.json \
    --image_folder data/toy/image \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 False \
    --output_dir $out_dir \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --mm_projector_type mlp2x_gelu
```
Log:
```
trainer = LLaVATrainer(model=model,
TypeError: llava.train.llava_trainer.LLaVATrainer() argument after ** must be a mapping, not NoneType
```
After debugging the code, I found that `make_supervised_data_module` may be unfinished: it never returns anything.

```python
def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer,
                                data_args) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = LazySupervisedDataset
    # concat data files
    data_path = data_args.data_path
    data_path_list = [i.strip() for i in data_path.split(',')]
    data_path_list = [x for x in data_path_list if x != ""]
    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        train_dataset_i = build_dataset(new_data_args, tokenizer, dataset_cls)
        data_set_list.append(train_dataset_i)
    train_dataset = ConcatDataset(data_set_list)
    print(f"train_dataset size: {len(train_dataset)}")
```
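(For reference, that `TypeError` is exactly what Python raises when `**` is applied to `None`, which is what a function that falls off the end without a `return` evaluates to. A minimal repro, with a hypothetical stand-in function:)

```python
def make_module():
    pass  # hypothetical stand-in: no return statement, so the call evaluates to None

data_module = make_module()
dict(**data_module)  # TypeError: dict() argument after ** must be a mapping, not NoneType
```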
Any ideas?
So, do you have any solution? I'm hitting the same error.
I worked around it with the following steps: add the missing `return`, and replace `**data_module` with `train_dataset=data_module`.
But now I'm hitting an OOM error. If anyone has solved that, please reply.
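(A sketch of what that change might look like, based on my reading of the steps above and the identifiers used elsewhere in this thread; note the collator still has to be passed explicitly, since `**data_module` used to carry it:)

```python
# Inside make_supervised_data_module, return the concatenated dataset:
#     return train_dataset

# Then pass the pieces by keyword instead of unpacking a dict:
data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       train_dataset=data_module,
                       data_collator=DataCollatorForSupervisedDataset(tokenizer=tokenizer))
```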
@kaijieJiao
```python
def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer,
                                data_args) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = LazySupervisedDataset
    # concat data files
    data_path = data_args.data_path
    data_path_list = [i.strip() for i in data_path.split(',')]
    data_path_list = [x for x in data_path_list if x != ""]
    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        train_dataset_i = build_dataset(new_data_args, tokenizer, dataset_cls)
        data_set_list.append(train_dataset_i)
    train_dataset = ConcatDataset(data_set_list)
    data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
    print(f"train_dataset size: {len(train_dataset)}")
    return dict(train_dataset=train_dataset,
                eval_dataset=None,
                data_collator=data_collator)


data_module = make_supervised_data_module(tokenizer=tokenizer,
                                          data_args=data_args)
trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       **data_module)
```
This follows other code in LLaVA, so no guarantee there won't be issues.
Did you manage to train successfully?
Neither of the solutions above worked for me. In both cases I hit the same error:
File "/workspace/tools/LaVA-Plus/./train_mem.py", line 13, in <module>
train()
File "/workspace/tools/LLaVA-Plus/llava/train/train.py", line 987, in train
trainer.train()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1553, in _inner_training_loop
train_dataloader = self.get_train_dataloader()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 850, in get_train_dataloader
dataloader_params["sampler"] = self._get_train_sampler()
File "/workspace/tools/LLaVA-Plus/llava/train/llava_trainer.py", line 140, in _get_train_sampler
lengths = self.train_dataset.modality_lengths
AttributeError: 'ConcatDataset' object has no attribute 'modality_lengths'
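(One way around that `AttributeError`, as a minimal sketch rather than anything from the LLaVA repo: torch's `ConcatDataset` doesn't forward the `lengths`/`modality_lengths` properties that `LLaVATrainer`'s length-grouped sampler reads, so wrap it in a subclass that does. This assumes each child dataset defines those properties, which the stock `LazySupervisedDataset` does:)

```python
from torch.utils.data import ConcatDataset


class ConcatSupervisedDataset(ConcatDataset):
    """ConcatDataset that forwards the per-sample length metadata
    LLaVATrainer's length-grouped sampler expects."""

    @property
    def lengths(self):
        # Flatten the child datasets' sample lengths into one list.
        return [length for d in self.datasets for length in d.lengths]

    @property
    def modality_lengths(self):
        # Same for the signed lengths used to group samples by modality.
        return [length for d in self.datasets for length in d.modality_lengths]
```

Then build `train_dataset = ConcatSupervisedDataset(data_set_list)` in `make_supervised_data_module`. Alternatively, if your training args expose a `group_by_modality_length` flag (stock LLaVA's do), leaving it off avoids this sampler path entirely.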