LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
https://llava-vl.github.io/llava-plus/
Apache License 2.0

[Usage] Dataloader in train code may be wrong. #21

Open · Luoyang144 opened this issue 10 months ago

Luoyang144 commented 10 months ago

Describe the issue

Issue: The dataloader setup in the training code may be wrong.

Command:

deepspeed train.py \
    --deepspeed scripts/zero2.json \
    --model_name_or_path "LLaVA-VL/vicuna-7b-v1.3" \
    --pretrain_mm_mlp_adapter "LLaVA-VL/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin"  \
    --version v1 \
    --data_path data/toy/aug_toy.json,data/toy/merge_toy.json \
    --image_folder data/toy/image \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 False \
    --output_dir $out_dir \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --mm_projector_type mlp2x_gelu

Log:

    trainer = LLaVATrainer(model=model,
TypeError: llava.train.llava_trainer.LLaVATrainer() argument after ** must be a mapping, not NoneType

After debugging, I found that make_supervised_data_module appears to be unfinished: it never returns anything.

def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer,
                                data_args) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = LazySupervisedDataset

    #  concat data files
    data_path = data_args.data_path
    data_path_list = [i.strip() for i in data_path.split(',')]
    data_path_list = [x for x in data_path_list if x != ""]

    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        train_dataset_i = build_dataset(new_data_args, tokenizer, dataset_cls)
        data_set_list.append(train_dataset_i)
    train_dataset = ConcatDataset(data_set_list)
    print(f"train_dataset size: {len(train_dataset)}")

Any idea?

kaijieJiao commented 9 months ago

So, do you have any solution? I'm hitting the same error.

kaijieJiao commented 9 months ago

I got it working with the following steps (shown in the two attached screenshots):

Add a return statement to make_supervised_data_module, and replace **data_module with train_dataset=data_module when constructing the trainer.

But now I hit an out-of-memory (OOM) error. If someone has solved that, please reply.

Luoyang144 commented 9 months ago

@kaijieJiao

def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer,
                                data_args) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = LazySupervisedDataset

    #  concat data files
    data_path = data_args.data_path
    data_path_list = [i.strip() for i in data_path.split(',')]
    data_path_list = [x for x in data_path_list if x != ""]

    data_set_list = []
    for data_name in data_path_list:
        assert os.path.exists(data_name), f"{data_name} does not exist"
        new_data_args = copy.deepcopy(data_args)
        new_data_args.data_path = data_name
        train_dataset_i = build_dataset(new_data_args, tokenizer, dataset_cls)
        data_set_list.append(train_dataset_i)
    train_dataset = ConcatDataset(data_set_list)
    data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
    print(f"train_dataset size: {len(train_dataset)}")
    return dict(train_dataset=train_dataset,
                eval_dataset=None,
                data_collator=data_collator)
data_module = make_supervised_data_module(tokenizer=tokenizer,
                                              data_args=data_args)
trainer = LLaVATrainer(model=model,
                    tokenizer=tokenizer,
                    args=training_args,
                    **data_module)

This follows the other code in LLaVA, so no guarantee that there won't be other issues.

kaijieJiao commented 9 months ago

Did you manage to train successfully with this?

pedramaghazadeh commented 8 months ago

Neither of the solutions above worked for me. In both cases I hit the same error:


  File "/workspace/tools/LaVA-Plus/./train_mem.py", line 13, in <module>
    train()
  File "/workspace/tools/LLaVA-Plus/llava/train/train.py", line 987, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1553, in _inner_training_loop
    train_dataloader = self.get_train_dataloader()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 850, in get_train_dataloader
    dataloader_params["sampler"] = self._get_train_sampler()
  File "/workspace/tools/LLaVA-Plus/llava/train/llava_trainer.py", line 140, in _get_train_sampler
    lengths = self.train_dataset.modality_lengths
AttributeError: 'ConcatDataset' object has no attribute 'modality_lengths'
Traceback (most recent call last):
  File "/workspace/tools/LLaVA-Plus/./train_mem.py", line 13, in <module>
    train()
  File "/workspace/tools/LLaVA-Plus/llava/train/train.py", line 987, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1553, in _inner_training_loop
    train_dataloader = self.get_train_dataloader()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 850, in get_train_dataloader
    dataloader_params["sampler"] = self._get_train_sampler()
  File "/workspace/tools/LLaVA-Plus/llava/train/llava_trainer.py", line 140, in _get_train_sampler
    lengths = self.train_dataset.modality_lengths
AttributeError: 'ConcatDataset' object has no attribute 'modality_lengths'
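
A possible way past this (just a sketch, not code from this repo): the length-grouped sampler in llava_trainer.py expects the training dataset to expose modality_lengths (and lengths), which torch.utils.data.ConcatDataset does not provide. A small subclass that forwards those attributes from the underlying LazySupervisedDataset instances might work, assuming each sub-dataset actually defines both properties:

from torch.utils.data import ConcatDataset

class ConcatSupervisedDataset(ConcatDataset):
    """ConcatDataset that also exposes the per-sample length lists expected by
    LLaVATrainer's length-grouped sampler (hypothetical helper; assumes each
    sub-dataset implements `lengths` and `modality_lengths`)."""

    @property
    def lengths(self):
        # concatenate the length lists of the underlying datasets, in order
        return [length for ds in self.datasets for length in ds.lengths]

    @property
    def modality_lengths(self):
        # same for the modality-aware lengths read by _get_train_sampler
        return [length for ds in self.datasets for length in ds.modality_lengths]

make_supervised_data_module would then build train_dataset = ConcatSupervisedDataset(data_set_list) instead of ConcatDataset(data_set_list). Alternatively, if this fork exposes a group_by_modality_length training argument the way upstream LLaVA does, leaving it disabled should avoid the code path that reads modality_lengths.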