Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

Questions about the training command #203

Closed: Hongbin98 closed this issue 1 year ago

Hongbin98 commented 1 year ago

The training command in the README.md is:

accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path=luodian/OTTER-LLaMA7B-INIT  \ # or --pretrained_model_name_or_path=luodian/OTTER-MPT7B-Init
--mimicit_path="path/to/DC_instruction.json" \
--images_path="path/to/DC.json" \
--train_config_path="path/to/DC_train.json" \
--batch_size=4 \
--num_epochs=9 \
--report_to_wandb \
--wandb_entity=ntu-slab \
--run_name=OTTER-LLaMA7B-densecaption \
--wandb_project=OTTER-LLaMA7B \
--workers=1 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01 \

And I changed 'pretrained_model_name_or_path' and the paths to the MIMIC-IT files:

accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path='/home/linhb/hf_models/OTTER-MPT7B-Init' \
--mimicit_path="/home/linhb/dataset/MIMIC-IT/DC/DC_instructions.json" \
--images_path="/home/linhb/dataset/MIMIC-IT/DC/DC.json" \
--train_config_path="/home/linhb/dataset/MIMIC-IT/DC/DC_train.json" \
--batch_size=4 \
--num_epochs=9 \
--report_to_wandb \
--wandb_entity=ntu-slab \
--run_name=OTTER-LLaMA7B-densecaption \
--wandb_project=OTTER-LLaMA7B \
--workers=1 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01 \

However, it reports that 'mimicit_path', 'images_path', and 'train_config_path' are unrecognized arguments for instruction_following.py.

So I wonder whether there is an updated training command for the latest training code.

Looking forward to your reply!

Hongbin98 commented 1 year ago

still waiting for your response :(

In instruction_following.py there are some confusing arguments such as 'past_mimicit_path' and 'new_mimicit_path'. When I set the '--past_*' arguments to None and point the '--new_*' arguments to my DC files, it reports a size mismatch error:

accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path='/home/linhb/hf_models/OTTER-MPT7B-Init' \
--new_mimicit_path="/home/linhb/dataset/MIMIC-IT/DC/DC_instructions.json" \
--new_images_path="/home/linhb/dataset/MIMIC-IT/DC/DC.json" \
--new_train_config_path="/home/linhb/dataset/MIMIC-IT/DC/DC_train.json" \
--batch_size=4 \
--num_epochs=9 \
--report_to_wandb \
--run_name=OTTER-MPT7B-densecaption \
--wandb_project=otter \
--workers=1 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01

And the error is: "stack expects each tensor to be equal size, but got [1, 92, 3, 224, 224] at entry 0 and [1, 117, 3, 224, 224] at entry 1".
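If it helps, I think this is the usual torch.stack shape mismatch when batch entries contain different numbers of video frames. A minimal snippet that reproduces the same error with placeholder tensors (shapes taken from the message above; this is not code from the repo):

    import torch

    # Two dense-caption samples with different frame counts (92 vs. 117),
    # each shaped [1, T, 3, 224, 224] as in the error message above.
    sample_a = torch.zeros(1, 92, 3, 224, 224)
    sample_b = torch.zeros(1, 117, 3, 224, 224)

    # Batch collation stacks the per-sample tensors, which requires identical
    # shapes, so this raises the same "stack expects each tensor to be equal
    # size" RuntimeError.
    batch = torch.stack([sample_a, sample_b])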

Luodian commented 1 year ago

The 'new_mimicit_path' naming is an issue on our side; we will restore it to 'mimicit_path' in an upcoming PR. If you do not need to mix data from a previous training stage back into the current run, you can simply ignore 'past_mimicit_path' and the related arguments.

Luodian commented 1 year ago

The expected behavior is to resample the frames to a fixed number, which makes all entries the same size and enables batch-wise training.

You can modify the relevant function as follows:

    def process_dense_caption(self, instruction_id, instruction, answer, image_ids, in_context_example_ids, resample_frames=32):
        patch_images = torch.tensor([])
        all_texts = ""
        # Build the text prompt from the in-context examples plus the current instruction.
        all_instruction_ids = in_context_example_ids + [instruction_id]
        random.shuffle(all_instruction_ids)
        for cur_instruction_id in all_instruction_ids[:]:
            cur_instruction = self.dataset[cur_instruction_id]["instruction"]
            cur_instruction = self.pre_question(cur_instruction, self.max_src_length)
            cur_answer = self.dataset[cur_instruction_id]["answer"]
            cur_answer = self.pre_answer(cur_answer, self.max_tgt_length)
            cur_text = f"User: {cur_instruction} GPT:<answer> {cur_answer}<|endofchunk|>"
            all_texts += cur_text

        all_texts = f"<image>{all_texts}"
        # <image>User: {cur_incontext_instruction} GPT:<answer> {cur_incontext_answer}<|endofchunk|>User: {instruction} GPT:<answer> {answer}<|endofchunk|>
        # <image>User: what does the image describe? GPT: XXX <|endofchunk|>User: Do you think this image is funny GPT:<answer> YYY <|endofchunk|>
        # Uniformly resample the video frames to a fixed count so that every
        # batch entry ends up with the same number of frames.
        image_ids = self.resample_frames(image_ids, resample_frames)
        # Decode each base64-encoded frame and concatenate the patch tensors.
        for cur_image_id in image_ids:
            cur_image = self.images[cur_image_id]
            cur_image = Image.open(BytesIO(base64.urlsafe_b64decode(cur_image))).convert("RGB")
            cur_patch_image = self.patch_resize_transform(cur_image).unsqueeze(0)
            if len(patch_images) == 0:
                patch_images = cur_patch_image
            else:
                patch_images = torch.cat((patch_images, cur_patch_image))

        # [T, 3, H, W] -> [1, T, 3, H, W]
        patch_images = patch_images.unsqueeze(0)
        return patch_images, all_texts

Luodian commented 1 year ago

The resample_frames step uniformly extracts a fixed number of frames from each batch entry, so that all entries end up with the same frame count.
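Conceptually it is just even spacing over the frame indices. A minimal stand-alone sketch (simplified for illustration; 'frame_ids' here stands for the list of frame IDs of one sample, and this is not necessarily the exact implementation in the repo):

    import numpy as np

    def resample_frames(frame_ids, resample_frames=32):
        # Pick 'resample_frames' indices evenly spaced over the original frame list,
        # so every sample ends up with the same frame count regardless of its
        # original length (e.g. 92 or 117 frames -> 32 frames).
        indices = np.linspace(0, len(frame_ids) - 1, num=resample_frames, dtype=int)
        return [frame_ids[i] for i in indices]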

Luodian commented 1 year ago

All of this will be fixed in an upcoming PR. You can update your code manually now, or wait for the PR and then pull the changes.

Hongbin98 commented 1 year ago

Thanks for your reply! I have now succeeded in running the Otter training code.

If anyone runs into the same problem, you can fix it with the following steps:

  1. Set the '--past_*' arguments to None and point the '--new_*' arguments to your DC files (the full command is shown below).
  2. Modify the 'process_dense_caption' function following @Luodian's suggestion above.

It works for me.
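For reference, the command I used is the same one I posted earlier in this thread (the paths are from my own setup, so adjust them to your environment):

accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path='/home/linhb/hf_models/OTTER-MPT7B-Init' \
--new_mimicit_path="/home/linhb/dataset/MIMIC-IT/DC/DC_instructions.json" \
--new_images_path="/home/linhb/dataset/MIMIC-IT/DC/DC.json" \
--new_train_config_path="/home/linhb/dataset/MIMIC-IT/DC/DC_train.json" \
--batch_size=4 \
--num_epochs=9 \
--report_to_wandb \
--run_name=OTTER-MPT7B-densecaption \
--wandb_project=otter \
--workers=1 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01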