Vision-CAIR / MiniGPT4-video

Official code for MiniGPT4-video
https://vision-cair.github.io/MiniGPT4-video/
BSD 3-Clause "New" or "Revised" License

How to appropriately register data builder? #28

Open Tony363 opened 2 weeks ago

Tony363 commented 2 weeks ago

Hi,

I am trying to fine-tune MiniGPT4-Video on a custom dataset. I could not get my own data builder to register, so as a workaround I modified the Registry's get_builder_class method as shown below.

    @classmethod
    def get_builder_class(cls, name):
        from minigpt4.datasets.builders.image_text_pair_builder import EngageNetBuilder
        return cls.mapping["builder_name_mapping"].get(name, EngageNetBuilder)
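
To illustrate what this workaround does, here is a minimal standalone sketch. It is not the actual minigpt4 registry: `mapping`, `FallbackBuilder`, and this free-standing `get_builder_class` are hypothetical stand-ins. Because the lookup falls back to a default, any name that is not registered, including a misspelled one, silently resolves to the fallback class, so the registration problem is hidden rather than fixed.

```python
# Standalone sketch; all names are hypothetical stand-ins, not minigpt4 code.
class FallbackBuilder:
    """Stands in for EngageNetBuilder."""


# Empty mapping: simulates a builder whose registration never ran.
mapping = {"builder_name_mapping": {}}


def get_builder_class(name):
    # dict.get returns the second argument whenever `name` is missing.
    return mapping["builder_name_mapping"].get(name, FallbackBuilder)


# Both the intended name and a typo resolve to the fallback class.
resolved = get_builder_class("engagenet")
resolved_typo = get_builder_class("engagnet")
```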

I have the following in datasets/builders/image_text_pair_builder.py:

    @registry.register_builder("engagenet")
    class EngageNetBuilder(BaseDatasetBuilder):
        train_dataset_cls = EngageNetDataset

        DATASET_CONFIG_DICT = {
            "default": "configs/datasets/engagenet/default.yaml",
        }
        print(DATASET_CONFIG_DICT)

        def build_datasets(self):
            # download, split, etc...
            # only called on 1 GPU/TPU in distributed
            self.build_processors()

            build_info = self.config.build_info # information from the config file
            datasets = dict()

            # create datasets
            dataset_cls = self.train_dataset_cls
            datasets['train'] = dataset_cls(
                vis_processor=self.vis_processors["train"], # Add the vis_processor here
                text_processor=self.text_processors["train"], # Add the text_processor here
                vis_root=build_info.vis_root, # Add videos path here
                ann_paths=build_info.ann_paths, # Add annotations path here
                subtitles_path=build_info.subtitles_path, # Add subtitles path here
                model_name='mistral' # Add model name here (llama2 or mistral)
            )

            return datasets
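
My understanding is that a decorator-based registry only records a class once the code defining the decorated class has actually run, i.e. once its module has been imported. Below is a minimal sketch of that mechanism under that assumption. The `Registry` class here is a simplified stand-in written for illustration, not the minigpt4.common.registry implementation, and the `EngageNetBuilder` in it is an empty placeholder.

```python
# Simplified stand-in for a decorator-based registry (illustration only).
class Registry:
    mapping = {"builder_name_mapping": {}}

    @classmethod
    def register_builder(cls, name):
        def wrap(builder_cls):
            # Runs at class-definition time, i.e. when the module is imported.
            cls.mapping["builder_name_mapping"][name] = builder_cls
            return builder_cls
        return wrap

    @classmethod
    def get_builder_class(cls, name):
        # No fallback: an unregistered name yields None.
        return cls.mapping["builder_name_mapping"].get(name, None)


registry = Registry()

# Before the decorated class is defined, the name is unknown.
before = registry.get_builder_class("engagenet")


@registry.register_builder("engagenet")
class EngageNetBuilder:
    pass


# After definition (the equivalent of importing the builder module), it resolves.
after = registry.get_builder_class("engagenet")
```

If the real registry behaves like this sketch, a lookup would come back empty whenever the module defining the builder has not been imported before get_builder_class is called.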

What is the appropriate way to register a custom data builder?