Hon-Wong / Elysium

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
https://hon-wong.github.io/Elysium/

Question about the settings for tokenizer and frames_ops of data_preprocess in configs/sample_config.yaml #6

Closed Xuchen-Li closed 4 months ago

Xuchen-Li commented 4 months ago

Hello, sorry for bothering you again.

    data_preprocess:
      with_visual: True
      frames_key: frames
      sample_method: random_clip
      label_key: "vqa"
      task_type: vqa
      tokenizer: "item"
      max_seq_len: 512
      max_prompt_len: 256
      vqa_processor_params:
        box_format: ours_v1
      online_vqa_processor_params:
        task: SOT
      num_segments: 1
      verbose: True
      training: False
      frames_ops:
        Resize:
          size: [336, 336]
        ToTensor: {}
        Normalize:
          mean: [0.48145466, 0.4578275, 0.40821073]
          std: [0.26862954, 0.26130258, 0.27577711]

I am wondering how the tokenizer and frames_ops settings in configs/sample_config.yaml should be filled in for

    if self.with_visual:
        if isinstance(frames_ops, str):
            self.video_processor = AutoImageProcessor.from_pretrained(frames_ops)
        else:
            self.video_processor = VisionProcessor(frames_ops)

and

        local_path = tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            local_path, use_fast=False, trust_remote_code=trust_remote_code
        )

in eval/data/video_llm_data.py, lines 98-103 and 123-128.

How should I set these fields in configs/sample_config.yaml so that video_processor and tokenizer are loaded from the pretrained model?

Thanks a lot!

Hon-Wong commented 4 months ago

Thanks for your attention!

The tokenizer field should point to the path of the LLM, for example Llama2's path. There is no need to modify frames_ops in config.yaml unless you want to use a different processor, such as CLIP's official processor. If you prefer to use the official CLIP ViT or SigLIP processor, simply set frames_ops to {path/to/clipvit} or {path/to/siglip}.
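
To illustrate the two cases, here is a minimal sketch that only mirrors the isinstance check quoted in the question. It is not the repository's code: select_video_processor is a hypothetical helper, it builds nothing and just reports which branch a given frames_ops value would take, and path/to/clipvit is a placeholder for a local checkpoint directory.

```python
def select_video_processor(frames_ops):
    """Report which branch of the quoted isinstance check a frames_ops value takes.

    Hypothetical helper for illustration only; it does not construct a processor.
    """
    if isinstance(frames_ops, str):
        # A string is treated as a pretrained path and would be handed to
        # AutoImageProcessor.from_pretrained(frames_ops).
        return ("AutoImageProcessor.from_pretrained", frames_ops)
    # Anything else (e.g. the dict in sample_config.yaml) would be handed to
    # VisionProcessor(frames_ops); here we only list the op names in order.
    return ("VisionProcessor", list(frames_ops))


# The default frames_ops from configs/sample_config.yaml, written as a dict.
default_ops = {
    "Resize": {"size": [336, 336]},
    "ToTensor": {},
    "Normalize": {
        "mean": [0.48145466, 0.4578275, 0.40821073],
        "std": [0.26862954, 0.26130258, 0.27577711],
    },
}

print(select_video_processor(default_ops))
print(select_video_processor("path/to/clipvit"))  # placeholder path
```

So leaving frames_ops as the dict keeps the VisionProcessor branch, while replacing the whole dict with a single path string switches to the official processor. The tokenizer field is independent of this choice and is always passed to AutoTokenizer.from_pretrained.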

Xuchen-Li commented 4 months ago

Thanks a lot!