deepcs233 / Visual-CoT

[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Apache License 2.0

Code error: the visual benchmark cannot be reproduced #8

Open wengwanjiang opened 2 months ago

wengwanjiang commented 2 months ago

Thank you very much for providing the readme.md and the weights. However, even when following the steps in readme.md, many problems appear during evaluation and I cannot reproduce the results.

Several errors: I set up the environment following the steps in README.md, but still ran into many problems.

# bash scripts/v1_5/eval/cot_benchmark.sh VisCoT-7b-336
./llava/model/language_model/llava_llama.py, second-to-last line
AutoConfig.register("llava", LlavaConfig)

  File "/home/wengwanjiang/anaconda3/envs/viscot/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 846, in register
    raise ValueError(f"'{key}' is already used by a Transformers config, pick another name.")
ValueError: 'llava' is already used by a Transformers config, pick another name.

Other errors as well:

# Download the model using the code provided on Hugging Face
# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("deepcs233/VisCoT-7b-336")
model = AutoModelForCausalLM.from_pretrained("deepcs233/VisCoT-7b-336")
This raises:
OSError: deepcs233/VisCoT-7b-336 does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/deepcs233/VisCoT-7b-336/main' for available files.

Manually downloading the model to ./checkpoints/ gives the following error:

model = LlavaLlamaForCausalLM.from_pretrained(
                model_path, low_cpu_mem_usage=True, **kwargs
            )
#print(model_path) --> ./checkpoints/VisCoT-7B-336 
# manually assigning model_path = '/data/VisCoT-7B-336' still raises the error
  File "/home/wengwanjiang/anaconda3/envs/viscot/lib/python3.10/genericpath.py", line 20, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not list
deepcs233 commented 1 month ago

Hi!

Issue 1: This happens because newer versions of the transformers library cannot load the old llava model; see this issue: https://github.com/haotian-liu/LLaVA/issues/968
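
For reference, a minimal sketch of a local workaround (not the maintainer's fix), assuming the registration clash is the only problem: wrap the registration lines at the bottom of llava/model/language_model/llava_llama.py in a try/except. Whether loading then succeeds still depends on the transformers version, so pinning an older release as described in the linked issue is likely the more reliable route.

# Sketch of a local patch to the tail of llava/model/language_model/llava_llama.py.
# AutoConfig, AutoModelForCausalLM, LlavaConfig and LlavaLlamaForCausalLM are
# already imported/defined in that file.
try:
    AutoConfig.register("llava", LlavaConfig)
    AutoModelForCausalLM.register(LlavaConfig, LlavaLlamaForCausalLM)
except ValueError:
    # "llava" is already registered by newer transformers releases; skip re-registering.
    pass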

Issue 2: This codebase initializes the processor with the CLIP processor: https://github.com/deepcs233/Visual-CoT/blob/ba9c0b36ca8ed9314b1be609e67f19dfa7c31bf9/llava/model/multimodal_encoder/clip_encoder.py#L23
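
In other words, the image processor is not shipped inside the VisCoT checkpoint (hence the missing preprocessor_config.json); it is built from the CLIP vision tower. A minimal sketch, assuming the default openai/clip-vit-large-patch14-336 tower, of loading it directly; the model itself is meant to be loaded through the repo's own LlavaLlamaForCausalLM path rather than AutoModelForCausalLM.

# Sketch: load the image processor from the CLIP vision tower used by the repo,
# since the VisCoT checkpoint itself has no preprocessor_config.json.
from transformers import CLIPImageProcessor

image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")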

Issue 3: This error looks like a list was passed in instead of the model path?

neo1zh commented 1 month ago

Issue 3: This error looks like a list was passed in instead of the model path?

My solution is to modify line 24 in config.json so that "mm_vision_tower" is a string, "mm_vision_tower": "openai/clip-vit-large-patch14-336", rather than a list like "mm_vision_tower": ["openai/clip-vit-large-patch14-336"].
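
A small sketch of the same fix done programmatically, assuming the checkpoint sits in ./checkpoints/VisCoT-7b-336 and the key is mm_vision_tower as in the comment above:

# Sketch: normalize "mm_vision_tower" in the checkpoint's config.json from a
# single-element list to a plain string, so os.path.exists() receives a str.
import json
import os

config_path = os.path.join("./checkpoints/VisCoT-7b-336", "config.json")
with open(config_path) as f:
    cfg = json.load(f)

if isinstance(cfg.get("mm_vision_tower"), list):
    cfg["mm_vision_tower"] = cfg["mm_vision_tower"][0]
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)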