THUDM / CogVLM

a state-of-the-art open visual language model | multimodal pretrained model
Apache License 2.0

How can the CogVLM-chat version output multiple answers ranked by probability? #321

Closed chensiqin closed 7 months ago

chensiqin commented 8 months ago

System Info

Linux, CUDA 11.8

Who can help?

No response

Information

Reproduction

The current CogVLM-chat version returns a single answer per query.

Expected behavior

How can the CogVLM-chat version be made to output multiple candidate answers ranked by probability?

zRzRzRzRzRzRzR commented 8 months ago

There is currently no script for that; everything is single input, single output.

chensiqin commented 8 months ago

Could multiple outputs be obtained by modifying the sampling strategy configuration in utils/utils/chat.py?

zRzRzRzRzRzRzR commented 8 months ago

For the SAT code you could try asking @1049451037

chensiqin commented 7 months ago

@1049451037 Do you have any ideas on how this feature could be implemented?

1049451037 commented 7 months ago

I'd suggest simply sampling randomly several times. Answers ranked by probability are bound to be nearly identical, e.g. "你好" ("hello"), "你好啊" ("hello there"), "你好呀" ("hi there"), so that kind of ranking doesn't seem very useful.
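For illustration, a minimal sketch of that repeated-sampling idea for the HF version of the model; model, tokenizer, and inputs are assumed to be prepared as in the cogvlm-chat-hf demo further down this thread, and the sampling parameters are placeholders:

import torch

def sample_n_answers(model, tokenizer, inputs, n=3):
    # each pass samples independently, so the n answers can differ
    gen_kwargs = {"max_new_tokens": 256, "do_sample": True, "temperature": 0.8, "top_p": 0.9}
    answers = []
    with torch.no_grad():
        for _ in range(n):
            outputs = model.generate(**inputs, **gen_kwargs)
            outputs = outputs[:, inputs['input_ids'].shape[1]:]   # strip the prompt tokens
            answers.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return answers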

chensiqin commented 7 months ago

I saw here: https://huggingface.co/blog/zh/how-to-generate that in transformers you can combine num_return_sequences and num_beams to return multiple results, although, as you said, the differences between them are small. Could something similar be implemented in SAT?
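For reference, the pattern described in that post looks roughly like the sketch below, shown on a small text-only model for brevity; the model name and prompt are placeholders, not the CogVLM pipeline:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The picture shows", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    num_beams=5,              # hypotheses kept during beam search
    num_return_sequences=3,   # must be <= num_beams
    early_stopping=True,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))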

1049451037 commented 7 months ago

SAT does have beam search: https://github.com/THUDM/SwissArmyTransformer/tree/main/sat/generation/sampling_strategies

You can use it by swapping BaseStrategy for BeamSearchStrategy. The kind of customized, more complex strategy you describe is not supported in SAT yet, but you could write one modeled on the beam search implementation to fit your needs. Pull requests are welcome~
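A hedged sketch of that swap in utils/utils/chat.py is below. The exact constructor arguments of BeamSearchStrategy should be checked against the linked sampling_strategies source; the keyword arguments shown are assumptions, and, as noted above, returning several distinct candidates may still require a custom strategy:

from sat.generation.sampling_strategies import BaseStrategy, BeamSearchStrategy

end_tokens = [tokenizer.eos_token_id]   # whatever end tokens chat.py already uses

# before: sampling a single answer
# strategy = BaseStrategy(temperature=0.8, top_p=0.4, end_tokens=end_tokens)

# after: beam search keeps several hypotheses while decoding
strategy = BeamSearchStrategy(
    num_beams=4,          # assumed keyword; check the real signature in the SAT source
    length_penalty=1.0,   # assumed keyword
    consider_end=True,    # assumed keyword
    end_tokens=end_tokens,
)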

chensiqin commented 7 months ago

@1049451037 @zRzRzRzRzRzRzR Is a model fine-tuned with SAT LoRA (and already merged) unable to be loaded and run for inference with HF transformers?

1049451037 commented 7 months ago

See:

https://github.com/THUDM/CogVLM/issues/241

https://github.com/THUDM/CogVLM/issues/302

chensiqin commented 7 months ago

@1049451037 @zRzRzRzRzRzRzR I used the scripts above to convert the model from SAT to HF, then loaded my locally converted model with the demo at https://huggingface.co/THUDM/cogvlm-chat-hf, and got this error:

Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chinese_clip, clap, clip, clip_vision_model, clipseg, clvp, code_llama, codegen, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, data2vec-audio, data2vec-text, data2vec-vision, deberta, deberta-v2, decision_transformer, deformable_detr, deit, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, git, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, graphormer, groupvit, hubert, ibert, idefics, imagegpt, informer, instructblip, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, longformer, longt5, luke, lxmert, m2m_100, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mistral, mixtral, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, mpnet, mpt, mra, mt5, musicgen, mvp, nat, nezha, nllb-moe, nougat, nystromformer, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, pix2struct, plbart, poolformer, pop2piano, prophetnet, pvt, qdqbert, rag, realm, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, sew, sew-d, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, umt5, unispeech, unispeech-sat, univnet, upernet, van, videomae, vilt, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso

1049451037 commented 7 months ago

Try updating transformers first:

pip install transformers -U

Then:

import torch

from cogagent.modeling_cogagent import CogAgentForCausalLM
from cogagent.configuration_cogagent import CogAgentConfig

# ckpt_dir is the directory produced by the SAT-to-HF conversion;
# save_dir is where the fixed-up HF checkpoint will be written
config = CogAgentConfig.from_pretrained(ckpt_dir)

model = CogAgentForCausalLM.from_pretrained(
    ckpt_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
model.save_pretrained(save_dir)

If you are converting cogvlm rather than cogagent:

import torch

from cogvlm.modeling_cogagent import CogVLMForCausalLM
from cogvlm.configuration_cogagent import CogVLMConfig

config = CogVLMConfig.from_pretrained(ckpt_dir)

model = CogVLMForCausalLM.from_pretrained(
    ckpt_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
model.save_pretrained(save_dir)

That gives you an HF directory with a proper config.

chensiqin commented 7 months ago

1. Is the code above run after the SAT-to-HF model conversion has finished?
2. from cogvlm.modeling_cogagent import CogVLMForCausalLM and from cogvlm.configuration_cogagent import CogVLMConfig fail to import; from which directory should this be run?

1049451037 commented 7 months ago

cogvlm and cogagent are just the directories produced by your SAT-to-HF conversion; substitute your own directory names.

chensiqin commented 7 months ago

It still doesn't work. My converted directory looks like this: (screenshot) How should the following be adapted for it?

from cogvlm.modeling_cogagent import CogVLMForCausalLM
from cogvlm.configuration_cogagent import CogVLMConfig

config = CogVLMConfig.from_pretrained(ckpt_dir)

model = CogVLMForCausalLM.from_pretrained(
    ckpt_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
model.save_pretrained(save_dir)

Where are modeling_cogagent and configuration_cogagent implemented?

1049451037 commented 7 months ago

Hmm... the simplest approach is to download the code from our Hugging Face repository, delete the safetensors files in it, and drop in your converted pytorch_model.bin instead.
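A minimal sketch of that approach, assuming a recent huggingface_hub is installed; the local_dir and the path to the converted checkpoint are placeholders:

import shutil
from pathlib import Path
from huggingface_hub import snapshot_download

# fetch code + config from the official repo into a normal local directory
repo_dir = Path(snapshot_download("THUDM/cogvlm-chat-hf", local_dir="cogvlm-chat-hf-local"))

# remove the official safetensors weights and their index ...
for f in repo_dir.glob("*.safetensors"):
    f.unlink()
(repo_dir / "model.safetensors.index.json").unlink(missing_ok=True)

# ... and drop in the converted checkpoint instead (placeholder path)
shutil.copy("/path/to/converted/pytorch_model.bin", repo_dir / "pytorch_model.bin")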

chensiqin commented 7 months ago

Yes, I've tried that approach and it works: (screenshot) But in the method you described, what exactly am I setting up wrong?

1049451037 commented 7 months ago

Sorry, I have the Hugging Face repository files locally, so in the code above cogvlm and cogagent refer to the local HF repository directories.

chensiqin commented 7 months ago

Ah, I see. So your script needs to be run from the local HF repo path? I can see modeling_cogvlm.py there now.

chensiqin commented 7 months ago

(quoting the SAT-to-HF conversion instructions from the earlier reply)

Loading checkpoint shards: 100%|██████████| 7/7 [00:07<00:00, 1.06s/it]
Some weights of CogVLMForCausalLM were not initialized from the model checkpoint at /workdir/chensiqin/code/lmm/CogVLM/checkpoints_hf/12-06-10-18/merged_lora_490_2 and are newly initialized: ['model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-risk-control-algo/chensiqin/code/lmm/new/CogVLM/test_hf.py", line 17, in <module>
    ).to('cuda').eval()
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/workdir/chensiqin/opt/anaconda3/envs/cogvlm_new/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

(screenshot)

After generating the config this way, the model still fails to load.

1049451037 commented 7 months ago

That suggests your test_hf.py is not written correctly.

chensiqin commented 7 months ago

Based on https://huggingface.co/THUDM/cogvlm-chat-hf:

import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('/workdir/chensiqin/.sat_models/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    '/workdir/chensiqin/code/lmm/CogVLM/checkpoints_hf/12-06-10-18/merged_lora_490',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    load_in_4bit=False,
    trust_remote_code=True
).to('cuda').eval()
print('Model inited.')

query = 'Describe this image'
image = Image.open('for_test/2.png').convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])  # chat mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    print(inputs['input_ids'].shape[1])
    outputs = model.generate(**inputs, **gen_kwargs)
    print(outputs.shape)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

1049451037 commented 7 months ago

Whether this works depends on whether your directory contains model.safetensors.index.json; looking at your screenshot above, I'm confused too. If your checkpoint is pytorch_model.bin, delete model.safetensors.index.json; if your checkpoint is in safetensors format, copy model.safetensors.index.json over as well.
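A small sketch of that rule, with a placeholder directory path:

from pathlib import Path

ckpt_dir = Path("/path/to/converted/hf/dir")   # placeholder
index = ckpt_dir / "model.safetensors.index.json"

has_safetensors = any(ckpt_dir.glob("*.safetensors"))
has_bin = (ckpt_dir / "pytorch_model.bin").exists()

if has_bin and not has_safetensors:
    # weights are a single pytorch_model.bin: a leftover safetensors index would
    # point transformers at shards that are not there, so remove it
    index.unlink(missing_ok=True)
elif has_safetensors and not index.exists():
    print("copy model.safetensors.index.json alongside the safetensors shards")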

chensiqin commented 7 months ago

(quoting the SAT-to-HF conversion instructions from the earlier reply)

Method 1: delete the safetensors files and model.safetensors.index.json from the HF directory and replace them with the converted pytorch_model.bin. With this setup the model loads and runs successfully. (screenshot)

Method 2: generate the config with the code quoted above; the resulting directory contains safetensors files and model.safetensors.index.json, but the model fails to load. (screenshot)

The loading and inference code is the same in both cases.

1049451037 commented 7 months ago

Then try this:

import torch

from cogagent.modeling_cogagent import CogAgentForCausalLM
from cogagent.configuration_cogagent import CogAgentConfig

config = CogAgentConfig.from_pretrained(ckpt_dir)

model = CogAgentForCausalLM.from_pretrained(
    ckpt_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
).cpu().eval()
model.save_pretrained(save_dir)

Or:

import torch

from cogvlm.modeling_cogvlm import CogVLMForCausalLM
from cogvlm.configuration_cogvlm import CogVLMConfig

config = CogVLMConfig.from_pretrained(ckpt_dir)

model = CogVLMForCausalLM.from_pretrained(
    ckpt_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
).cpu().eval()
model.save_pretrained(save_dir)

Also, make sure you converted the right model: use the cogvlm conversion script for cogvlm and the cogagent script for cogagent.

chensiqin commented 7 months ago

That works now, thanks! One correction to the cogvlm code above: cogagent -> cogvlm.

import torch

from cogvlm.modeling_cogvlm import CogVLMForCausalLM
from cogvlm.configuration_cogvlm import CogVLMConfig

config = CogVLMConfig.from_pretrained(ckpt_dir)

model = CogVLMForCausalLM.from_pretrained(
    ckpt_dir,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
).cpu().eval()
model.save_pretrained(save_dir)

You also need to copy visual.py from the HF directory into save_dir.
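For completeness, that extra copy could look like this, with the HF repo path as a placeholder and save_dir as used in the code above:

import shutil

# visual.py holds the vision tower implementation referenced by the HF config
shutil.copy("/path/to/cogvlm-chat-hf/visual.py", save_dir)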