RupertLuo / Valley

The official repository of "Video assistant towards large language model makes everything easy"

How to run inference from bash for a multi-round conversation? #8

Closed TonyXuQAQ closed 1 year ago

TonyXuQAQ commented 1 year ago

Hi, may I know how to hold a multi-round conversation with Valley via bash inference? Thanks for your help!

RupertLuo commented 1 year ago

I have written a script for conversational inference in the shell at valley/inference/inference_valley_conv.py, but that code may be outdated. Instead, you can use the following code and rewrite it into a multi-round dialogue by calling the completion API directly, which currently supports the OpenAI message format.

import torch
from transformers import AutoTokenizer
from valley.model.valley import ValleyLlamaForCausalLM
# NOTE: the DEFAULT_* special-token constants are assumed to live in
# valley/util/config.py; adjust this import to match your checkout.
from valley.util.config import (DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN,
    DEFAULT_VI_START_TOKEN, DEFAULT_VI_END_TOKEN, DEFAULT_VIDEO_FRAME_TOKEN,
    DEFAULT_IMAGE_PATCH_TOKEN)

def init_vision_token(model, tokenizer):
    # register the ids of the image/video special tokens on the vision config
    vision_config = model.get_model().vision_tower.config
    vision_config.im_start_token, vision_config.im_end_token = tokenizer.convert_tokens_to_ids([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])
    vision_config.vi_start_token, vision_config.vi_end_token = tokenizer.convert_tokens_to_ids([DEFAULT_VI_START_TOKEN, DEFAULT_VI_END_TOKEN])
    vision_config.vi_frame_token = tokenizer.convert_tokens_to_ids(DEFAULT_VIDEO_FRAME_TOKEN)
    vision_config.im_patch_token = tokenizer.convert_tokens_to_ids([DEFAULT_IMAGE_PATCH_TOKEN])[0]

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# the user query for the current round
query = "Describe the video concisely."
# the system prompt
system_prompt = "You are Valley, a large language and vision assistant trained by ByteDance. You are able to understand the visual content or video that the user provides, and assist the user with a variety of tasks using natural language. Follow the instructions carefully and explain your answers in detail."

model_path = "THE MODEL PATH"  # replace with the path to your Valley checkpoint
model = ValleyLlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)
init_vision_token(model,tokenizer)
model = model.to(device)
model.eval()

# the message list uses the OpenAI chat format
message = [{"role": "system", "content": system_prompt},
           {"role": "user", "content": "Hi!"},
           {"role": "assistant", "content": "Hi there! How can I help you today?"},
           {"role": "user", "content": query}]

gen_kwargs = dict(
    do_sample=True,
    temperature=0.2,
    max_new_tokens=1024,
)
video_file = "THE VIDEO PATH"  # replace with the path of your input video
response = model.completion(tokenizer, video_file, message, gen_kwargs, device)
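
For a true multi-round chat, you can then loop: append the model's reply to message as an "assistant" turn, append the next user turn, and call completion again. A minimal sketch of that loop (the input() prompting is just an example, and completion is assumed to return the decoded reply string):

while True:
    user_input = input("User: ")
    if not user_input:
        break
    # add the new user turn to the running history
    message.append({"role": "user", "content": user_input})
    response = model.completion(tokenizer, video_file, message, gen_kwargs, device)
    print("Valley:", response)
    # feed the reply back so the next round sees the full history
    message.append({"role": "assistant", "content": response})
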
TonyXuQAQ commented 1 year ago

Thanks for the information!