DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

A demo without gradio #140


liboliba commented 9 months ago

Hello, thanks for the Gradio example. Is there an example of reading in a video file and then doing Q&A on the command line, without Gradio? My GPUs are on an offline machine, so I have no use for Gradio, and the demo is also a bit confusing for people who are unfamiliar with it or don't want to use it.

Thank you.

llx-08 commented 7 months ago

Hi, you can extract Gradio's inference operations and call them manually, as in the code below.
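The loop below assumes that `args`, `chat`, `default_conversation`, and `conv_llava_llama_2` are already initialized the same way the Gradio demo initializes them. Here is a minimal setup sketch, roughly following this repo's demo_video.py (the argument names and config layout are taken from that script; adjust the config path and `--model_type` for your checkpoint):

import argparse

import decord
from video_llama.common.config import Config
from video_llama.common.registry import registry
from video_llama.conversation.conversation_video import (
    Chat, default_conversation, conv_llava_llama_2,
)

# demo_video.py sets decord's bridge to torch before any video is loaded.
decord.bridge.set_bridge('torch')

# Same CLI arguments as the Gradio demo.
parser = argparse.ArgumentParser(description="Video-LLaMA CLI demo")
parser.add_argument("--cfg-path", required=True, help="path to the eval config file")
parser.add_argument("--gpu-id", type=int, default=0, help="GPU to load the model on")
parser.add_argument("--model_type", type=str, default='vicuna', help="LLM backbone: vicuna or llama_v2")
parser.add_argument("--options", nargs="+", help="override config options")
args = parser.parse_args()

# Build the model and visual processor from the config, then wrap them in Chat.
cfg = Config(args)
model_config = cfg.model_cfg
model_config.device_8bit = args.gpu_id
model_cls = registry.get_model_class(model_config.arch)
model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
model.eval()
vis_processor_cfg = cfg.datasets_cfg.webvid.vis_processor.train
vis_processor = registry.get_processor_class(vis_processor_cfg.name).from_config(vis_processor_cfg)
chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))

With that in place, the interactive Q&A loop: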

# Pick the conversation template that matches the LLM backbone.
if args.model_type == 'vicuna':
    chat_state = default_conversation.copy()
else:
    chat_state = conv_llava_llama_2.copy()

video_path = "your_path"  # path to the video file you want to ask about
chat_state.system = ""
img_list = []

# Encode the video once; its embeddings are appended to img_list.
llm_message = chat.upload_video(video_path, chat_state, img_list)

while True:
    user_message = input("User> ")
    if user_message.strip().lower() in ("exit", "quit"):
        break

    # Append the user turn to the conversation state.
    chat.ask(user_message, chat_state)

    num_beams = 2
    temperature = 1.0

    # Generate the assistant turn; answer() returns (text, output_tokens),
    # so [0] is the decoded reply.
    llm_message = chat.answer(conv=chat_state,
                              img_list=img_list,
                              num_beams=num_beams,
                              temperature=temperature,
                              max_new_tokens=300,
                              max_length=2000)[0]
    # print(chat_state.get_prompt())  # uncomment to inspect the full prompt
    print(llm_message)
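If you save this as a standalone script (say demo_cli.py, a name chosen here just for illustration), you can launch it with the same arguments as the Gradio demo, using one of the repo's eval configs, e.g.:

python demo_cli.py --cfg-path eval_configs/video_llama_eval_withaudio.yaml --model_type vicuna --gpu-id 0

Since everything goes through chat.upload_video / chat.ask / chat.answer, no Gradio import is needed at all.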