dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0
742 stars, 44 forks

Please switch to 'llama-vid-vicuna-7b-short' to chat with upload short videos. After switch, you can clear the conversaton then retry. #99

Open SCUTE-ZZ opened 4 months ago

SCUTE-ZZ commented 4 months ago

I cannot use the Gradio web chat with uploaded short videos. When a request contains videos but no images, the code only yields the message in the title and returns:

    @torch.inference_mode()
    def generate_stream(self, params):
        prompt = params["prompt"]
        ori_prompt = prompt
        images = params.get("images", None)
        videos = params.get("videos", None)
        num_image_tokens = 0

        if len(videos) > 0 and len(images) == 0:
            yield json.dumps({"text": ori_prompt + "Please switch to \'llama-vid-vicuna-7b-short\' to chat with upload short videos. After switch, you can clear the conversaton then retry.", "error_code": 0}).encode() + b"\0"
            return

        self.load_model_from_cpu()
        tokenizer, model, image_processor = self.tokenizer, self.model, self.image_processor

        images = [load_image_from_base64(image) for image in images]
        image = np.array(images[0])

        movie_part = int(params.get("movie_part", 1))
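
To make the behavior easier to see, here is a minimal sketch that reproduces only the guard from the snippet above, without loading any model. The function name `generate_stream_sketch`, the `SWITCH_MESSAGE` constant, and the `<model output>` placeholder are hypothetical and not part of the repository. The `or []` fallback is also an assumption added so that `len()` does not fail when a key is missing. The message string is copied verbatim from the quoted code.

```python
import json

# Message text copied verbatim from the quoted code (including its spelling).
SWITCH_MESSAGE = (
    "Please switch to 'llama-vid-vicuna-7b-short' to chat with upload short videos. "
    "After switch, you can clear the conversaton then retry."
)


def generate_stream_sketch(params):
    """Hypothetical stand-in for generate_stream; no model is loaded."""
    prompt = params["prompt"]
    # Assumption: fall back to empty lists so len() is safe for missing keys.
    images = params.get("images") or []
    videos = params.get("videos") or []

    if len(videos) > 0 and len(images) == 0:
        # Same shape as the quoted code: one JSON chunk ending in a null byte.
        yield json.dumps({"text": prompt + SWITCH_MESSAGE, "error_code": 0}).encode() + b"\0"
        return

    # Placeholder for the real generation path, which is not reproduced here.
    yield json.dumps({"text": prompt + "<model output>", "error_code": 0}).encode() + b"\0"


# A video-only request produces exactly one chunk, then the generator stops.
chunks = list(generate_stream_sketch(
    {"prompt": "Describe the clip. ", "videos": ["<base64>"], "images": []}
))
reply = json.loads(chunks[0][:-1].decode())  # strip the trailing null byte
print(len(chunks), reply["error_code"])
print(reply["text"])
```

In this sketch the video-only request never reaches the generation path: the stream consists of the prompt followed by the switch message, with `error_code` set to 0, which matches what the web chat displays.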