detail of slide_captioner_lmdeploy.py

meteorlium commented 2 months ago

In this file, when the sliding-window caption is generated, only the current image and the previous caption are passed in. Why isn't the previous image passed in as well? Is there a reason for this?

    def get_prepared_data(self,):
        curr_img = load_image(self.img_path_list[self.frame_ptr])
        if self.frame_ptr == 0:
            query = 'This is the first frame of a video, describe it in detail.'
        else:
            query = "Here are the Video frame {} at {}.00 Second(s) and Video frame {} at {}.00 Second(s) of a video, describe what happend between them. What happend before is: {}".format(
                self.frame_ptr, int(self.frame_ptr * 2),
                self.frame_ptr + 1, int((self.frame_ptr + 1) * 2),
                self.caption_list[-1])
        self.frame_ptr += 1
        return (query, curr_img)
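To make the behavior in question concrete, here is a minimal standalone sketch of the same prompt construction, not the repository's code. `build_query`, `run_sliding_window`, `caption_fn`, and `seconds_per_frame` are hypothetical names introduced for illustration, and a stub stands in for the actual model call. It shows that each step passes exactly one image, and that the only information carried over from the previous step is the text of the last caption.

```python
from typing import Callable, List

# Prompt strings copied verbatim from the snippet above (including its spelling).
FIRST_FRAME_QUERY = 'This is the first frame of a video, describe it in detail.'
DIFF_QUERY = (
    "Here are the Video frame {} at {}.00 Second(s) and Video frame {} at "
    "{}.00 Second(s) of a video, describe what happend between them. "
    "What happend before is: {}"
)


def build_query(frame_ptr: int, caption_list: List[str],
                seconds_per_frame: int = 2) -> str:
    """Build the text prompt for the frame at index `frame_ptr`.

    `seconds_per_frame` is an assumed name for the hard-coded factor 2.
    """
    if frame_ptr == 0:
        return FIRST_FRAME_QUERY
    return DIFF_QUERY.format(
        frame_ptr, int(frame_ptr * seconds_per_frame),
        frame_ptr + 1, int((frame_ptr + 1) * seconds_per_frame),
        caption_list[-1],  # only the previous caption is carried forward
    )


def run_sliding_window(img_paths: List[str],
                       caption_fn: Callable[[str, str], str]) -> List[str]:
    """Caption frames in order; `caption_fn(query, img_path)` is a stand-in
    for the model call and receives a single image per step."""
    captions: List[str] = []
    for frame_ptr, img_path in enumerate(img_paths):
        query = build_query(frame_ptr, captions)
        captions.append(caption_fn(query, img_path))
    return captions


if __name__ == "__main__":
    # Stub captioner that just echoes which image it was given.
    stub = lambda query, img_path: "caption for " + img_path
    for caption in run_sliding_window(["f0.jpg", "f1.jpg", "f2.jpg"], stub):
        print(caption)
```

In this sketch the prompt for frame index 1 names two frames ("Video frame 1" and "Video frame 2"), while `caption_fn` still receives only one image path, which is the mismatch the question is about.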

aiiph4 commented 2 months ago

+1

zehuichen123 commented 2 months ago

see #22

ShareGPT4Omni / ShareGPT4Video #24