Open gehong-coder opened 2 days ago
My prompt is: Describe the video, including a comprehensive description (The description should enable the AI to accurately recreate the video), camera motion (e.g., pan left, zoom in, tilt up, tracking shots), content category (e.g., nature scenery, human actions, animals actions), and any visual effects (VFX), if the video contains effects, describe them briefly in one sentence, else keep the VFX field as an empty string. Output the response in the following JSON format: { "description": "", "camera_motion": "", "content_category": "", "VFX": "" }
Describe the bug
When I use the InternVL2-40B model, the output always contains Chinese characters, similar to the following:
{ "description": "The video features a woman performing a series of sit-ups on a black yoga mat in a minimalist room. She is wearing a white tank top and blue leggings, with her hair neatly tied back. The room has a white wall, a potted plant on the left, and a wooden bench with a green yoga mat and a basket on the right. The woman starts by lying on her back with her arms extended, then gradually lifts her upper body off the mat, engaging her core muscles. The video includes a countdown timer in the upper right corner, starting from 10 and decreasing by one number with each repetition.", 'camera_motion': 'static', ‘content_category’: ‘human actions’, ’VFX’: "" }
With the 26B model, by contrast, the output is almost always complete, well-structured JSON. Why is there such a difference?
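As a stopgap while the cause is unclear, the mixed quotes in the output above could be normalized before parsing. The following is only a sketch of that idea, not part of InternVL2 or the official script: `parse_model_json` and `QUOTE_MAP` are hypothetical names, and it assumes the only defects are curly or single quotes used as string delimiters.

```python
import ast
import json

# Map curly (typographic) quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # ‘ ’
    "\u201c": '"', "\u201d": '"',   # “ ”
})

def parse_model_json(text):
    """Best-effort parse of a model reply that mixes quote styles.

    Hypothetical helper: assumes the reply is a single flat object
    whose only problem is non-standard quotation marks.
    """
    normalized = text.translate(QUOTE_MAP).strip()
    try:
        # Strict path: succeeds when the reply is already valid JSON.
        return json.loads(normalized)
    except json.JSONDecodeError:
        # Fallback: a Python dict literal accepts both ' and " strings.
        # This raises if the reply uses JSON-only literals such as
        # true/false/null, or has an apostrophe inside a '...' string.
        result = ast.literal_eval(normalized)
        if not isinstance(result, dict):
            raise ValueError("model reply is not a JSON object")
        return result
```

This only hides the symptom. Apostrophes inside single-quoted values would still fail, so a strict `json.loads` on well-formed output remains the preferable goal.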
Reproduction
I used the official script with the following generation parameters: generation_config = dict( max_new_tokens=1024, do_sample=True, temperature=0.75, min_length=15, no_repeat_ngram_size=3, top_p=0.7 )
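One unconfirmed hypothesis that may be worth testing: `no_repeat_ngram_size=3` forbids any token trigram from appearing twice, and JSON repeats the same delimiter sequences for every field, so the model may be pushed toward alternative quote characters. Sampling with `do_sample=True` could add further variation. The configuration below is a sketch for checking that assumption, not a verified fix. It drops the n-gram constraint and uses greedy decoding.

```python
# Hypothetical comparison config: same length limits as above, but
# without the n-gram ban and without sampling. temperature and top_p
# are omitted because they have no effect when do_sample=False.
generation_config = dict(
    max_new_tokens=1024,
    do_sample=False,
    min_length=15,
)
```

If the 40B output becomes valid JSON with this configuration, re-adding the original parameters one at a time should show which setting is responsible.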
Environment
Error traceback
No response