Azure-Samples / azureai-samples

Official community-driven Azure AI Examples
MIT License

Running vision with video sample throws an error #114

Open prashant-bhandari opened 2 months ago

prashant-bhandari commented 2 months ago

Operating System

Windows

Version Information

Running the sample https://github.com/Azure-Samples/azureai-samples/blob/main/scenarios/GPT-4V/video/video_chatcompletions_example_restapi.ipynb with my own video throws an error: {"choices":[{"messages":[{"delta":{"role":"tool", "content": "{\"ErrorMessage\":The 'video' enhancement requires a data source of type 'AzureComputerVisionVideoIndex'.,\"ErrorCode\": 400}"}}]}]}

This is how my payload looks:

    payload = {
        "model": "gpt-4-vision-preview",
        "enhancements": {"video": {"enabled": True}},
        "dataSources": [
            {
                "type": "AzureComputerVisionVideoIndex",
                "parameters": {
                    "computerVisionBaseUrl": f"{vision_api.get('endpoint')}computervision",
                    "computerVisionApiKey": vision_api.get("key"),
                    "indexName": video_index.get("video_index_name"),
                    "videoUrls": [video_index.get("video_SAS_url")],
                },
            }
        ],
        "messages": messages,
        "max_tokens": 800,
        "stream": True,
    }
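Since the error complains about the data source even though the payload above declares one, it may help to check the payload locally before sending it. The following is a minimal sketch, not the service's own validation: `check_video_payload` is a hypothetical helper, and the list of required parameters is simply taken from the keys used in the payload above. It mainly catches values that resolve to `None` or an empty string, for example when `video_index.get("video_index_name")` finds no such key.

```python
from typing import Any

# Data source type named in the error message from the service.
VIDEO_SOURCE_TYPE = "AzureComputerVisionVideoIndex"

# Parameter names copied from the payload in this issue (an assumption,
# not an authoritative list of what the service requires).
EXPECTED_PARAMS = (
    "computerVisionBaseUrl",
    "computerVisionApiKey",
    "indexName",
    "videoUrls",
)


def check_video_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of local problems found in a video-enhancement payload."""
    problems: list[str] = []

    video = payload.get("enhancements", {}).get("video", {})
    if not video.get("enabled"):
        return problems  # video enhancement is off, nothing to check

    sources = [
        source
        for source in payload.get("dataSources", [])
        if source.get("type") == VIDEO_SOURCE_TYPE
    ]
    if not sources:
        problems.append(f"no data source of type '{VIDEO_SOURCE_TYPE}'")
        return problems

    params = sources[0].get("parameters", {})
    for key in EXPECTED_PARAMS:
        value = params.get(key)
        # Flag missing keys, None, empty strings, and lists with empty items.
        if not value or (isinstance(value, list) and not all(value)):
            problems.append(f"parameter '{key}' is missing or empty")
    return problems
```

Calling `check_video_payload(payload)` right before the request returns an empty list when nothing looks wrong locally. In that case the problem is more likely on the request side, such as the URL or API version.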

I made a few changes to the api_url, following this documentation: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/gpt-with-vision?tabs=rest%2Csystem-assigned%2Cresource

Steps to reproduce

Run the sample above with a new video.

Expected behavior

I expected to get output like this:

    {
      "id": "chatcmpl-8V4J2cFo7TWO7rIfs47XuDzTKvbct",
      "object": "chat.completion",
      "created": 1702415412,
      "model": "gpt-4",
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The advertisement video opens with a blurred background that suggests a serene and aesthetically pleasing environment, possibly a workspace with a nature view. As the video progresses, a series of frames showcase a digital interface with search bars and prompts like \"Inspire new ideas,\" \"Research a topic,\" and \"Organize my plans,\" suggesting features of a software or application designed to assist with productivity and creativity.\n\nThe color palette is soft and varied, featuring pastel blues, pinks, and purples, creating a calm and inviting atmosphere. The backgrounds of some frames are adorned with abstract, organically shaped elements and animations, adding to the sense of innovation and modernity.\n\nMidway through the video, the focus shifts to what appears to be a browser or software interface with the phrase \"Screens simulated, subject to change; feature availability and timing may vary,\" indicating the product is in development and that the visuals are illustrative of its capabilities.\n\nThe use of text prompts continues with \"Help me relax,\" followed by a demonstration of a 'dark mode' feature, providing a glimpse into the software's versatility and user-friendly design.\n\nThe video concludes by revealing the product name, \"Copilot,\" and positioning it as \"Your everyday AI companion,\" implying the use of artificial intelligence to enhance daily tasks. The final frames feature the Microsoft logo, associating the product with the well-known technology company.\n\nIn summary, the advertisement video is for a Microsoft product named \"Copilot,\" which seems to be an AI-powered software tool aimed at improving productivity, creativity, and organization for its users. The video conveys a message of innovation, ease, and support in daily digital interactions through a visually appealing and calming presentation."
          }
        }
      ],
      "usage": {
        "prompt_tokens": 2068,
        "completion_tokens": 341,
        "total_tokens": 2409
      }
    }

Actual behavior

Instead, the request returns this: {"choices":[{"messages":[{"delta":{"role":"tool", "content": "{\"ErrorMessage\":The 'video' enhancement requires a data source of type 'AzureComputerVisionVideoIndex'.,\"ErrorCode\": 400}"}}]}]}
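One detail that makes this response awkward to handle in code: the outer chunk is valid JSON, but the inner `content` string is not, because the `ErrorMessage` value is unquoted, so `json.loads` on the inner string fails. Below is a minimal sketch of pulling the message and code out anyway. `extract_tool_error` is a hypothetical helper, and the regular expression assumes the inner string always has the shape shown in this issue.

```python
import json
import re
from typing import Optional

# Matches the inner error text as it appears in this issue. The message
# value is unquoted, so it is captured lazily up to the ErrorCode field.
_ERROR_RE = re.compile(
    r'"ErrorMessage":(?P<message>.*?),\s*"ErrorCode":\s*(?P<code>\d+)',
    re.DOTALL,
)

# The chunk reported in this issue, kept as a raw string so the \" escapes
# reach the JSON parser unchanged.
SAMPLE = r'''{"choices":[{"messages":[{"delta":{"role":"tool", "content": "{\"ErrorMessage\":The 'video' enhancement requires a data source of type 'AzureComputerVisionVideoIndex'.,\"ErrorCode\": 400}"}}]}]}'''


def extract_tool_error(raw: str) -> Optional[dict]:
    """Return {'message': ..., 'code': ...} if the chunk carries a tool error."""
    outer = json.loads(raw)  # the outer envelope is well-formed JSON
    for choice in outer.get("choices", []):
        for message in choice.get("messages", []):
            delta = message.get("delta", {})
            if delta.get("role") != "tool":
                continue
            match = _ERROR_RE.search(delta.get("content") or "")
            if match:
                return {
                    "message": match.group("message").strip(),
                    "code": int(match.group("code")),
                }
    return None
```

With `stream` set to `True`, a check like this could be run on each received chunk so the failure surfaces as a clear error in the notebook instead of a raw string.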

Additional information

No response