daily-demos / rtvi-client-android-demo

BSD 2-Clause "Simplified" License

Camera vision still not supported in real time? #11

Open: skyxiaobai opened this issue 2 weeks ago

skyxiaobai commented 2 weeks ago

In my testing, the vision feature only shows the front camera view, but the video frames are not sent to the LLM in real time. Are there any plans to support this?

marcus-daily commented 2 weeks ago

Thanks for the question @skyxiaobai. Vision is only supported with some models, such as Claude Sonnet. Please also make sure you have "Voice and Vision" selected in the settings.


skyxiaobai commented 2 weeks ago

Thanks for the reminder; that is what I am doing. I am using GPT-4o (the default) with the RTVI Android SDK. After building the APK and running it on the phone, I can see that the camera opens, but during the conversation the model does not recognize what the camera is showing. For example, if I ask "Can you see what I am doing?", the camera content is not detected. My goal is to use the camera's real-time video stream in the conversation, similar to a video chat. However, in my testing only the voice is real-time. Thanks again for your support.