cofacts / rumors-api

GraphQL API server for clients like rumors-site and rumors-line-bot
https://api.cofacts.tw
MIT License

Reduce Text-to-speech hallucination #322

Status: Open. johnson-liang opened this issue 1 year ago

johnson-liang commented 1 year ago

From the 2023/10/11 meeting notes: https://g0v.hackmd.io/t9ypB87SQBuMjjW_PheZVg#Comm-AI-transcript

The current speech-to-text implementation (based on the Whisper API) suffers from hallucinations. Some examples:

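So the training data comes from a crowd-sourced subtitling community: https://cofacts.tw/article/TvR6AosBAjOeMOklfe-g

Silent audio: https://cofacts.tw/article/JvRhAosBAjOeMOklpe-v

I'd actually prefer it not to translate, even though its translation is OK: https://cofacts.tw/article/FPRXAosBAjOeMOklXO9y

Fine at the start, but goes haywire once the audio turns silent: https://cofacts.tw/article/m_S3AosBAjOeMOkls-_a

Screaming: https://cofacts.tw/article/jvSIBYsBAjOeMOklDvOv

Inexplicable, given such an obvious voice-over narration: https://cofacts.tw/article/MvTSCosBAjOeMOklBvlJ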

We should investigate:

References

Previous research on Whisper and mitigations for its hallucinations: https://g0v.hackmd.io/wkx286lmTDaFUpgRhnUawQ#Whisper

MrOrz commented 1 year ago

Tried to remove obvious hallucinations in https://github.com/cofacts/rumors-api/pull/323
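The approach in PR #323 is not reproduced here. As a rough illustration of what post-filtering obvious hallucinations can look like, the sketch below drops suspicious segments from a Whisper API transcription requested with `response_format="verbose_json"`. It assumes segments are dicts with the `text`, `no_speech_prob`, `avg_logprob` and `compression_ratio` fields that format returns. The thresholds are the defaults of the open-source `openai-whisper` `transcribe()`, which uses them to decide when to skip a silent window or re-decode a poor one; here they are simply used to discard segments. `BLOCKLIST`, `is_hallucinated` and `clean_transcript` are hypothetical names, and the blocklisted phrase is an illustrative example of the subtitle-credit text Whisper can emit on silence.

```python
from typing import Dict, List

# Same default thresholds as openai-whisper's transcribe().
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0
COMPRESSION_RATIO_THRESHOLD = 2.4

# Hypothetical blocklist: phrases learned from crowd-sourced subtitle credits
# that tend to appear when the input is silent.
BLOCKLIST = ("字幕由Amara.org社区提供",)


def is_hallucinated(segment: Dict) -> bool:
    """Heuristically flag a verbose_json segment as likely hallucinated."""
    text = segment["text"].strip()
    if not text:
        return True
    # Known boilerplate phrases are dropped regardless of confidence.
    if any(phrase in text for phrase in BLOCKLIST):
        return True
    # Probably silence: the model thinks there is no speech AND is unsure
    # of the text it produced (openai-whisper's skip rule).
    if (segment["no_speech_prob"] > NO_SPEECH_THRESHOLD
            and segment["avg_logprob"] < LOGPROB_THRESHOLD):
        return True
    # Highly repetitive output compresses too well, a common looping symptom.
    if segment["compression_ratio"] > COMPRESSION_RATIO_THRESHOLD:
        return True
    return False


def clean_transcript(segments: List[Dict]) -> str:
    """Concatenate the segments that survive the filter (no separator, as for Chinese)."""
    return "".join(s["text"].strip() for s in segments if not is_hallucinated(s))
```

Thresholds like these only catch the "silence" and "looping" cases (e.g. the silent-audio and trailing-silence examples above); they cannot detect a fluent but wrong transcript of real speech, which is one motivation for trying a different model altogether.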

MrOrz commented 4 months ago

Update: use a multimodal LLM (Gemini) instead; see https://g0v.hackmd.io/wkx286lmTDaFUpgRhnUawQ#Gemini