Open issue, opened by johnson-liang 1 year ago
From the 2023/10/11 meeting: https://g0v.hackmd.io/t9ypB87SQBuMjjW_PheZVg#Comm-AI-transcript
The current speech-to-text implementation (based on the Whisper API) suffers from hallucinations. Some examples:
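Apparently the training data came from a crowd-sourced subtitling community: https://cofacts.tw/article/TvR6AosBAjOeMOklfe-g
Silent audio: https://cofacts.tw/article/JvRhAosBAjOeMOklpe-v
I would actually prefer it not to translate, although its translation is OK: https://cofacts.tw/article/FPRXAosBAjOeMOklXO9y
Fine at the start, but it goes haywire once the audio goes silent: https://cofacts.tw/article/m_S3AosBAjOeMOkls-_a
Screaming: https://cofacts.tw/article/jvSIBYsBAjOeMOklDvOv
Inexplicable, given such clearly audible narration: https://cofacts.tw/article/MvTSCosBAjOeMOklBvlJ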
We should investigate:
Previous research on Whisper and mitigations for its hallucinations: https://g0v.hackmd.io/wkx286lmTDaFUpgRhnUawQ#Whisper
Attempted to remove obvious hallucinations in https://github.com/cofacts/rumors-api/pull/323
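One common post-processing mitigation is to drop transcript segments that look hallucinated, using the per-segment fields the Whisper API returns with `response_format="verbose_json"` (`avg_logprob`, `no_speech_prob`, `compression_ratio`) plus a blocklist of known hallucinated phrases. This is only a sketch of that general idea, not what the PR above actually does. The thresholds reuse openai-whisper's `transcribe()` defaults, although whisper itself uses the compression ratio to trigger temperature fallback rather than to drop segments. The blocklist entry, the `Segment` type, and the helper names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical blocklist (assumption): Whisper is widely reported to emit
# subtitle-credit lines mentioning Amara.org on silent or noisy audio.
KNOWN_HALLUCINATION_PHRASES = ("amara.org",)


@dataclass
class Segment:
    """Hypothetical subset of a segment from a verbose_json transcription."""
    text: str
    avg_logprob: float
    no_speech_prob: float
    compression_ratio: float


def is_probably_hallucinated(
    seg: Segment,
    no_speech_threshold: float = 0.6,
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> bool:
    """Heuristic check; thresholds borrowed from openai-whisper's defaults."""
    # The model thinks there is no speech AND had low confidence in its tokens.
    if seg.no_speech_prob > no_speech_threshold and seg.avg_logprob < logprob_threshold:
        return True
    # Highly repetitive text compresses unusually well.
    if seg.compression_ratio > compression_ratio_threshold:
        return True
    # Known hallucinated phrases (case-insensitive substring match).
    text = seg.text.lower()
    return any(phrase in text for phrase in KNOWN_HALLUCINATION_PHRASES)


def clean_transcript(segments: list[Segment]) -> str:
    """Concatenate the text of the segments that pass all heuristics."""
    return "".join(s.text for s in segments if not is_probably_hallucinated(s)).strip()
```

A design caveat: thresholds like these only catch the silent-audio and repetition cases; they cannot fix a segment that confidently mistranscribes clearly audible narration, which is one motivation for trying a different model.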
Update: use a multimodal LLM (Gemini) instead; see https://g0v.hackmd.io/wkx286lmTDaFUpgRhnUawQ#Gemini