Open read8873 opened 3 weeks ago
When the audio contains both voice and music, Whisper tends to output "music" instead of transcribing the speech. Please consider adding support for voice separation.