Ayanaminn / N46Whisper

Whisper based Japanese subtitle generator
MIT License

English line splitting does not work correctly #46

Open SymbolicRudolf opened 1 year ago

SymbolicRudolf commented 1 year ago

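When the language is set to en, the output is not split at English periods. Settings such as is_split and split_method make no difference to the line breaks in the output ass file. By comparison, line splitting for Japanese is clear-cut. This problem appeared after the April 15 update. Could it be a faster-whisper issue?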

Ayanaminn commented 1 year ago

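Could you give more detail, for example in which cases text that used to be split is no longer split? A test file would be ideal. I don't use English transcription myself, and all development and testing is done on Japanese audio.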

Also, the split feature was added specifically for Japanese and has no effect on English.

SymbolicRudolf commented 1 year ago

Take a sentence like this one: De Zerby's start at Brighton has been nothing short of spectacular, continuing the Seagulls transformation into one of the most aggressive and entertaining teams in the league, all with a simple principle at its core – possession football all over the pitch.

In the file generated by whisper, however, it is cut at a fixed length into these three segments:

De Zerby's start at Brighton has been nothing short of spectacular, continuing the Seagulls / transformation into one of the most aggressive and entertaining teams in the league, all / with a simple principle at its core – possession football all over the pitch. It's a principle

In other words, when recognizing English, whisper takes what is grammatically a single sentence and breaks it into three lines of fixed length.
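For illustration, here is a minimal sketch of the behavior being asked for: re-joining fixed-length segments and splitting them again at sentence-ending punctuation. It is not code from N46Whisper. The function name `merge_into_sentences` is hypothetical, it works on text only (subtitle timestamps would still need to be merged separately), and it assumes that a `.`, `!` or `?` followed by whitespace ends a sentence, which would mis-split abbreviations such as "Mr.".

```python
import re


def merge_into_sentences(segments):
    """Join segment texts, then re-split them at sentence-ending punctuation.

    Hypothetical helper for illustration; not part of N46Whisper.
    """
    # Re-join the fixed-length pieces into one string, skipping empty ones.
    text = " ".join(s.strip() for s in segments if s.strip())
    # Split wherever '.', '!' or '?' is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]


if __name__ == "__main__":
    fixed_length_segments = [
        "De Zerby's start at Brighton has been nothing short of spectacular, continuing the Seagulls",
        "transformation into one of the most aggressive and entertaining teams in the league, all",
        "with a simple principle at its core – possession football all over the pitch. It's a principle",
    ]
    for line in merge_into_sentences(fixed_length_segments):
        print(line)
```

Run on the three segments quoted above, this yields two lines: the full sentence ending in "all over the pitch." and the leftover fragment "It's a principle", which belongs to the next sentence.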

Ayanaminn commented 1 year ago

It may be related to the beam size parameter. After switching to faster-whisper I explicitly fixed this parameter at 5. Before that it was left at the default, None.

The default beam size is 5 when using the whisper command line, but not when calling the model.transcribe method. Here the beam size defaults to None which means that greedy decoding is used.
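To check whether beam size really affects segmentation, one could transcribe the same file with different values and compare the results. The sketch below is an assumption-laden example rather than the N46Whisper code: the helper names `transcription_options` and `transcribe`, the `small` model size, and the CPU/int8 settings are all chosen for illustration. It uses faster-whisper's `WhisperModel.transcribe`, where `beam_size=1` corresponds to greedy decoding.

```python
def transcription_options(language="en", beam_size=5):
    """Collect keyword arguments for WhisperModel.transcribe (hypothetical helper)."""
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return {"language": language, "beam_size": beam_size}


def transcribe(audio_path, model_size="small", **options):
    """Return (start, end, text) tuples for each segment of the audio file."""
    # Imported lazily so the helper above can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Model size, device and compute type are illustrative choices.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, **options)
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Compare greedy decoding (beam_size=1) with beam search (beam_size=5).
        for beam_size in (1, 5):
            print(f"--- beam_size={beam_size} ---")
            options = transcription_options(beam_size=beam_size)
            for start, end, text in transcribe(sys.argv[1], **options):
                print(f"[{start:.2f} -> {end:.2f}] {text}")
```

If the segment boundaries differ between the two runs, that would support the beam size explanation. If they do not, the cause likely lies elsewhere.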

SymbolicRudolf commented 1 year ago

Thanks, solved.

NNGTHB commented 1 year ago

Thanks, solved.

May I ask how you solved it? Could you share the steps?