Problem with English line-break markers

Ayanaminn / N46Whisper

Whisper based Japanese subtitle generator

MIT License

1.59k stars 133 forks

Problem with English line-break markers #46

Open SymbolicRudolf opened 1 year ago

SymbolicRudolf commented 1 year ago

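When the language is set to en, the output is not split into lines at English periods. Settings such as is_split and split_method have no effect on the line breaks in the output .ass file. Japanese, by comparison, is split cleanly. This problem appeared after the April 15 update. Could it be a faster-whisper issue?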

Ayanaminn commented 1 year ago

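Could you provide more detail, for example in which cases text that used to be split is no longer split? A test file would be ideal. I don't use English transcription myself, so all development and testing is based on Japanese audio.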

Also, the split feature itself was added specifically for Japanese and has no effect on English.

SymbolicRudolf commented 1 year ago

Take a sentence like this one:

De Zerby's start at Brighton has been nothing short of spectacular, continuing the Seagulls transformation into one of the most aggressive and entertaining teams in the league, all with a simple principle at its core – possession football all over the pitch.

In the file Whisper generates, however, it is cut at a fixed length into these three segments:

De Zerby's start at Brighton has been nothing short of spectacular, continuing the Seagulls / transformation into one of the most aggressive and entertaining teams in the league, all / with a simple principle at its core – possession football all over the pitch. It's a principle

In other words, when recognizing English, Whisper takes what is grammatically a single sentence and splits it into three lines of fixed-length fragments.
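To make the expected behavior concrete, here is a minimal sketch of regrouping fixed-length fragments at sentence-ending punctuation. It is not N46Whisper's split implementation; the function name `regroup_sentences` is hypothetical, and the sketch handles text only, so subtitle timings would still have to be reassigned separately.

```python
import re

# Split on whitespace that directly follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def regroup_sentences(fragments):
    """Join subtitle fragments and re-split them at sentence boundaries.

    Hypothetical helper: a naive rule that will also split after
    abbreviations such as "Mr.", so treat it as a starting point only.
    """
    text = " ".join(f.strip() for f in fragments if f.strip())
    return [sentence for sentence in _SENTENCE_END.split(text) if sentence]


# The three fixed-length fragments quoted in the comment above.
fragments = [
    "De Zerby's start at Brighton has been nothing short of spectacular, "
    "continuing the Seagulls",
    "transformation into one of the most aggressive and entertaining teams "
    "in the league, all",
    "with a simple principle at its core – possession football all over "
    "the pitch. It's a principle",
]

lines = regroup_sentences(fragments)
for line in lines:
    print(line)
```

With the fragments above, the first output line is the whole sentence ending in "pitch." and the second is the leftover start of the next sentence, "It's a principle".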

Ayanaminn commented 1 year ago

It may be related to the beam size parameter. After switching to faster-whisper, I explicitly fixed this parameter at 5. Before that, it was left at the default, None.

The default beam size is 5 when using the whisper command line, but not when calling the model.transcribe method. Here the beam size defaults to None which means that greedy decoding is used.
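One way to test this hypothesis is to transcribe the same clip with faster-whisper twice, once with `beam_size=5` and once with `beam_size=1` (greedy decoding, the closest match to the old `None` default), and compare the line breaks. The sketch below is an assumption-laden illustration, not the fix confirmed in this thread: the helper names, the "small" model size, and the audio path are all hypothetical.

```python
def decoding_options(language, greedy):
    """Build keyword arguments for faster-whisper's WhisperModel.transcribe.

    beam_size=1 means greedy decoding; beam_size=5 is the value the
    maintainer says is now fixed in N46Whisper.
    """
    return {"language": language, "beam_size": 1 if greedy else 5}


def transcribe_lines(audio_path, model_size="small", **options):
    """Return the text of each segment for one transcription run."""
    # Imported lazily so decoding_options() is usable without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)  # model size is an assumption
    segments, _info = model.transcribe(audio_path, **options)
    return [segment.text.strip() for segment in segments]


def compare_beam_sizes(audio_path, language="en"):
    """Run greedy and beam-search decoding on the same file."""
    return {
        "greedy": transcribe_lines(audio_path, **decoding_options(language, True)),
        "beam_5": transcribe_lines(audio_path, **decoding_options(language, False)),
    }
```

Calling `compare_beam_sizes("sample_en.mp3")` (a placeholder file name) returns both sets of lines, which makes it easy to see whether the segmentation differs between the two settings.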

SymbolicRudolf commented 1 year ago

Thanks, that solved it.

NNGTHB commented 1 year ago

> Thanks, that solved it.

How did you solve it? Could you describe the steps?