For the same video, using Traditional Chinese produces repeated dialogue instead of one single continuous dialogue

0
00:00:00,542 --> 00:00:02,252
植物人？

1
00:00:03,795 --> 00:00:05,088
不會吧

2
00:00:05,547 --> 00:00:07,841
在智比赛打傷了封手。

3
00:00:08,091 --> 00:00:08,633
上面没這磨窝

0
00:00:00,542 --> 00:00:01,668
植物人?

1
00:00:01,710 --> 00:00:01,960
植物人3

2
00:00:02,001 --> 00:00:02,252
植物人?

3
00:00:03,795 --> 00:00:05,088
不會吧

4
00:00:05,547 --> 00:00:07,257
在練習比賽打傷了對手

5
00:00:07,298 --> 00:00:07,298
在練習比產傷

6
00:00:07,382 --> 00:00:07,382
在練習比賽打傷

7
00:00:07,424 --> 00:00:07,424
在練習比產不傷

8
00:00:07,507 --> 00:00:07,841
在練習比賽打傷了對手

9
00:00:08,091 --> 00:00:08,633
上面波這 磨寫

For the first output I used the language code ch, but some characters were detected incorrectly, so I figured I should switch to the accurate subtitle language code. The second output is from the same video with the correct language code chinese_cht, but the timeline is messed up: I get repeated dialogue entries that are supposed to be one single continuous entry. Some characters are now detected correctly, though; e.g. 在智比赛 is now correctly detected as 練習比賽.
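As a stopgap, the repeated entries could be collapsed after the fact. The following is only a minimal sketch of that idea, not part of videocr-PaddleOCR: it uses just the Python standard library, and the names `parse_srt` and `merge_similar` as well as the `threshold` and `max_gap_ms` values are my own assumptions that would likely need tuning. It merges consecutive entries that are close in time and have similar text, and keeps the most frequent text of each group.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

TIME_RE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def to_ms(stamp):
    """Convert an SRT timestamp such as 00:00:05,547 to milliseconds."""
    h, m, rest = stamp.split(":")
    s, ms = rest.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text):
    """Return a list of (start, end, text) tuples parsed from SRT text."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        match = TIME_RE.match(lines[1].strip())
        if match:
            entries.append((match.group(1), match.group(2), "\n".join(lines[2:])))
    return entries


def merge_similar(entries, threshold=0.6, max_gap_ms=500):
    """Merge consecutive entries that are close in time and similar in text.

    threshold and max_gap_ms are guessed defaults (assumptions), not values
    taken from the library.
    """
    groups = []  # each item: (start, end, [texts seen in this group])
    for start, end, text in entries:
        if groups:
            g_start, g_end, texts = groups[-1]
            gap = to_ms(start) - to_ms(g_end)
            # Compare against every text in the group, so one badly
            # recognized frame does not split the group.
            best = max(SequenceMatcher(None, t, text).ratio() for t in texts)
            if gap <= max_gap_ms and best >= threshold:
                groups[-1] = (g_start, end, texts + [text])
                continue
        groups.append((start, end, [text]))
    # Keep the most frequent text of each group (first seen wins on ties).
    return [(s, e, Counter(texts).most_common(1)[0][0]) for s, e, texts in groups]
```

Calling `merge_similar(parse_srt(srt_text))` on the second output should collapse the three 植物人 entries into one spanning 00:00:00,542 to 00:00:02,252. It only hides the symptom, so I would still like to know the root cause.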

Any idea which parameter I should tweak, or is it because the model for Traditional Chinese has some issue? Thanks! I appreciate your work.

devmaxxing / videocr-PaddleOCR

For the same video, using Traditional Chinese produces repeated dialogue instead of one single continuous dialogue #13