YaoFANGUK / video-subtitle-extractor

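A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. It is a deep-learning-based video subtitle extraction framework covering subtitle region detection and subtitle content extraction.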
Apache License 2.0
5.8k stars, 639 forks

Finnish language support #193

Open maxminstr opened 1 year ago

maxminstr commented 1 year ago

Would it be possible to add support for Finnish?

If there is anything I can do to help you accomplish this, please let me know.

YaoFANGUK commented 1 year ago

If you want to add a new language, please train your own language recognition model. Once you have a Finnish model, I can add it to this software and support it. A training tutorial can be found at: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/training_en.md
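For anyone taking the training route: a PaddleOCR recognition config points at a character dictionary (one character per line, set via `character_dict_path`) and at label files in which each line is an image path and its transcription separated by a tab. The sketch below prepares a Finnish character list and formats label lines. The helper names are hypothetical, spaces are left out on the assumption that the config's `use_space_char` option adds them, and the exact config keys should be checked against the tutorial for the PaddleOCR version in use.

```python
import string
from pathlib import Path

# Finnish uses the basic Latin alphabet plus å, ä, ö and their capitals.
FINNISH_EXTRA = "åäöÅÄÖ"


def finnish_charset() -> list[str]:
    """Ordered, de-duplicated list of characters the model should recognize."""
    # Space is deliberately excluded (assumed to come from use_space_char).
    chars = string.ascii_letters + string.digits + FINNISH_EXTRA + string.punctuation
    # dict.fromkeys keeps first-seen order while dropping any duplicates
    return list(dict.fromkeys(chars))


def write_char_dict(path: Path) -> None:
    """Write one character per line, the layout used by character_dict_path files."""
    path.write_text("\n".join(finnish_charset()) + "\n", encoding="utf-8")


def label_line(image_path: str, text: str) -> str:
    """Format one recognition label: image path, a tab, then the transcription."""
    if "\t" in text or "\n" in text:
        raise ValueError("transcription must not contain tabs or newlines")
    return f"{image_path}\t{text}"
```

For example, `label_line("train/fi_0001.jpg", "Hyvää päivää!")` joins the (placeholder) path and the text with a single tab.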

YaoFANGUK commented 1 year ago

Hi, maxminstr

I just updated the source code, and I found that you can try recognising Finnish by selecting Latin.
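Since a recognition model can only output characters that appear in its character dictionary, one quick check is whether the dictionary behind the Latin model lists ä, ö and å at all. A minimal sketch; where vse keeps that dictionary file is not stated in this thread, so the path passed to `check_dict_file` is a placeholder, and both function names are hypothetical.

```python
from pathlib import Path

FINNISH_SPECIAL = "äöåÄÖÅ"


def missing_chars(dict_text: str, required: str = FINNISH_SPECIAL) -> list[str]:
    """Return the characters in `required` that have no line of their own in the dictionary."""
    # Each dictionary line is one output class of the recognition model
    known = set(dict_text.splitlines())
    return [c for c in required if c not in known]


def check_dict_file(path: Path) -> list[str]:
    # `path` is a placeholder: point it at the dictionary the Latin model actually uses
    return missing_chars(path.read_text(encoding="utf-8"))
```

An empty result means the characters are at least representable, so misreads such as "ä" → "a" would be an accuracy problem rather than a missing class.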

maxminstr commented 1 year ago

Thanks for the tip!

  1. With Latin, vse detects the Finnish special characters (ä, ö, and å) better than with English, but it still misses most of them; for example, it often recognizes "ä" as "a".

I suppose source video quality might have an impact, as I'm testing with a captured VCR recording; I might later try upscaling the source video first.
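If upscaling is tried, one option is to run the capture through ffmpeg's `scale` filter before feeding it to vse. A minimal sketch, assuming ffmpeg is on PATH; the helper names and the choice of 2x Lanczos scaling are illustrative, and whether this actually improves recognition of ä/ö/å on VCR material is untested.

```python
import shlex
import subprocess


def upscale_cmd(src: str, dst: str, factor: int = 2) -> list[str]:
    """Build an ffmpeg command that scales width and height by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    # iw/ih are the input width/height; lanczos selects a sharp resampling filter
    vf = f"scale=iw*{factor}:ih*{factor}:flags=lanczos"
    # -c:a copy passes the audio through unchanged; only the video is re-encoded
    return ["ffmpeg", "-i", src, "-vf", vf, "-c:a", "copy", dst]


def upscale(src: str, dst: str, factor: int = 2) -> None:
    cmd = upscale_cmd(src, dst, factor)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
```

Building the argument list separately from running it keeps the command easy to inspect or paste into a shell before committing to a long re-encode.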

  2. When English is the detection language, vse correctly keeps two-row subtitles on two rows in the resulting srt. However, with other detection languages (such as Latin), two-row subtitles are merged onto a single row. Is there any way to fix this, for Latin for example?

  3. I was previously using the pre-built Windows EXE, but now I'm using the latest source code.

However, when setting up the environment, I got this error:

ERROR: Cannot install -r requirements.txt (line 10), -r requirements.txt (line 13), -r requirements.txt (line 16), -r requirements.txt (line 17), -r requirements.txt (line 9) and numpy==1.20.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy==1.20.0
    imageio 2.13.3 depends on numpy
    imgaug 0.4.0 depends on numpy>=1.15
    matplotlib 3.5.1 depends on numpy>=1.17
    opencv-python 4.5.4.60 depends on numpy>=1.17.3
    paddlepaddle 2.2.2 depends on numpy<=1.19.3 and >=1.13; python_version >= "3.5" and platform_system == "Windows"
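The conflict comes down to `numpy==1.20.0` falling outside paddlepaddle 2.2.2's `numpy<=1.19.3` bound, which per the marker applies only on Windows. The sketch below checks a candidate numpy version against the constraints copied from the error. It is a simplified comparator for plain dotted release numbers only, not pip's resolver or full PEP 440 handling, and the function names are hypothetical.

```python
# numpy constraints copied from the pip error above (imageio places no bound).
CONSTRAINTS = [
    ("imgaug 0.4.0", ">=1.15"),
    ("matplotlib 3.5.1", ">=1.17"),
    ("opencv-python 4.5.4.60", ">=1.17.3"),
    ("paddlepaddle 2.2.2 (Windows only)", ">=1.13"),
    ("paddlepaddle 2.2.2 (Windows only)", "<=1.19.3"),
]


def parse(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release like '1.19.3' (no rc/dev/post suffixes)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    for op in (">=", "<=", "=="):
        if spec.startswith(op):
            a, b = parse(version), parse(spec[len(op):])
            # Pad with zeros so that 1.17 compares equal to 1.17.0
            width = max(len(a), len(b))
            a += (0,) * (width - len(a))
            b += (0,) * (width - len(b))
            return {">=": a >= b, "<=": a <= b, "==": a == b}[op]
    raise ValueError(f"unsupported specifier: {spec}")


def violations(numpy_version: str) -> list[str]:
    """List every constraint that the given numpy version breaks."""
    return [f"{pkg} needs numpy{spec}"
            for pkg, spec in CONSTRAINTS
            if not satisfies(numpy_version, spec)]
```

`violations("1.20.0")` reports only the paddlepaddle upper bound, while `violations("1.19.3")` is empty. So, assuming nothing else in requirements.txt needs a newer numpy and wheels for 1.19.3 exist for your Python version, pinning `numpy==1.19.3` is another way to satisfy every listed constraint.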

I was able to fix this by removing the version requirement from "paddlepaddle".

So far, I haven't noticed any adverse effects.
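On the two-row question above: how vse assembles recognized text internally isn't described in this thread, so the following is only a generic sketch of one common post-processing approach, not the project's code. It assumes the recognizer returns one box per text fragment (the `Box` type and all function names are hypothetical), groups boxes whose vertical centers are close into the same row, and joins rows with a newline, which is how an srt cue represents a two-line subtitle. The `tolerance` value is a guess that would need tuning.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical OCR output: one text fragment and its bounding box in pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    text: str


def center_y(box: Box) -> float:
    return (box.y_min + box.y_max) / 2


def group_rows(boxes: list[Box], tolerance: float = 0.5) -> list[str]:
    """Join fragments into rows, ordered top to bottom.

    A box joins the current row when its vertical center lies within
    `tolerance` * (its own height) of the center of the row's first box.
    """
    rows: list[list[Box]] = []
    for box in sorted(boxes, key=center_y):
        height = box.y_max - box.y_min
        if rows and abs(center_y(box) - center_y(rows[-1][0])) <= tolerance * height:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Within a row, read fragments left to right
    return [" ".join(b.text for b in sorted(row, key=lambda b: b.x_min)) for row in rows]


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def srt_cue(index: int, start_ms: int, end_ms: int, boxes: list[Box]) -> str:
    # One line per detected row, so a two-row subtitle stays on two lines
    text = "\n".join(group_rows(boxes))
    return f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
```

Comparing against the row's first box, rather than a running average, keeps the grouping simple; a slanted or very uneven line of fragments might need a looser tolerance.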