YaoFANGUK / video-subtitle-extractor

Extracts hard-coded subtitles from video and generates SRT files. No third-party API required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, covering subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files.
Apache License 2.0

ES support #58

Open latot opened 2 years ago

latot commented 2 years ago

Hi, is it possible to add Spanish support?

Thanks.

YaoFANGUK commented 2 years ago

Spanish support for the program interface is possible, but recognising Spanish subtitles is not easy to implement.

latot commented 2 years ago

What would be necessary? In fact, what would it take to implement multi-language subtitle recognition?

YaoFANGUK commented 2 years ago

Spanish, Portuguese, and Russian subtitle recognition are now supported; please use the latest code.

latot commented 2 years ago

Hi, thanks! One last question: how does the program actually detect each language? AI? And how does it handle differences such as different subtitle styles?

YaoFANGUK commented 2 years ago

The backend uses a deep learning approach (e.g. CTPN or DBNet) to detect text areas; bounding boxes are then calculated, cropped, and fed into a CRNN, which outputs the recognition result.
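To make that data flow concrete, here is a toy sketch of the detect → crop → recognize pipeline. The detector and recognizer below are deliberately trivial stand-ins (the real project uses models like CTPN/DBNet for detection and a CRNN for recognition); only the shape of the pipeline is meant to match.

```python
import numpy as np

def detect_text_rows(frame, thresh=128):
    """Toy stand-in for a detector such as CTPN/DBNet: treat rows whose
    mean brightness exceeds a threshold as a single 'text' band and
    return it as a bounding box (y0, y1, x0, x1)."""
    row_means = frame.mean(axis=1)
    hits = np.where(row_means > thresh)[0]
    if hits.size == 0:
        return []
    return [(hits.min(), hits.max() + 1, 0, frame.shape[1])]

def recognize(crop):
    """Toy stand-in for the CRNN recognizer: returns a dummy string
    whose length tracks the crop width."""
    return "x" * (crop.shape[1] // 10)

# A synthetic grayscale frame: dark background with a bright subtitle band.
frame = np.zeros((120, 200), dtype=np.uint8)
frame[100:115, :] = 255  # the 'subtitle' region

for (y0, y1, x0, x1) in detect_text_rows(frame):
    crop = frame[y0:y1, x0:x1]   # cropped box fed to the recognizer
    text = recognize(crop)
    print((y0, y1), text)
```

In the real pipeline the detector returns one box per text line, and the recognizer maps each cropped image to a character sequence.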

latot commented 2 years ago

What data do you feed them with, and when are they trained? Or is the CRNN model from another source? Actually, I'm very interested in this project. I have a lot of projects that need subtitles, and instead of writing them by hand, I'd prefer to work on a project like this.

YaoFANGUK commented 2 years ago

The cropped images containing text are fed into the CRNN. The CRNN model I use is pretrained in the PaddleOCR project, which is open source and can be found on GitHub. To improve the model's accuracy, you can fine-tune it with your own data.
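For context on where those recognition results end up: the tool writes .srt files, which means runs of consecutive frames carrying the same recognized text have to be merged into one timed cue. This is a minimal illustrative sketch of that grouping step (the function name, frame rate, and input format are assumptions for illustration, not the project's actual code):

```python
def frames_to_srt(frame_texts, fps=25.0):
    """Merge runs of consecutive frames with identical recognized text
    into numbered SRT cues. frame_texts: list of (frame_index, text),
    where an empty string means 'no subtitle on this frame'."""
    def ts(seconds):
        # SRT timestamp format: HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    cues, start, current, last = [], 0, "", 0
    for idx, text in frame_texts:
        if text != current:
            if current:  # close the previous non-empty run
                cues.append((start, idx, current))
            start, current = idx, text
        last = idx
    if current:  # close the final run
        cues.append((start, last + 1, current))

    blocks = [
        f"{n}\n{ts(a / fps)} --> {ts(b / fps)}\n{text}\n"
        for n, (a, b, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)

srt = frames_to_srt([(0, "Hola"), (1, "Hola"), (2, ""), (3, "Adios"), (4, "Adios")])
print(srt)
```

The empty-text frames naturally split cues, so gaps between subtitles fall out of the grouping for free.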

latot commented 2 years ago

Hmm, I was thinking: instead of processing the whole image, what about an option to say "the letters have these colors", so the image can be filtered before the model to improve detection? Obviously, it would be necessary to confirm that this actually improves things.
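That color-filter idea can be prototyped in a few lines: given a known subtitle color, zero out every pixel that isn't within a tolerance of it, and hand only the surviving pixels to the detector. A minimal NumPy sketch (the function name, target color, and tolerance are illustrative; whether this actually improves detection would have to be measured, as the comment says):

```python
import numpy as np

def color_mask(frame_rgb, target, tol=40):
    """Return (filtered_frame, mask) where mask marks pixels whose
    worst per-channel distance to `target` is at most `tol`;
    everything else is blacked out before detection."""
    diff = np.abs(frame_rgb.astype(np.int16) - np.asarray(target)).max(axis=-1)
    mask = diff <= tol
    out = np.zeros_like(frame_rgb)
    out[mask] = frame_rgb[mask]
    return out, mask

# Tiny synthetic frame: one near-white 'subtitle' pixel, one red pixel.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[0, 0] = (250, 250, 250)   # subtitle-colored
frame[0, 1] = (250, 30, 30)     # background element, wrong color
filtered, mask = color_mask(frame, target=(255, 255, 255), tol=40)
print(mask)
```

In practice subtitle text is usually anti-aliased and outlined, so a single tolerance may clip edge pixels; that is exactly the kind of thing that would need testing.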

YaoFANGUK commented 2 years ago

The aim of this project is to expose as few options to users as possible, so that people with no professional experience can easily use this tool. My wish is that the program automatically makes the optimal choice for different inputs behind the scenes.

latot commented 2 years ago

:O that is a big challenge...

A few points. In my case, I've written subtitles manually, which takes a lot of time, but I've also worked on fixing subtitles. Oddly enough, fixing subtitles can be more time-consuming than writing them from scratch. That's why a project like this needs to be good enough to actually save time.

In any case, anything the program can't do, we'll have to do manually. I think this can be improved, and any improvement will usually mean more computing time; but as long as the app doesn't eat the whole computer, we can always leave the PC working on it.

To reach that, there are some ways to improve things.

Some ideas:

Filter by color: I don't know whether you'd consider this option too technical for users.

Train a per-video model: take a frame, crop out the text, then split the non-text parts of the original image into sub-images (say 25 images in a 5x5 grid) and render characters and phrases onto each of them. After doing this with many frames from the video, train a new model on that synthetic data and use it to read the text in the original text images. Videos often draw or render things in different styles, so this might give a more accurate way to extract the text.

There are more approaches, but all of them would need to be tested. The second method, I think, could help if the first one isn't implemented, for example.