alexanderwink / subdl

subdl - command-line tool to download subtitles from opensubtitles.org. Official support for subdl seems to have been dropped. This site is intended for the future use of subdl with community-driven support.

Support for post-processing (file encoding, file type) #28

mardab closed 2 years ago

mardab commented 3 years ago

I have tried several other solutions, and this is the best one at the moment. My only gripe is that for some languages the default format is most often txt (which some players accept but never auto-load) in a pre-UTF-8 encoding (which requires a separate tool to convert), and right now, unlike its competitors, subdl has no built-in option to correct these problems.

Also, being able to check the file encoding could help with (automatic) subtitle selection, since newer subtitles, which are more likely to be better, don't use anything other than Unicode.
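
A minimal sketch of how such an encoding check could feed into selection, using only the Python standard library. The names `is_utf8` and `prefer_unicode`, and the `(name, raw_bytes)` candidate shape, are assumptions made for illustration; none of them are part of subdl.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (a BOM is tolerated)."""
    try:
        data.decode("utf-8-sig")
        return True
    except UnicodeDecodeError:
        return False


def prefer_unicode(candidates):
    """Put UTF-8 candidates first, keeping the original order otherwise.

    `candidates` is a hypothetical list of (name, raw_bytes) pairs.
    """
    # sorted() is stable, and False sorts before True.
    return sorted(candidates, key=lambda item: not is_utf8(item[1]))
```

Pure-ASCII files also pass this check, since ASCII is a subset of UTF-8, so the check can only demote files in a legacy code page; it says nothing else about quality.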

If possible, I'd like to contribute at least preliminary support for post-processing, but before I do, I'd like to know how I should approach it.

milahu commented 2 years ago

> check file encoding

chardet
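
A rough sketch of what chardet-based conversion to UTF-8 might look like. The function name `to_utf8` and the `cp1250` fallback are assumptions chosen for illustration, not anything subdl provides; `chardet.detect` is the library's real entry point, but the import is guarded so the sketch still runs where chardet is not installed.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable without it
    chardet = None


def to_utf8(data: bytes, fallback: str = "cp1250") -> bytes:
    """Re-encode subtitle bytes as UTF-8.

    `fallback` is an assumed legacy code page, used only when chardet
    is unavailable or cannot make a guess.
    """
    try:
        text = data.decode("utf-8-sig")  # already UTF-8; a BOM is stripped
    except UnicodeDecodeError:
        guess = chardet.detect(data)["encoding"] if chardet else None
        try:
            text = data.decode(guess or fallback, errors="replace")
        except LookupError:  # chardet named a codec Python does not know
            text = data.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

Trying strict UTF-8 first avoids asking the detector about files that are already fine. Detection on short files is a guess, so a per-language fallback is probably worth exposing as an option.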

> for some languages default format is most often txt

sounds like there are no timestamps in txt files?

converting txt to srt is non-trivial: if the text has no timestamps, aligning it with the audio requires speech recognition

probably low-quality offline speech-recognition (deepspeech or STT) will work as you have both audio and text

similar: https://github.com/abhirooptalasila/AutoSub

> Support for post-processing

let me add: remove ads from subtitles. for example ...

find . -name '*.srt' -print0 | xargs -0 grep -F -h -e .com -e www. | sort | uniq

contact www.OpenSubtitles.org today
contact www.OpenSubtitles.org today
Downloaded From www.AllSubs.org
Download Movie Subtitles Searcher from www.OpenSubtitles.org
FilthyRichFutures.com
Find out @ saveanilluminati.com
Find out @ saveanilluminati.com
<font color="#00ff00"> www.addic7ed.com</font>
-- <font color="#138CE9">www.addic7ed.com</font> --
-- <font color="#138CE9">www.Addic7ed.com</font> --
... <font color="#138CE9">www.Addic7ed.com</font> ...
<font color="#ffff00" size=14>www.moviesubtitles.org</font>
<font color="#ffff00" size=14>www.opensubtitles.org</font>
<font color="#ffff00" size=14>www.tvsubtitles.net</font>
<font color=green>EMail - parminder222536@hotmail.com
Please rate this subtitle at www.osdb.link/xxxx
Preuzeto sa www.titlovi.com
Subtitle by Luis-Subs From subscene.com
to remove all ads from www.OpenSubtitles.org
to remove all ads from www.OpenSubtitles.org
To visit alt.lawndale.com
Trading can. FilthyRichFutures.com
WhoisPaulHaggis.com.
wodurch sämtliche Werbung von www.OpenSubtitles.org entfernt wird
- www.addic7ed.com -
www.addic7ed.com
-- www.Addic7ed.com --
www.addic7ed.com</font>
www.DeeJayAhmed.com
www. forom. com
www.forom.com
-== [ www.OpenSubtitles.com ] ==-
-== [ www.OpenSubtitles.com ] ==-
-= www.OpenSubtitles.org =-
-== [ www.OpenSubtitles.org ] ==-
-== [ www.OpenSubtitles.org ] ==-
www.OpenSubtitles.org
www.OpenSubtitles.org adresinden tüm reklamları kaldırmak için bizi destekleyin ve VIP üye olun.
www. outpost-daria. com
www.outpost-daria.com
www.RegieLive.ro
www.titlovi.com
www.whoisTomDevocht.com.
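
One way the ad removal could be sketched in Python, based on the grep output above. The pattern list is an assumption assembled from that output and is not exhaustive, and `strip_ads` is a hypothetical helper, not a subdl function. It matches known subtitle-site names rather than any `.com` or `www.`, because some of the lines above (for example `To visit alt.lawndale.com`) look like dialogue.

```python
import re

# Assumed ad markers, taken from the grep output above; not exhaustive.
AD_PATTERN = re.compile(
    r"opensubtitles|addic7ed|allsubs|moviesubtitles|tvsubtitles"
    r"|titlovi|subscene|osdb\.link|regielive",
    re.IGNORECASE,
)


def strip_ads(srt_text: str) -> str:
    """Drop every cue whose text matches AD_PATTERN and renumber the rest."""
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    kept = []
    for block in blocks:
        lines = block.split("\n")
        # An SRT cue is: index line, timing line, then one or more text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            kept.append(lines)  # not a well-formed cue; leave untouched
            continue
        if AD_PATTERN.search("\n".join(lines[2:])):
            continue  # ad cue: skip it entirely
        kept.append(lines)
    out = []
    for number, lines in enumerate(kept, start=1):
        if len(lines) >= 2 and "-->" in lines[1]:
            lines = [str(number)] + lines[1:]  # renumber sequentially
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"
```

This drops the whole cue when any of its lines matches, which is the simpler choice; a cue that mixes dialogue with an ad line would lose the dialogue too.
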

K0-RR commented 2 years ago

Without a feature as basic as file encoding handling, this tool is useless for any subtitle language other than English...

milahu commented 2 years ago

@RDKRACZ please test #29