Open Diaoul opened 8 years ago
I confirm that as of today, with version 1.0.1, all Turkish subtitles are encoded incorrectly. The ğ, ş and ı characters are not displayed correctly, and the subtitle also shows up wrong when I open it in Sublime Text 2 or TextEdit. Could this have something to do with my locale settings? My "locale -a" output is:
C
C.UTF-8
en_US.utf8
POSIX
tr_TR
tr_TR.iso88599
tr_TR.utf8
turkish
and my "locale" output is:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="tr_TR.UTF-8"
LC_NUMERIC="tr_TR.UTF-8"
LC_TIME="tr_TR.UTF-8"
LC_COLLATE="tr_TR.UTF-8"
LC_MONETARY="tr_TR.UTF-8"
LC_MESSAGES="tr_TR.UTF-8"
LC_PAPER="tr_TR.UTF-8"
LC_NAME="tr_TR.UTF-8"
LC_ADDRESS="tr_TR.UTF-8"
LC_TELEPHONE="tr_TR.UTF-8"
LC_MEASUREMENT="tr_TR.UTF-8"
LC_IDENTIFICATION="tr_TR.UTF-8"
LC_ALL=tr_TR.UTF-8
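One way to narrow this down is to compare what you see against a known mis-decoding. The sketch below is only an illustration, not what the library does: it assumes the subtitle bytes were written in a Turkish codec (windows-1254 here, which Python calls `cp1254`) and then read back as windows-1252. The helper name `misdecode` is made up for this example.

```python
# Illustration only: shows what Turkish letters look like when bytes written
# in a Turkish codec are read back as Western European (windows-1252).
def misdecode(text: str, written_as: str = "cp1254", read_as: str = "cp1252") -> str:
    """Encode text with one codec, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


sample = "ğ ş ı"
print(misdecode(sample))  # prints "ð þ ý"
```

If the broken subtitles show ð, þ and ý where ğ, ş and ı should be, the bytes in the file are probably fine and only the encoding guess is wrong, which would point away from the locale settings.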
Thanks for bringing this up. I'm a coder, but not a Python one. How can I help?
Please provide links to subtitles and give their correct encoding.
Hi, http://dl.opensubtitles.org/en/download/sub/6367306 should be windows-1252.
I've added correct Windows-1250 and Windows-1251 detection to Sub-Zero, as well as support for formats other than SRT, normalizing them to SRT, etc.
@Diaoul perhaps you want to take a look at this.
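For anyone following along, here is a rough sketch of one common way to approach this kind of detection: try strict UTF-8 first, then fall back to legacy codecs chosen by the subtitle's language. This is not Sub-Zero's code. The `CANDIDATES` table, the language codes and the `guess_decode` name are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical language -> candidate codec table (an assumption, not taken
# from any existing project). Codec names are Python's standard aliases.
CANDIDATES: Dict[str, List[str]] = {
    "tr": ["cp1254", "iso8859_9"],
    "ru": ["cp1251"],
    "pl": ["cp1250"],
}


def guess_decode(raw: bytes, language: str) -> Tuple[str, str]:
    """Return (text, codec) using strict UTF-8 first, then language fallbacks."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for codec in CANDIDATES.get(language, []):
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Last resort: windows-1252 with replacement characters, so this never raises.
    return raw.decode("cp1252", errors="replace"), "cp1252"


text, codec = guess_decode("ğ ş ı".encode("cp1254"), "tr")
print(codec, text)
```

Strict UTF-8 goes first because legacy single-byte text with accented letters rarely happens to be valid UTF-8, while most single-byte codecs accept almost any byte sequence and so cannot rule themselves out.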
Thanks, I will grab that. I'm working on a feature that will abstract the subtitle file format and make it possible to convert subtitles to various formats. I will push as soon as I have something viable.
For proper encoding detection, a rock-solid test suite is mandatory. This issue aims to gather real-world test cases for every language. Please provide links to subtitles and give their correct encoding.
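To make the kind of test case being asked for concrete, here is a minimal sketch of how submitted samples could be checked. Everything in it is an assumption for illustration: the `Case` record, the `failing_cases` helper and the in-memory sample bytes stand in for real subtitle files, and `detect` is any function that takes raw bytes plus a language code and returns decoded text.

```python
from typing import Callable, List, NamedTuple, Optional


class Case(NamedTuple):
    name: str       # label for the sample
    raw: bytes      # subtitle bytes as downloaded
    language: str   # language code of the subtitle
    encoding: str   # correct encoding, as reported by the submitter


def failing_cases(detect: Callable[[bytes, str], str], cases: List[Case]) -> List[str]:
    """Return the names of cases whose detected text differs from the expected text."""
    failures = []
    for case in cases:
        expected = case.raw.decode(case.encoding)
        actual: Optional[str]
        try:
            actual = detect(case.raw, case.language)
        except UnicodeDecodeError:
            actual = None  # a detector that raises counts as a failure
        if actual != expected:
            failures.append(case.name)
    return failures


# Stand-in samples; a real suite would load the linked subtitle files instead.
CASES = [
    Case("turkish-cp1254", "ğ ş ı".encode("cp1254"), "tr", "cp1254"),
    Case("russian-cp1251", "привет".encode("cp1251"), "ru", "cp1251"),
    Case("polish-cp1250", "żółć".encode("cp1250"), "pl", "cp1250"),
]


def always_cp1252(raw: bytes, language: str) -> str:
    """Deliberately naive detector that treats everything as windows-1252."""
    return raw.decode("cp1252")


print(failing_cases(always_cp1252, CASES))
```

The check compares decoded text rather than codec names, so a detector that picks ISO-8859-9 for a windows-1254 file still passes when both codecs decode those bytes identically. With the naive detector above, all three samples are reported as failures.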