emericg / OpenSubtitlesDownload

Automatically find and download the right subtitles for your favorite videos!
https://emeric.io/OpenSubtitlesDownload
GNU General Public License v3.0

v5.0: Unexpected error (line 855): <class 'UnicodeDecodeError'> #73

Closed lamnatos closed 3 years ago

lamnatos commented 3 years ago

The following is with v5.0 on Raspbian with Python 3.7.3

I recently upgraded to the latest OpenSubtitlesDownload version and I get the following error:

$ OpenSubtitlesDownload.py .
Unknown GUI, falling back to an automatic CLI mode
>> Downloading 'English' subtitles for 'Eden'
Unexpected error (line 855): <class 'UnicodeDecodeError'>

Any idea what might be wrong? I've tried toggling the new opt_force_utf8 flag but it didn't help.

Is there something else I can paste here to help with finding out the problem?

emericg commented 3 years ago

Hi, I'm not really surprised, this part is brand new code for CLI downloads, and I've seen encoding problems while trying it. However, I thought it was OK now. Is it happening on all of your subtitles or just this one?

Around line 805 you can try adding the print() below so you can tell what encoding is used by the operation that failed (at line 855). Also, tell me what language you are using; maybe I'll be able to reproduce the issue and understand it better!

subEncoding = subtitlesResultList['data'][subIndex]['SubEncoding']
print("SUB ENCODING : " + subEncoding)

lamnatos commented 3 years ago

So I tried what you said and it does not crash on every movie. Something about that specific one makes it crash. It worked without crashing on two other movies I tried.

I added the print() as you suggested, it printed SUB ENCODING : UTF-8.

I've set the languages to fetch as opt_languages = ['eng', 'ell'] and it works with other movies. In the movie causing the crash it won't work even if I only use eng.

The movie causing the crash is this one as found on OpenSubtitles.com: https://www.opensubtitles.com/en/movies/2014-eden-193174. Maybe there's something funny with the subtitles for this specific one?

I've added one more print() to get a few more details.

hashFile(path) returns:

returnedhash : d00274715dadded6

emericg commented 3 years ago

The encoding declared when the subtitles were uploaded probably doesn't match the real encoding of the file.

Can you try changing line 855 to:

decodedStr = str(decompressed, subEncoding, 'replace')

or even a variation like:

decodedStr = str(decompressed)

decodedStr = str(decompressed, 'CP1253')
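For reference, here is a minimal sketch of the mismatch described above, assuming a subtitle whose bytes are really CP1253 but whose declared encoding is UTF-8. The sample text, the helper name decode_subtitle and the CP1253 fallback are hypothetical and not part of OpenSubtitlesDownload; only the str(bytes, encoding, errors) call comes from this thread.

```python
def decode_subtitle(decompressed, declared_encoding, fallback_encoding="cp1253"):
    """Hypothetical helper: try the declared encoding, then a fallback,
    and only then fall back to a lossy UTF-8 decode."""
    for encoding in (declared_encoding, fallback_encoding):
        try:
            # Strict decode: raises UnicodeDecodeError on invalid bytes,
            # LookupError if the encoding name is unknown to Python.
            return str(decompressed, encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raises, but undecodable bytes are lost.
    return str(decompressed, "utf-8", "replace")


# Greek text stored as CP1253 but (wrongly) declared as UTF-8.
sample = "Καλημέρα".encode("cp1253")
text = decode_subtitle(sample, "UTF-8")
print(text)
```

Whether a fixed fallback such as CP1253 is appropriate depends on the subtitle language, so this is only one possible design.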

lamnatos commented 3 years ago

Both of the following work:

decodedStr = str(decompressed, subEncoding, 'replace')

decodedStr = str(decompressed)

Is there an advantage to either of them? I'm inclined to keep the second because it's simpler.

emericg commented 3 years ago

I believe the first one is better. It will respect the encoding provided, and if a conversion error happens it will not fail but replace the unknown character with something close (again, I think; I'm not a Python internals expert).

When you say it works, did you just test the download, or did you check that the actual subtitles (text and character formatting) are OK? Thanks for your help with this issue. I'll commit the fix and we'll see if anything else happens with that code ^^
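As a small illustration of what the 'replace' error handler does when decoding, assuming made-up Latin-1 bytes that are decoded as UTF-8 (the sample data is hypothetical): each undecodable byte becomes U+FFFD, the Unicode replacement character, rather than a similar-looking letter.

```python
# Hypothetical sample: "café olé" encoded as Latin-1, then decoded as UTF-8.
data = b"caf\xe9 ol\xe9\r\n"

try:
    str(data, "utf-8")  # strict mode is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With 'replace', decoding never raises; bad bytes turn into U+FFFD.
lossy = str(data, "utf-8", "replace")

print(strict_failed)
print(repr(lossy))
```

So the download no longer crashes, but any character that was not valid in the declared encoding is lost in the output file.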

lamnatos commented 3 years ago

I've settled on this one:

decodedStr = str(decompressed, subEncoding, 'replace')

I think the simplest version (without the replace) mangled the line endings.

I've opened the srt and it looks fine. Unfortunately this specific movie doesn't have subtitles in Greek, so I can't test whether it works there too. English looks fine though.
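A likely explanation for the mangled line endings (not confirmed in this thread): in Python 3, calling str() on a bytes object without an encoding does not decode it, it returns the repr of the bytes, so line breaks show up as literal backslash escapes. A minimal sketch with a made-up subtitle fragment:

```python
# Hypothetical subtitle fragment with Windows line endings.
decompressed = b"1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n"

# No encoding given: this is the repr of the bytes, not decoded text.
as_repr = str(decompressed)

# Encoding given: this is real text with real line breaks.
as_text = str(decompressed, "utf-8", "replace")

print(as_repr)
print(as_text)
```

The first form does not raise, which is why it appears to work, but the resulting file starts with b' and contains no real newlines.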

emericg commented 3 years ago

OK, good to know. I'll leave this issue open for some time in case someone else wants to comment, and if you find any problem with Greek characters let me know!