Hi, I'm not really surprised: this part is brand-new code for CLI downloads, and I saw encoding problems while testing it. However, I thought it was OK now. Is it happening on all of your subtitles or just this one?
Around line 805 you can add the following print() so you can tell which encoding is used by the operation that failed (at line 855). Also, tell me which language you are using; maybe I'll be able to reproduce the issue and understand it better!
subEncoding = subtitlesResultList['data'][subIndex]['SubEncoding']
print("SUB ENCODING : " + subEncoding)
So I tried what you said, and it does not crash on every movie. Something about that specific one makes it crash. It worked without crashing on two other movies I tried.
I added the print() as you suggested; it printed SUB ENCODING : UTF-8.
I've set the languages to fetch as opt_languages = ['eng', 'ell'] and it works with other movies. For the movie causing the crash, it won't work even if I only use eng.
The movie causing the crash is this one as found on OpenSubtitles.com: https://www.opensubtitles.com/en/movies/2014-eden-193174. Maybe there's something funny with the subtitles for this specific one?
I've added one more print() to get a few more details. hashFile(path) returns:
returnedhash : d00274715dadded6
The actual encoding set when uploading the subtitles probably doesn't match the real encoding of the file.
Can you try changing line 855 to
decodedStr = str(decompressed, subEncoding, 'replace')
or even a variation like
decodedStr = str(decompressed)
decodedStr = str(decompressed, 'CP1253')
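For illustration, here is a minimal sketch of what a mismatch between the declared and the real encoding looks like, and one way to cope with it. The helper name decode_with_fallback and the fallback list are hypothetical and not part of OpenSubtitlesDownload; the sample bytes are made up for the example.

```python
def decode_with_fallback(raw, declared, fallbacks=("utf-8", "cp1253")):
    """Hypothetical helper: try the declared encoding, then each fallback.

    Returns (text, encoding_used). Never raises on bad bytes.
    """
    for enc in (declared, *fallbacks):
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: bytes are not valid in this encoding.
            # LookupError: the declared encoding name is unknown to Python.
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"


# Greek text saved as CP1253 but labelled UTF-8 by the uploader (made-up sample).
raw = "Καλημέρα".encode("cp1253")
text, used = decode_with_fallback(raw, "UTF-8")
print(used, text)
```

Single-byte code pages such as CP1253 accept almost any byte sequence, so the order of the fallbacks matters: a wrong guess can decode without error and still produce the wrong characters.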
Both of the following work:
decodedStr = str(decompressed, subEncoding, 'replace')
decodedStr = str(decompressed)
Is there an advantage to either of them? I'm inclined to keep the second because it's simpler.
I believe the first one is better. It will respect the encoding provided, and if a conversion error happens it will not fail but will replace the undecodable character with a placeholder (again, I think; I'm not a Python internals expert).
When you say it works, did you just test the download, or are the actual subtitles (text and character formatting) OK? Thanks for your help with this issue. I'll commit the fix and we'll see if anything else happens with that code ^^
I've settled on this one:
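As a small sketch of the difference between the default strict decoding and the 'replace' error handler: with 'replace', Python substitutes the Unicode replacement character U+FFFD for each undecodable sequence instead of raising. The sample bytes are made up for the example.

```python
# 0xE9 is 'é' in Latin-1 but is not a valid sequence on its own in UTF-8.
raw = b"caf\xe9 ok"

try:
    str(raw, "utf-8")            # default error handler is 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Same bytes, same encoding, but invalid input becomes U+FFFD.
replaced = str(raw, "utf-8", "replace")

print(strict_failed)
print(replaced)
```

The rest of the text is decoded normally, so only the offending characters are lost.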
decodedStr = str(decompressed, subEncoding, 'replace')
I think the simplest version (without the replace) mangled the line endings.
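A likely explanation for the mangled line endings, shown as a sketch with a made-up SRT fragment: calling str() on a bytes object without an encoding does not decode it. It returns the bytes repr, including the b'...' wrapper and escaped \r\n sequences as literal characters.

```python
# Made-up SRT fragment with Windows line endings.
decompressed = b"1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n"

as_repr = str(decompressed)                      # repr of the bytes, not decoded text
as_text = str(decompressed, "utf-8", "replace")  # real decoding

print(as_repr[:12])           # starts with b' and shows escaped line endings
print(as_text.splitlines())   # three proper lines
```

So the variant without an encoding argument never raises, but what it writes out is not usable subtitle text.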
I've opened the srt and it looks fine. Unfortunately this specific movie doesn't have Greek subtitles, so I can't test whether it works there too. English looks fine though.
OK, good to know. I'll leave this issue open for some time in case someone else wants to comment, and if you find any problem with Greek characters let me know!
The following is with v5.0 on Raspbian with Python 3.7.3
I recently upgraded to the latest OpenSubtitlesDownload version and I get the following error:
Any idea what might be wrong? I've tried toggling the new opt_force_utf8 flag but it didn't help. Is there something else I can paste here to help with finding out the problem?