UnicodeDecodeError on accented characters

antiboredom / videogrep

automatic video supercuts with python


I am trying to run videogrep on an English-language video in which brief lines of Spanish occasionally appear. Subtitles that contain accented (non-ASCII) characters seem to cause a Unicode decode error to be thrown:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 19669: invalid continuation byte

This can easily be worked around by finding and replacing the accented characters in the subtitle track with unaccented ones, but maybe it could be handled programmatically, without altering the original subtitle file? I'm not sure how common it is to find English-language subtitles that keep the correct accent marks on non-English words.
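
One possible programmatic approach, sketched here under assumptions: byte 0xed is "í" in both Latin-1 and Windows-1252, so the subtitle file is probably saved in one of those encodings rather than UTF-8. The helper names below (`decode_subtitle_bytes`, `read_subtitle_text`) are hypothetical and not part of videogrep; they only illustrate trying UTF-8 first and falling back to a legacy encoding, leaving the file on disk untouched.

```python
from pathlib import Path

# Hypothetical helpers for illustration; not part of videogrep's API.
# Assumption: the file is UTF-8, Windows-1252, or Latin-1.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_subtitle_bytes(raw, encodings=FALLBACK_ENCODINGS):
    """Decode raw subtitle bytes, trying each encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Only reached if a custom encodings list is exhausted
    # (latin-1 accepts every byte): keep going with U+FFFD
    # placeholders instead of raising.
    return raw.decode("utf-8", errors="replace")


def read_subtitle_text(path, encodings=FALLBACK_ENCODINGS):
    """Read a subtitle file as text without modifying it on disk."""
    return decode_subtitle_bytes(Path(path).read_bytes(), encodings)
```

UTF-8 goes first because it is strict: text in a legacy encoding with accented characters will almost always fail to decode as UTF-8, whereas Latin-1 accepts any byte sequence and would silently garble a genuine UTF-8 file if tried first. This keeps the accents intact rather than stripping them, though a wrongly guessed legacy encoding can still produce incorrect characters.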

UnicodeDecodeError on accented characters #61