KBlixt / subcleaner

removes ads from subtitle files cleanly.

Invalid Start Byte #9

Closed televisi closed 2 years ago

televisi commented 2 years ago

Hi,

I'm trying to run the script against one of my srt files; unfortunately it threw the following error:

root@bazaar:/movies/abc# python3 /config/script/subcleaner/subcleaner.py --dry-run File.en.srt 
subcleaner was unable to decode file: "Files.en.srt
" reason: "invalid start byte"
subcleaner completed successfully.

I have attached the srt file for your reference.

Thanks Files.en.srt.zip

KBlixt commented 2 years ago

Hm, strange. You provided it with the argument "File.en.srt" but it tries to read "Files.en.srt"? This shouldn't be possible. Will investigate.

Also, I downloaded the file and #ItWorksOnMyMachine. Could you try copying the content to another file and see if the issue persists?

televisi commented 2 years ago

Sorry, it was a silly rename issue. The file originally had a movie name, but I renamed it to "Files.en.srt" for obvious reasons (I don't want to share the full movie name here, hehe).

I did try to rename the file, unfortunately, same error.

I have attached the original srt files + bash script history here. Script and srt files.zip

KBlixt commented 2 years ago

OK, I see, no problem. It looks like it has something to do with the encoding of the file. I will try to investigate it. Have other files worked?

KBlixt commented 2 years ago

If the file was encoded incorrectly and then uploaded to the site, scripts that are a bit less robust, such as this one, could have issues reading it. Try creating a new file and copying the text from the old file into the new one. Sometimes this helps.

KBlixt commented 2 years ago

Yep, looking at the file more carefully, it has been encoded using "Windows-1252" instead of something more standard like UTF-8 or ASCII.
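For illustration, the "invalid start byte" message can be reproduced with a single Windows-1252 character. This is only a minimal sketch: the sample bytes are made up and are not taken from the attached file.

```python
# Hypothetical sample: the word "don't" written with the Windows-1252
# curly apostrophe, which is stored as the single byte 0x92.
raw = b"don\x92t"

try:
    raw.decode("utf-8")
    reason = None
except UnicodeDecodeError as err:
    # 0x92 is a UTF-8 continuation byte, so it cannot begin a character.
    reason = err.reason

# Decoding the same bytes as Windows-1252 ("cp1252" in Python) succeeds
# and yields U+2019 (right single quotation mark) for the 0x92 byte.
text = raw.decode("cp1252")

print(reason)
```

The same bytes are valid in one encoding and invalid in the other, which would explain why the file opens fine in some editors but not in a strict UTF-8 reader.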

This is why subcleaner fails to open the file. I've updated the script so that it also tries to open the file with that encoding before giving up, so if you update the script it should work.
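A fallback of that kind could look roughly like the sketch below. This is not the actual subcleaner code: the function name `read_subtitle_text`, the `FALLBACK_ENCODINGS` tuple, and the order of encodings are assumptions made for illustration.

```python
from pathlib import Path

# Assumed order: try strict UTF-8 first, then Windows-1252 as a fallback.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def read_subtitle_text(path, encodings=FALLBACK_ENCODINGS):
    """Return (text, encoding) using the first encoding that decodes the file."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    raw = Path(path).read_bytes()
    last_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    # Every candidate failed; surface the last decode error to the caller.
    raise last_error
```

UTF-8 goes first because it rejects most text that was written in another encoding, whereas Windows-1252 accepts almost any byte sequence and would otherwise mask real UTF-8 files as garbled text.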

televisi commented 2 years ago

Thank you, this fixed my issue