KBlixt / subcleaner

removes ads from subtitle files cleanly.
288 stars, 13 forks

SRT file start at zero index #51

Closed: hermetic-charm closed this issue 10 months ago

hermetic-charm commented 11 months ago

Hey, first, thanks for making/sharing this script! I recently tried it out for the first time, and it worked great, but a few movie SRT files came back with parsing errors.

When I looked into the files, it seemed like they all have the first block starting with an index of zero, while the SRT files that were read correctly all start at one.

Error from one of the runs:

    INFO: subcleaner finished successfully partly. 31/36 files cleaned successfully.
    INFO: failed to clean following files:
    INFO:   - 'Harry Potter and the Deathly Hallows - Part 1 (2010) WEBDL-1080p.en.srt' reason: subcleaner was unable to decode the file: Parsing error at block 1 in file "\\file\path\redacted\Harry Potter Complete Collection (2001-2011)\Harry Potter and the Deathly Hallows - Part 1 (2010)\Harry Potter and the Deathly Hallows - Part 1 (2010) WEBDL-1080p.en.srt" line None. reason: incorrectly formatted subtitle block

SRT file attached (saved as .txt so it could be uploaded).

My guess is the zero index, since it is the only difference I noticed between a good file and a bad file. Is starting the index at 1 just the standard SRT format? If so, maybe there could be a simple sanity check when you first start reading the file: if the first block is indexed at zero, change its index to 1 and renumber all following indexes (similar to the renumbering after a block deletion?).

Harry Potter and the Deathly Hallows - Part 2 (2011) WEBDL-1080p.en.srt.txt
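The renumbering idea suggested above could look roughly like this. It is a minimal sketch, not subcleaner's code: `renumber_blocks` is a hypothetical helper, and it assumes blocks are separated by blank lines and that an index, when present, is a bare integer on the block's first line.

```python
import re

# Blocks are assumed to be separated by one or more blank lines.
BLOCK_SEPARATOR = re.compile(r"\n\s*\n")


def renumber_blocks(srt_text: str) -> str:
    """Hypothetical helper: rewrite block indexes as 1, 2, 3, ... in order.

    A block whose first line is not a bare integer is treated as unindexed
    and gets a new index line inserted in front of it.
    """
    blocks = [b for b in BLOCK_SEPARATOR.split(srt_text.strip()) if b.strip()]
    renumbered = []
    for new_index, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if lines and lines[0].strip().isdigit():
            lines[0] = str(new_index)  # replace an existing index (e.g. 0)
        else:
            lines.insert(0, str(new_index))  # add a missing index
        renumbered.append("\n".join(lines))
    return "\n\n".join(renumbered) + "\n"
```

For example, a file whose blocks are numbered 0, 1, ... would come back numbered 1, 2, ...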

KBlixt commented 10 months ago

Yes, it should be able to handle 0-indexed files. It should even be able to handle no indexing at all. But I'll look into why this is happening and see if I can replicate it.

Although the file you shared doesn't seem to be 0-indexed? Have you re-indexed it?
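One way a parser can cope with 0-indexed or unindexed files is to ignore the index line entirely and anchor each block on its timing line. The sketch below assumes that approach; `parse_blocks`, `Block` and `TIMING` are hypothetical names, and subcleaner's actual parser may work differently.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d{1,2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{1,2}:\d{2}:\d{2}[,.]\d{3})"
)


@dataclass
class Block:
    start: str
    end: str
    text: str


def parse_blocks(srt_text: str) -> list[Block]:
    """Hypothetical parser: the index line (0, 1, or missing) is ignored.

    Each blank-line-separated chunk must contain a timing line; everything
    after it is the subtitle text.
    """
    blocks = []
    for chunk in re.split(r"\n\s*\n", srt_text.strip()):
        lines = chunk.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                text = "\n".join(lines[i + 1:])
                blocks.append(Block(match.group(1), match.group(2), text))
                break
        else:
            # No timing line found anywhere in this chunk.
            raise ValueError(f"incorrectly formatted subtitle block: {chunk!r}")
    return blocks
```

With this design the numbering never affects parsing, so a first block indexed 0 parses exactly like one indexed 1.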

KBlixt commented 10 months ago

Are you sure you sent me the right file? The error mentions part 1 but you sent part 2?

I'm unable to replicate the error on the subtitle for part 2. Before I add more debug logging, I'd like to know whether this issue is also present in the part 2 subtitle.

hermetic-charm commented 10 months ago

Sorry about that! Yes, it is only in the part 1 file. Let me upload part 1 now. If it should handle zero indexing, I'm guessing something else weird is going on that I'm not seeing, and it's just a coincidence that they all start with zero.

Harry Potter and the Deathly Hallows - Part 1 (2010) WEBDL-1080p.en.srt.txt

KBlixt commented 10 months ago

The file seems to start with two strange characters: '��'.

This is most likely an artifact of a bad file re-encoding.

I'll sort them out.

Edit: it looks like the newlines are also messed up? Maybe the file looks fine in editors since they can deal with it.

I'll need to examine this on my computer. I'll take a look at it in the evening.
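Both symptoms (the leading '��' and the odd newlines) are what you would expect when a UTF-16 file is read as UTF-8. As an illustration, assuming a little-endian UTF-16 file with a byte order mark (BOM): the two BOM bytes are invalid UTF-8 and each decodes to a replacement character, and every ASCII character is followed by a NUL byte.

```python
import codecs

# Illustration only: a one-line "subtitle" encoded as UTF-16 LE with a BOM.
utf16_bytes = codecs.BOM_UTF16_LE + "1\r\n".encode("utf-16-le")

# Reading those bytes as UTF-8: 0xFF and 0xFE each become U+FFFD,
# and each ASCII character is followed by a NUL ("\x00").
misread = utf16_bytes.decode("utf-8", errors="replace")

# The first "line" is then "\ufffd\ufffd1\x00", which is not a bare integer.
first_line = misread.splitlines()[0]
print(first_line.isdigit())  # False
```

Under this assumption the first block's index line is not a clean number, which would be consistent with the "Parsing error at block 1" in the log.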

KBlixt commented 10 months ago

This should be fixed now.

The issue was that these files were UTF-16 encoded. The script will now read them properly, but FYI: I always convert to UTF-8 unless nothing is changed in the file.

Please confirm the fix :)
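A minimal sketch of the behaviour described above (decode UTF-16 properly, write UTF-8 only when something changed), assuming the encoding is detected from the byte order mark. `decode_subtitle` and `save_if_changed` are hypothetical helpers, not subcleaner's API, and files without a BOM are simply assumed to be UTF-8 here.

```python
import codecs
from pathlib import Path


def decode_subtitle(raw: bytes) -> str:
    """Hypothetical helper: decode bytes, honouring a UTF-8 or UTF-16 BOM."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # BOM picks the byte order and is removed
    return raw.decode("utf-8")  # assumption: no BOM means UTF-8


def save_if_changed(path: Path, original: str, cleaned: str) -> bool:
    """Hypothetical helper: write UTF-8 only when cleaning changed the text."""
    if cleaned == original:
        return False  # leave the file, and its original encoding, untouched
    # write_bytes avoids any platform newline translation of "\r\n".
    path.write_bytes(cleaned.encode("utf-8"))
    return True
```

Writing the encoded bytes directly keeps the subtitle's existing line endings exactly as they were decoded.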

hermetic-charm commented 10 months ago

It worked perfectly! Thank you for the quick fix. :)