clsid2 / mpc-hc

Media Player Classic
GNU General Public License v3.0

libass for SRT failure when file is not UTF8 #2299

clsid2 closed this issue 11 months ago

clsid2 commented 11 months ago

@adipose The problem is with ConvertCPToUTF8.

libass_failing_sample.zip

In my last commit I limited the use of libass to UTF-8 files, so undo that check when testing.

adipose commented 11 months ago

It's failing the valid utf8 check.

You can check it with isutf8 command line tool.

isutf8 libass_failing_sample.srt
libass_failing_sample.srt: line 3, char 24, byte 57: After a first byte between C2 and DF, expecting a 2nd byte between 80 and BF
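
For reference, the rule quoted in that message (a lead byte between C2 and DF must be followed by a byte between 80 and BF) is one row of the well-formed UTF-8 table. A minimal sketch of such a check follows. It is only an illustration, not the isutf8 source and not the check MPC-HC performs; the names `FirstInvalidUtf8Offset` and `IsValidUtf8` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: returns the byte offset of the lead byte of the first
// malformed sequence, or `len` if the whole buffer is well-formed UTF-8.
constexpr std::size_t FirstInvalidUtf8Offset(const char* data, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const unsigned char b0 = static_cast<unsigned char>(data[i]);
        if (b0 <= 0x7F) {            // plain ASCII, one byte
            ++i;
            continue;
        }

        std::size_t extra = 0;       // continuation bytes expected after b0
        unsigned char lo = 0x80;     // allowed range for the 2nd byte only
        unsigned char hi = 0xBF;

        if (b0 >= 0xC2 && b0 <= 0xDF) {
            extra = 1;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            extra = 2;
            if (b0 == 0xE0) lo = 0xA0;   // reject overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;   // reject UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            extra = 3;
            if (b0 == 0xF0) lo = 0x90;   // reject overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;   // reject values above U+10FFFF
        } else {
            return i;                // 80..C1 and F5..FF never start a sequence
        }

        if (len - i <= extra) {
            return i;                // sequence is cut off by end of buffer
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            const unsigned char min = (k == 1) ? lo : 0x80;
            const unsigned char max = (k == 1) ? hi : 0xBF;
            if (b < min || b > max) {
                return i;
            }
        }
        i += extra + 1;
    }
    return len;
}

constexpr bool IsValidUtf8(const char* data, std::size_t len)
{
    return FirstInvalidUtf8Offset(data, len) == len;
}
```

Text that was saved in a legacy single-byte or double-byte codepage will usually fail a check like this as soon as it contains a non-ASCII character, which is consistent with the error above.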

adipose commented 11 months ago

libass_failing_sample_utf8bom_bomremoved.srt.txt

If I resave the BOM version without the BOM, it becomes valid UTF-8, but its bytes are nothing like those of the UTF-8 file.

adipose commented 11 months ago

I see the problem now.

adipose commented 11 months ago

https://github.com/clsid2/mpc-hc/pull/2303

It was using the charset instead of the codepage.

Incidentally, HANGEUL_CHARSET and HANGUL_CHARSET are the same charset. I'm not sure which name is preferred, but listing both is unnecessary. I also wonder about OEM, SYMBOL and MAC: these three have no obvious codepage to convert to, but I doubt they are useful anyway.

There are many, many codepages out there. I doubt it, but I wonder whether anyone needs a codepage that doesn't correspond to one of these charsets.
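
To make the charset/codepage distinction concrete, here is a rough sketch of a lookup from charset to codepage. It is not the code from the pull request. The function name `CodePageForCharset` and the `kNoCodePage` sentinel are hypothetical, and the charset constants are redeclared with the values from the Windows SDK header `wingdi.h` so the sketch compiles without `<windows.h>`. Treating DEFAULT_CHARSET as unmapped is an assumption of this sketch.

```cpp
// Charset values as defined in wingdi.h (redeclared here for portability).
constexpr unsigned kAnsiCharset        = 0;
constexpr unsigned kDefaultCharset     = 1;
constexpr unsigned kSymbolCharset      = 2;
constexpr unsigned kMacCharset         = 77;
constexpr unsigned kShiftJisCharset    = 128;
constexpr unsigned kHangulCharset      = 129;  // HANGEUL_CHARSET is also 129
constexpr unsigned kJohabCharset       = 130;
constexpr unsigned kGb2312Charset      = 134;
constexpr unsigned kChineseBig5Charset = 136;
constexpr unsigned kGreekCharset       = 161;
constexpr unsigned kTurkishCharset     = 162;
constexpr unsigned kVietnameseCharset  = 163;
constexpr unsigned kHebrewCharset      = 177;
constexpr unsigned kArabicCharset      = 178;
constexpr unsigned kBalticCharset      = 186;
constexpr unsigned kRussianCharset     = 204;
constexpr unsigned kThaiCharset        = 222;
constexpr unsigned kEastEuropeCharset  = 238;
constexpr unsigned kOemCharset         = 255;

// Hypothetical sentinel. It is deliberately not 0, because 0 is CP_ACP
// (the system ANSI codepage) in the Win32 conversion APIs.
constexpr unsigned kNoCodePage = ~0u;

// Hypothetical helper: maps a charset to the Windows codepage to convert from.
constexpr unsigned CodePageForCharset(unsigned charset)
{
    switch (charset) {
    case kAnsiCharset:        return 1252;
    case kShiftJisCharset:    return 932;
    case kHangulCharset:      return 949;
    case kJohabCharset:       return 1361;
    case kGb2312Charset:      return 936;
    case kChineseBig5Charset: return 950;
    case kGreekCharset:       return 1253;
    case kTurkishCharset:     return 1254;
    case kVietnameseCharset:  return 1258;
    case kHebrewCharset:      return 1255;
    case kArabicCharset:      return 1256;
    case kBalticCharset:      return 1257;
    case kRussianCharset:     return 1251;
    case kThaiCharset:        return 874;
    case kEastEuropeCharset:  return 1250;
    // OEM, SYMBOL, MAC and DEFAULT have no single fixed codepage here.
    default:                  return kNoCodePage;
    }
}
```

For example, `CodePageForCharset(kHangulCharset)` yields 949, whereas the charset value itself is 129. Passing the charset number where a codepage is expected therefore selects the wrong conversion, or none at all.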

clsid2 commented 11 months ago

Yeah, I think those ones are not really needed or used.