Closed ichernev closed 13 years ago
Hi, thanks for your report. Can you send me an example file that breaks the parsing?
http://iskren.info/share/bar.srt << well here is a link.
This is the important part :)
>>> open('bar.srt').read(3)
'\xef\xbb\xbf'
Thanks a lot, I will submit a fix ASAP
I've just fixed your issue and released 0.2.6. Let me know if your problem persists.
Thank you!
I just created a very helpful script that manages subtitles using your library :) I think your library is great!
Regards, Iskren
I just want to let you know that different BOMs have different lengths:
>>> codecs.BOM_UTF8
'\xef\xbb\xbf'
>>> codecs.BOM_UTF16
'\xff\xfe'
>>> codecs.BOM_UTF16_BE
'\xfe\xff'
>>> codecs.BOM_UTF16_LE
'\xff\xfe'
>>> codecs.BOM_UTF32_BE
'\x00\x00\xfe\xff'
>>> codecs.BOM_UTF32_LE
'\xff\xfe\x00\x00'
So just reading 3 bytes and hoping that all BOMs are 3 bytes long won't work for other encodings (except utf8).
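A minimal sketch of the length-aware check, assuming the raw bytes of the file are available. This is not the code from any pysrt release; `BOMS` and `detect_bom` are hypothetical names used only for illustration. The longer BOMs are tested first because the UTF-16 LE BOM is a prefix of the UTF-32 LE one.

```python
import codecs

# Candidate BOMs, longest first, so that UTF-32 LE (ff fe 00 00) is not
# mistaken for UTF-16 LE (ff fe), which is a prefix of it.
BOMS = (
    (codecs.BOM_UTF32_LE, 'utf_32_le'),
    (codecs.BOM_UTF32_BE, 'utf_32_be'),
    (codecs.BOM_UTF8, 'utf_8'),
    (codecs.BOM_UTF16_LE, 'utf_16_le'),
    (codecs.BOM_UTF16_BE, 'utf_16_be'),
)


def detect_bom(data):
    """Return (encoding, bom_length) for the leading bytes, or (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0
```

The caller can then skip `bom_length` bytes and decode the rest with the returned encoding, instead of always skipping 3 bytes.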
Regards, Iskren
You're right, stupid me!
I've changed the algorithm: https://github.com/byroot/pysrt/commit/7c591bec3e6f7b37e0233cb1d63424d32f96fea5
Let me know what you think of it.
That one looks right, good job!
Ok, I'm writing tests on this behavior and then I'll release again ...
Thanks for noticing that bug.
0.2.7 released.
If the srt file starts with a BOM ('\xef\xbb\xbf'), parsing of the first subtitle fails, so the first subtitle is missing.
Maybe add a manual check for these bytes after opening the file, or use a library that handles it automatically?
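For the UTF-8 case specifically, Python's standard `utf-8-sig` codec already does this: it decodes UTF-8 and drops a leading BOM if one is present. A small sketch, where `decode_srt_bytes` is a hypothetical helper and not part of pysrt:

```python
def decode_srt_bytes(raw):
    """Decode UTF-8 subtitle bytes, dropping a leading BOM if present."""
    # 'utf-8-sig' behaves like 'utf-8' on input without a BOM.
    return raw.decode('utf-8-sig')
```

This only covers UTF-8; files in other encodings would still need their own handling.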