BOM markers are not handled properly

byroot / pysrt

Python parser for SubRip (srt) files

GNU General Public License v3.0

BOM markers are not handled properly #6

Closed ichernev closed 13 years ago

ichernev commented 13 years ago

If the srt file starts with a UTF-8 BOM ('\xef\xbb\xbf'), parsing of the first subtitle fails, so the first subtitle is missing.

Maybe a manual test after open to check for these bytes, or a library to handle it automatically?
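One way to handle the UTF-8 case automatically, sketched here for illustration only, is Python's built-in `utf-8-sig` codec, which drops a leading BOM when decoding and otherwise behaves like plain UTF-8. The helper name `decode_utf8_srt` and the sample subtitle are hypothetical; this is not pysrt's code.

```python
import codecs


def decode_utf8_srt(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    Hypothetical helper for illustration; not part of pysrt.
    """
    # 'utf-8-sig' skips the BOM when it is there and is a no-op otherwise.
    return raw.decode("utf-8-sig")


# A made-up one-subtitle file, with and without the BOM.
body = b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"
with_bom = codecs.BOM_UTF8 + body

text = decode_utf8_srt(with_bom)
print(text.startswith("1"))
```

This only covers UTF-8; files in other encodings need a different check.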

byroot commented 13 years ago

Hi, thanks for your report. Can you send me an example file that breaks the parsing?

ichernev commented 13 years ago

Here is a link: http://iskren.info/share/bar.srt

This is the important part :)

>>> open('bar.srt').read(3)
'\xef\xbb\xbf'

byroot commented 13 years ago

Thanks a lot, I will submit a fix ASAP

byroot commented 13 years ago

I've just fixed your issue and released 0.2.6. Let me know if your problem persists.

ichernev commented 13 years ago

Thank you!

I just created a very helpful script that manages subtitles using your library :) I think your library is great!

Regards, Iskren

On Thu, Apr 28, 2011 at 8:06 PM, byroot wrote:

> Thanks a lot, I will submit a fix ASAP

ichernev commented 13 years ago

I just want to let you know that different BOMs have different lengths:

>>> import codecs
>>> codecs.BOM_UTF8
'\xef\xbb\xbf'
>>> codecs.BOM_UTF16
'\xff\xfe'
>>> codecs.BOM_UTF16_BE
'\xfe\xff'
>>> codecs.BOM_UTF16_LE
'\xff\xfe'
>>> codecs.BOM_UTF32_BE
'\x00\x00\xfe\xff'
>>> codecs.BOM_UTF32_LE
'\xff\xfe\x00\x00'

So reading just 3 bytes and assuming that every BOM is 3 bytes long won't work for encodings other than UTF-8.
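A minimal sketch of a length-aware check, assuming the raw bytes of the file are available: compare the data against each known BOM, longest first, because the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM. The names `BOMS` and `split_bom` are made up for this example, and this is not necessarily how pysrt ended up solving it.

```python
import codecs

# Longest BOMs first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE,
# so testing the 2-byte BOMs first would misdetect UTF-32 LE files.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def split_bom(raw, default="utf-8"):
    """Return (encoding, data without BOM) for a bytes object.

    Hypothetical helper: falls back to `default` when no BOM is found.
    """
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return encoding, raw[len(bom):]
    return default, raw


raw = codecs.BOM_UTF16_LE + "1\n".encode("utf-16-le")
encoding, data = split_bom(raw)
text = data.decode(encoding)
```

One caveat of this ordering: UTF-16 LE text whose first character is NUL would be taken for UTF-32 LE, which should not occur in a real subtitle file.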

Regards, Iskren

On Thu, Apr 28, 2011 at 8:56 PM, byroot wrote:

> I've just fixed your issue and released 0.2.6. Let me know if your problem persists.

byroot commented 13 years ago

You're right, stupid me!

I've changed the algorithm: https://github.com/byroot/pysrt/commit/7c591bec3e6f7b37e0233cb1d63424d32f96fea5

Let me know what you think of it.

ichernev commented 13 years ago

That one looks right, good job!

On Thu, Apr 28, 2011 at 11:00 PM, byroot wrote:

> You're right, stupid me!
>
> I've changed the algorithm: https://github.com/byroot/pysrt/commit/7c591bec3e6f7b37e0233cb1d63424d32f96fea5
>
> Let me know what you think of it.

byroot commented 13 years ago

On 28 Apr 2011, at 23:11, ichernev wrote:

> That one looks right, good job!

OK, I'm writing tests for this behavior and then I'll release again.

Thanks for noticing that bug.

byroot commented 13 years ago

0.2.7 released.