glut23 / webvtt-py

Read, write, convert and segment WebVTT caption files in Python.
MIT License

According to the WebVTT specification, the hours field is optional when hours is zero #2

Closed (mzagrabe closed 7 years ago)

mzagrabe commented 7 years ago

Greetings,

I just started looking at your module. Thanks for writing free software!

I attempted to parse a VTT file generated by ffmpeg, and an exception was raised.

The ffmpeg-generated file omits the hours field of the timestamp when hours is zero.

For instance:

01:11.913 --> 01:13.346

whereas the SRT file does:

00:01:11,913 --> 00:01:13,346

This seems to be within the WebVTT spec. From:

https://www.w3.org/TR/webvtt1/

we see:

""" A WebVTT timestamp consists of the following components, in the given order:

1. Optionally (required if hours is non-zero):
   1. Two or more ASCII digits, representing the hours as a base ten integer.
   2. A U+003A COLON character (:) """

Looking at the regex in your code, the hours field (and its separating colon) is required.
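
For illustration, here is a minimal sketch of a pattern that treats the hours field as optional. It is not the regex used in webvtt-py; the names `TIMESTAMP` and `to_milliseconds` are hypothetical, and the minutes/seconds/milliseconds layout assumes the usual `mm:ss.ttt` form shown in the examples above.

```python
import re

# Hypothetical pattern, not the one in webvtt-py.
# The hours group (two or more digits plus its colon) is optional;
# minutes and seconds are two digits in 00-59; milliseconds are three digits.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def to_milliseconds(timestamp):
    """Convert a WebVTT timestamp string to an integer number of milliseconds."""
    match = TIMESTAMP.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"invalid WebVTT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    # A missing hours field is treated as zero hours.
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)
```

With this sketch, `to_milliseconds("01:11.913")` and `to_milliseconds("00:01:11.913")` give the same value, while an SRT-style timestamp with a comma raises `ValueError`.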

Thanks for looking into this.

-m

glut23 commented 7 years ago

Hi @mzagrabe, thanks for looking at the code and reporting the issue in such detail. I've made the changes, so the hours field is now optional for WebVTT timestamps.

glut23 commented 7 years ago

Hi @mzagrabe, the 0.3.1 release is now available to install via pip. Thanks!