glut23 / webvtt-py

Read, write, convert and segment WebVTT caption files in Python.
MIT License
188 stars, 56 forks

Add support for WebVTT timeframes in MS Teams' non-compliant format #45

Open apetresc opened 1 year ago

apetresc commented 1 year ago

Microsoft Teams generates .VTT transcript files for all recorded meetings. Unfortunately, those files are not spec-conforming (shocker!): the timestamp fields are not zero-padded to their required widths (3 digits for milliseconds, 2 for everything else).

Pragmatically speaking, it would help for webvtt-py to support this format, since accepting unpadded fields does not harm the library's ability to parse conforming files correctly and safely. This patch adds that support, along with a test case for a representative Teams VTT.
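
To illustrate the idea, here is a minimal sketch of lenient timestamp handling. It is not the code in this patch or webvtt-py's actual API: the names `LENIENT_TIMESTAMP` and `normalize_timestamp` are hypothetical, and it assumes that a short milliseconds field is a left-unpadded integer (so `.5` means 5 ms), which is how I read the unpadded format. Conforming timestamps pass through unchanged.

```python
import re

# Hypothetical lenient pattern (not taken from webvtt-py): the hours field is
# optional, and every field may be shorter than the width the spec requires.
LENIENT_TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{1,2}):(\d{1,2})\.(\d{1,3})")


def normalize_timestamp(text):
    """Return the timestamp zero-padded to the HH:MM:SS.mmm form."""
    match = LENIENT_TIMESTAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    hours = hours or "0"  # the hours field may be omitted entirely
    if int(minutes) > 59 or int(seconds) > 59:
        raise ValueError(f"minutes or seconds out of range: {text!r}")
    # Assumption: a short milliseconds field is an unpadded integer count.
    return (
        f"{int(hours):02d}:{int(minutes):02d}:"
        f"{int(seconds):02d}.{int(millis):03d}"
    )
```

Because the pattern only widens what is accepted, a file that already follows the spec normalizes to itself, which is the "does no harm" property described above.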

Closes #44 among many others, I'm sure.