glut23 / webvtt-py

Read, write, convert and segment WebVTT caption files in Python.
MIT License

MalformedCaptionError #33

Open BeAtS85 opened 4 years ago

BeAtS85 commented 4 years ago

Sometimes a .vtt file contains empty timestamps, i.e. a timing line with no caption text under it. The script errors out on them.

For example:

00:22:21.320 --> 00:22:26.520
00:21:13.720 --> 00:21:15.360 line:90% position:50% align:middle

Is there a way to catch this error, or to have the parser skip the empty timestamps?
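One possible workaround until the parser tolerates this, sketched under assumptions: pre-clean the file by dropping every timing line that has no caption text, then hand the cleaned file to the library. The helper `drop_empty_cues` is hypothetical (not part of webvtt-py), and it assumes an "empty timestamp" means a timing line directly followed by another timing line, a blank line, or the end of the file.

```python
def drop_empty_cues(vtt_text: str) -> str:
    """Hypothetical helper (not part of webvtt-py): remove timing lines with no caption text.

    A timing line counts as empty when the next line is blank, is another
    timing line, or does not exist (end of file).
    """
    lines = vtt_text.splitlines()
    kept = []
    for i, line in enumerate(lines):
        if "-->" in line:
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            if not nxt.strip() or "-->" in nxt:
                continue  # empty cue: skip its timing line
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "WEBVTT\n\n"
    "00:22:21.320 --> 00:22:26.520\n"
    "00:21:13.720 --> 00:21:15.360 line:90% position:50% align:middle\n"
    "Some caption text\n"
)
print(drop_empty_cues(sample))
```

To use it with the library, one could write the cleaned text to a temporary file and pass that path to `webvtt.read()`. This is only a rough sketch: a cue identifier sitting in front of a dropped timing line is left in place, which a strict parser may still reject.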

chapmanjacobd commented 2 years ago

Yeah, it would be nice if the file were parsed one caption at a time, so people could do something like this:

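import webvtt

def read_caption_texts(path):
    # Proposed API: webvtt.read_generator() is hypothetical, not something webvtt-py
    # provides. The idea is to yield captions lazily so one bad cue can be skipped.
    try:
        out = []
        for caption in webvtt.read_generator(path):
            try:
                line = caption.text  # or even .text() would be fine
            except webvtt.MalformedCaptionError:
                pass  # assumes the error surfaces here: skip this caption, keep going
            else:
                # remove_text_inside_brackets is a user-defined helper, not part of webvtt
                out.append(remove_text_inside_brackets(line.replace("\n", " ")))
        return out
    except webvtt.MalformedFileError:
        return []  # the whole file is unreadable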
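Until something like `read_generator` exists, the same "one caption at a time" behaviour can be approximated outside the library. The sketch below is a hypothetical, deliberately loose line-based scanner (`iter_cues` and `Cue` are made-up names for illustration, not webvtt-py API): it yields each cue whose timing line parses and that has text, and skips empty or malformed cues instead of failing the whole file. It ignores cue settings and most other details of the WebVTT spec.

```python
import re
from typing import Iterator, List, NamedTuple, Optional, Tuple

# "HH:MM:SS.mmm --> HH:MM:SS.mmm" (hours optional); trailing cue settings are ignored.
TIMING = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


class Cue(NamedTuple):
    start: str
    end: str
    text: str


def iter_cues(vtt_text: str) -> Iterator[Cue]:
    """Yield well-formed cues one at a time, skipping empty or malformed ones."""
    current: Optional[Tuple[str, str]] = None  # (start, end) of the cue being read
    text_lines: List[str] = []
    # A trailing "" acts as a final blank line so the last cue is flushed.
    for raw in vtt_text.splitlines() + [""]:
        line = raw.strip()  # surrounding whitespace is not kept in this sketch
        if "-->" in line:
            # A new timing line closes the pending cue; a cue without text is dropped.
            if current and text_lines:
                yield Cue(current[0], current[1], "\n".join(text_lines))
            m = TIMING.match(line)
            current = (m.group(1), m.group(2)) if m else None  # bad timing: skip cue
            text_lines = []
        elif not line:
            # A blank line ends the current cue block.
            if current and text_lines:
                yield Cue(current[0], current[1], "\n".join(text_lines))
            current, text_lines = None, []
        elif current:
            text_lines.append(line)
        # Anything else (header, cue identifiers, NOTE/STYLE blocks) is ignored.


sample = """WEBVTT

00:22:21.320 --> 00:22:26.520
00:21:13.720 --> 00:21:15.360 line:90% position:50% align:middle
Hello
world

00:21:18 --> 00:21:19
Broken timing
"""
for cue in iter_cues(sample):
    print(cue)
```

On the sample, the empty first timing line and the cue with an unparseable timestamp are skipped, and only the `00:21:13.720 --> 00:21:15.360` cue is yielded. Being this lenient trades correctness checks for robustness, which suits text extraction but not validation.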

Attached sample file: tmpsqszy7z2.vtt.txt

About 80% of my VTT files are flagged as malformed by this library, so as-is it isn't very useful for my use case :/