glut23 / webvtt-py

Read, write, convert and segment WebVTT caption files in Python.
MIT License

webvtt.read_buffer doesn't work after upgrading to 0.4.4 #29

Closed (theaiinstitute closed this issue 4 years ago)

theaiinstitute commented 4 years ago

Hi, here's the full script I ran with webvtt-py version 0.4.4:

from io import StringIO
import urllib.request
import webvtt

url = 'https://course-recording-q1-2020-taii.s3.eu-west-3.amazonaws.com/us/GMT20200117-205611_AI-Inst--U.transcript.vtt'
response = urllib.request.urlopen(url)
data = response.read() 
text = data.decode('utf-8')
buffer = StringIO(text)

for l in webvtt.read_buffer(buffer):
    print(l.text)

This script prints nothing, but when I print the variable text, it shows a lot of content. I think there is a problem with the read_buffer function in version 0.4.4, because after I downgraded to 0.4.3 everything worked fine. Please take a look!
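
One way to separate a download problem from a parser problem is to inspect the payload without webvtt-py at all. The sketch below is only an illustration: summarize_vtt is a hypothetical helper, not part of the library, and it uses the standard library to check that the text starts with the WEBVTT header and to count lines that look like cue timings. If it reports cues while read_buffer yields none, the parser is the more likely suspect.

```python
import re

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours field is optional in WebVTT, so "00:01.000" is accepted too.
TIMING_RE = re.compile(
    r"^\s*(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def summarize_vtt(text):
    """Return (has_header, cue_count) for a WebVTT payload (hypothetical helper)."""
    # A byte order mark may precede the header after decoding as UTF-8.
    stripped = text.lstrip("\ufeff")
    has_header = stripped.startswith("WEBVTT")
    cue_count = sum(1 for line in stripped.splitlines() if TIMING_RE.match(line))
    return has_header, cue_count


# Made-up two-cue sample standing in for the downloaded transcript.
sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:04.000\nHello\n\n"
    "2\n00:00:05.000 --> 00:00:08.000\nWorld\n"
)
print(summarize_vtt(sample))
```

In the script above, calling summarize_vtt(text) right after the decode step would give the same check on the real file.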

glut23 commented 4 years ago

Hi @theaiinstitute, I released 0.4.5 with a fix. Please confirm that it resolves the issue. Thanks!
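
For anyone unsure which release they have installed, a small check like the following may help. It is a sketch that assumes the distribution is installed under its PyPI name, webvtt-py, and that Python 3.8 or later is in use (for importlib.metadata); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name="webvtt-py"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("webvtt-py is not installed")
elif version == "0.4.4":
    # 0.4.4 is the release reported as affected in this thread.
    print("webvtt-py 0.4.4 is installed; 0.4.5 carries the fix")
else:
    print(f"webvtt-py {version} is installed")
```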

theaiinstitute commented 4 years ago

It works! Thanks for your quick response, I appreciate it!

igifar commented 2 years ago

Unfortunately, I am running into a problem too. Three years later, the issue still exists:

# Imports assumed; the snippet as posted omitted them.
from io import StringIO
import urllib.request
import webvtt

url = 'https://hls.ted.com/project_masters/7970/subtitles/ja/full.vtt'
result = ''

response = urllib.request.urlopen(url)
data = response.read()
text = data.decode('utf-8')
buffer = StringIO(text)

for l in webvtt.read_buffer(buffer):
    print(l.text)

  File "C:\Users\FARSHAD\AppData\Local\Programs\Python\Python310\lib\site-packages\webvtt\webvtt.py", line 68, in read_buffer
    parser = WebVTTParser().read_from_buffer(buffer)
  File "C:\Users\FARSHAD\AppData\Local\Programs\Python\Python310\lib\site-packages\webvtt\parsers.py", line 33, in read_from_buffer
    self._parse(content)
  File "C:\Users\FARSHAD\AppData\Local\Programs\Python\Python310\lib\site-packages\webvtt\parsers.py", line 214, in _parse
    self._parse_blocks(blocks)
  File "C:\Users\FARSHAD\AppData\Local\Programs\Python\Python310\lib\site-packages\webvtt\parsers.py", line 250, in _parse_blocks
    raise MalformedCaptionError(
webvtt.errors.MalformedCaptionError: Standalone cue identifier in line 128.
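
Since the error message names a line number, a reasonable first step is to look at what the file contains around line 128. The helper below is a hypothetical, standard-library-only sketch (lines_around is not part of webvtt-py). It assumes the parser counts lines from 1, which may not match exactly, so it returns a few lines of context on each side.

```python
def lines_around(text, line_no, radius=3):
    """Return (number, content) pairs near a 1-based line number (hypothetical helper)."""
    lines = text.splitlines()
    start = max(line_no - radius, 1)
    end = min(line_no + radius, len(lines))
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


# Made-up sample: line 7 holds an identifier ("2") with no timing line after it,
# which is presumably the kind of block the error message refers to.
sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:02.000\nHello\n\n"
    "2\n\n"
    "3\n00:00:03.000 --> 00:00:04.000\nWorld\n"
)
for number, content in lines_around(sample, 7, radius=1):
    print(f"{number:>4}: {content!r}")
```

With the script above, lines_around(text, 128) on the decoded download would show the block the parser rejected.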