glut23 / webvtt-py

Read, write, convert and segment WebVTT caption files in Python.
MIT License

Caption.text showing timestamps and the cue #4

Status: Closed (amrutauk closed this issue 7 years ago)

amrutauk commented 7 years ago

Hi @glut23, I have a question. I am trying to read a VTT file I downloaded from YouTube using youtube-dl:

Code:

    for caption in webvtt:
        if 'cars' in caption.text:
            print(caption.text)

Output:

mileage<00:10:43.230> cars<c.colorE5E5E5><00:10:43.350> obviously<00:10:44.130> don't<00:10:44.310> stay<00:10:44.430> low

The timestamps and the cue are also printed instead of just the text. Am I missing something in my code? I would really appreciate your help.

Thanks in advance!

glut23 commented 7 years ago

Hi @amrutauk, what I see in the output you provided is that the cue text contains what are called cue tags. The ones with timestamps allow a player to display captions karaoke-style, and the c tag is used to style captions with CSS. Unfortunately, at the moment only the raw text content is returned. I will work on this and provide an update soon so that the cue text can be accessed without any cue tags. Thanks for reporting the issue! Regards.
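To make the two kinds of cue tags concrete, here is a minimal sketch that picks them out of the output line quoted above using only the standard library. The regular expressions and the names `TIMESTAMP_TAG` and `CLASS_TAG` are assumptions made for illustration; they are not part of webvtt-py and cover only the tag shapes seen in this example, not the full WebVTT grammar.

```python
import re

# Illustrative patterns only (not webvtt-py code):
# timestamp tags look like <00:10:43.230>, class tags like <c.colorE5E5E5>.
TIMESTAMP_TAG = re.compile(r"<(\d{2}:\d{2}:\d{2}\.\d{3})>")
CLASS_TAG = re.compile(r"<c\.([^>\s]+)>")

raw = (
    "mileage<00:10:43.230> cars<c.colorE5E5E5><00:10:43.350> obviously"
    "<00:10:44.130> don't<00:10:44.310> stay<00:10:44.430> low"
)

# Times at which a player could highlight the next word (karaoke-style).
timestamps = TIMESTAMP_TAG.findall(raw)

# CSS class names attached through the c tag.
classes = CLASS_TAG.findall(raw)

print(timestamps)
print(classes)
```

For the line in question this should find five timestamp tags and one class name, `colorE5E5E5`.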

glut23 commented 7 years ago

Hi @amrutauk, I just released 0.3.3, and now you can get the cue text without tags. Thanks.
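For readers stuck on an older release, or who just want to see what "text without tags" means, a rough standalone sketch follows. It simply deletes anything in angle brackets. The helper `strip_cue_tags` is hypothetical and is not claimed to be how webvtt-py implements this internally.

```python
import re

# Hypothetical helper, not part of webvtt-py: matches any <...> cue tag.
CUE_TAG = re.compile(r"<[^>]*>")


def strip_cue_tags(raw_text: str) -> str:
    """Return the cue text with every angle-bracket tag removed."""
    return CUE_TAG.sub("", raw_text)


raw_cue = (
    "mileage<00:10:43.230> cars<c.colorE5E5E5><00:10:43.350> obviously"
    "<00:10:44.130> don't<00:10:44.310> stay<00:10:44.430> low"
)

clean_cue = strip_cue_tags(raw_cue)
print(clean_cue)  # mileage cars obviously don't stay low
```

Note that this naive approach would also remove any literal text that happens to sit between `<` and `>`, so it is only a fallback and not a substitute for the library's own handling.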

amrutauk commented 7 years ago

This helps a lot! Thank you so much for resolving this so promptly!