byroot / pysrt

Python parser for SubRip (srt) files
GNU General Public License v3.0

Script: parsing transcript .srt files into readable text #76

Open Jamoverjelly opened 6 years ago

Jamoverjelly commented 6 years ago

Hello,

I am working through an online class and trying to produce notes based on the instructional video content. Since many of the concepts covered in these videos are worth taking note of, I'm finding myself writing out nearly every line spoken by the instructor. Obviously, this process is laborious and extremely time-consuming. I am wondering if there is an easier way to extract the text from these videos using an srt tool to help parse and modify the text.

The syntax of the transcript file for each video is identical to the standard srt format. Here's an example:

1
00:00:00,710 --> 00:00:03,220
Rob just showed us how we can
make things accessible to

2
00:00:03,220 --> 00:00:05,970
anyone who can't use a mouse or
pointing device.

3
00:00:05,970 --> 00:00:09,130
Whether that's because it's any
type of physical impairment or

4
00:00:09,130 --> 00:00:11,510
a technology issue or
simply personal preference.

Does pysrt currently provide any tools for reformatting the text content so that it is more readable? To clarify, for the above example, I would like to remove the blank lines and the lines containing the record number and timestamp, and then join the remaining lines with single spaces (including after periods), like so:

Rob just showed us how we can make things accessible to anyone who can't use a mouse or pointing device. Whether that's because it's any type of physical impairment or a technology issue or simply personal preference.

I would like to produce the output shown above from the example and be able to apply the same transformation to the other files in the series. I am pretty rusty with Python at the moment, though I believe this could be implemented fairly easily with common string methods.

Can anyone contributing to this project let me know how this is done or if the functionality already exists in pysrt?

Thanks!

whoizit commented 5 years ago

@Jamoverjelly https://gist.github.com/whoizit/c54f916c1c6d78ad5ac88cf4735c9d7d
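
For anyone who cannot open the gist, here is a minimal sketch of the transformation described in the question, using only the Python standard library. It is not the contents of the gist and it does not use pysrt. The function name `srt_to_text` and the `TIMESTAMP` pattern are made up for illustration, and the sketch assumes well-formed records separated by blank lines.

```python
import re

# Hypothetical helper pattern: matches a timestamp line such as
# "00:00:00,710 --> 00:00:03,220".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_source):
    """Return the caption text of an SRT document as a single paragraph."""
    # Normalise Windows line endings and drop a leading byte-order mark.
    source = srt_source.lstrip("\ufeff").replace("\r\n", "\n").strip()
    kept = []
    # Records are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", source):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # Drop the record number, if the block starts with one.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timestamp line, if present.
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        kept.extend(lines)
    # Join all remaining caption lines with single spaces.
    return " ".join(kept)
```

To convert a whole series, the function could be called on the contents of each `.srt` file in turn, for example by reading each file with `pathlib.Path(path).read_text(encoding="utf-8")` and writing the result to a `.txt` file next to it. With pysrt installed, `pysrt.open(path)` returns items whose `text` attribute already excludes the record number and timestamp, so joining those texts (with their internal newlines replaced by spaces) should give a similar result; that variant is not tested here.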