Diaoul / subliminal

Subtitles, faster than your thoughts
http://subliminal.readthedocs.org
MIT License

Add support for WebVTT and MicroDVD #462

Open Diaoul opened 9 years ago

Diaoul commented 9 years ago

This will require switching to pycaption for validation. It is not compatible with Python 3; abandoned project?


Diaoul commented 8 years ago

@Toilal @wackou: I want to create my own robust subtitle parser and will likely create a new library for it that handles various formats. I'm looking for the right tool for the job: all subtitle formats seem to have a defined grammar, which should make parsing straightforward. There are various technologies for that (PEG parsers, lexer and parser generators such as Lex and Yacc, and so on). Would you recommend one for that kind of work?

I saw various tools such as pyparsing, PLY, pyPEG and parsimonious. I wonder if rebulk would be able to do that? There's no decision making so I think it's not the right tool. There is also the possibility to have my own basic parser based on str and re.
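To make the "basic parser based on str and re" option concrete, here is a minimal sketch for MicroDVD, assuming the usual `{start_frame}{end_frame}text` line layout with `|` separating displayed lines. The names `Cue` and `parse_microdvd` are made up for illustration and are not part of subliminal or any existing library; styling tags and frame-rate handling are ignored.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    """One subtitle cue (hypothetical type, for illustration only)."""
    start_frame: int
    end_frame: int
    lines: List[str]


# Assumed cue layout: "{start}{end}text", with "|" between displayed lines.
CUE_RE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")


def parse_microdvd(content: str) -> List[Cue]:
    """Parse MicroDVD text into cues, raising ValueError on a malformed line."""
    cues = []
    for number, raw in enumerate(content.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = CUE_RE.match(line)
        if match is None:
            raise ValueError(f"line {number}: not a MicroDVD cue: {raw!r}")
        start, end, text = match.groups()
        cues.append(Cue(int(start), int(end), text.split("|")))
    return cues
```

Calling `parse_microdvd("{0}{25}Hello|world")` would give a single cue with frames 0 to 25 and two text lines. A line-oriented format like this is easy with `re` alone; a block-oriented format such as WebVTT would need a small state machine on top, which is where a grammar-based tool might start to pay off.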

Ideas are welcome :fish_cake:

Toilal commented 8 years ago

Do you have examples and/or specs for those formats?

Rebulk can be used for "short input" and "pseudo-natural" language. I don't think it's the right tool to parse a structured file. It's designed to define patterns (string, regex or functional) that will be scanned across the whole input string, retrieve consistent match objects from those different types of patterns, and filter out false positives with rules involving relations between those matches.
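The scan-then-filter idea can be sketched in plain Python. This is not rebulk's API: `Match`, `scan`, `drop_contained` and the `PATTERNS` table are hypothetical names, and the single rule shown (drop a match that sits inside a longer one) is just one assumed example of a relation between matches.

```python
import re
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    """A located match (hypothetical type, not rebulk's Match class)."""
    start: int
    end: int
    value: str
    name: str


# Illustrative patterns; each one is scanned over the whole input.
PATTERNS: Dict[str, str] = {
    "episode": r"S\d+E\d+",
    "number": r"\d+",
    "resolution": r"\d+p",
}


def scan(text: str, patterns: Dict[str, str]) -> List[Match]:
    """Run every pattern over the full input and collect all matches."""
    found = []
    for name, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            found.append(Match(m.start(), m.end(), m.group(0), name))
    return sorted(found)


def drop_contained(matches: List[Match]) -> List[Match]:
    """Rule: discard any match lying inside a strictly longer match."""
    kept = []
    for m in matches:
        inside = any(
            o.start <= m.start and m.end <= o.end
            and (o.end - o.start) > (m.end - m.start)
            for o in matches
        )
        if not inside:
            kept.append(m)
    return kept
```

On an input like `"Show.S01E02.720p"`, `scan` reports the bare numbers as well as the episode and resolution, and `drop_contained` then removes the numbers as false positives. Nothing here tracks line or block structure, which fits the point that this style suits short, loosely structured strings rather than a subtitle file.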

I've never used the parsers you mentioned in Python, sorry :)

Diaoul commented 8 years ago

You can find examples here: