evilstreak / markdown-js

A Markdown parser for javascript

Help with Markdown-js dialect for srt subtitles implementation #143

Closed aleray closed 10 years ago

aleray commented 10 years ago

I wish there were another place to write this, because it is not an issue but rather a call for help with writing my own dialect.

I switched to markdown-js from marked.js (also interesting, but not really fitting my purpose), and so far I really like the way it can be extended. I have already ported, quite easily, some semantic-data extensions I wrote earlier for Python-Markdown. There is one piece I'm missing, though: I would like to port an extension that converts srt-like syntax to HTML.

Basically I would like to convert something like:

00:00:00 --> 00:00:05

some subtitles

00:00:05 --> 00:00:10

some more subtitles

to:

<section data-begin="00:00:00" data-end="00:00:05">
  <p>some subtitles</p>
</section>

<section data-begin="00:00:05" data-end="00:00:10">
  <p>some more subtitles</p>
</section>

That is, when a timecode block is encountered, the consecutive blocks that follow should be enclosed in a <section> tag (with the "data-" attributes set to the time values) until the end of the document is reached or another timed section is found.
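To make the intended grouping concrete, here is a minimal standalone sketch that does not use markdown-js at all. It assumes the document has already been split into block strings, uses a simplified HH:MM:SS-only pattern, and wraps every non-timecode block in a "p" node; `TIMECODE` and `groupTimedSections` are hypothetical names chosen for illustration.

```javascript
// Simplified timecode pattern (assumption: HH:MM:SS only, for illustration).
var TIMECODE = /^(\d\d:\d\d:\d\d)\s*-->\s*(\d\d:\d\d:\d\d)\s*$/;

// Group a flat list of block strings into JsonML-style "section" nodes.
function groupTimedSections( blocks ) {
  var out = [];
  var current = null; // the section currently collecting blocks, if any

  blocks.forEach( function ( block ) {
    var m = block.match( TIMECODE );
    if ( m ) {
      // A timecode block opens a new section, which implicitly ends the previous one.
      current = [ "section", { "data-begin": m[1], "data-end": m[2] } ];
      out.push( current );
    } else if ( current ) {
      // Any other block is appended to the open section.
      current.push( [ "p", block ] );
    } else {
      // Blocks before the first timecode stay outside any section.
      out.push( [ "p", block ] );
    }
  } );

  return out;
}

var tree = groupTimedSections( [
  "00:00:00 --> 00:00:05",
  "some subtitles",
  "00:00:05 --> 00:00:10",
  "some more subtitles"
] );
console.log( JSON.stringify( tree ) );
```

The result is two "section" nodes, each holding the paragraph that followed its timecode, which mirrors the HTML shown above.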

I currently have the following code in my dialect (with an overly complex regex):

Aa.block['timecode'] = function timecode( block, next ) {
  // m[1] is the begin time, m[10] the (optional) end time.
  var re = /^\s{0,3}(((\d{1,2})(:))?(\d\d):(\d\d)([,\.](\d{1,3}))?)\s*-->(\s*(((\d{1,2})(:))?(\d\d):(\d\d)([,\.](\d{1,3}))?))?\s*(?:\n|$)/,
    m = block.match( re );

  if ( !m )
    return undefined;

  // Only the timecode node itself is returned; the blocks in `next` are left untouched.
  return [ [ "timecode", {"data-begin": m[1], "data-end": m[10]} ] ];
};

which produces:

<section data-begin="00:00:00" data-end="00:00:05"></section>

<p>some subtitles</p>

<section data-begin="00:00:05" data-end="00:00:10"></section>

<p>some more subtitles</p>

Any idea? Thanks!

aleray commented 10 years ago

OK, I found a way; see https://github.com/aleray/markdown-js/tree/aa

Here is the relevant section:

  Aa.block['timecode'] = function timecode( block, next ) {
    // m[1] is the begin time, m[10] the (optional) end time.
    var re = /^\s{0,3}(((\d{1,2})(:))?(\d\d):(\d\d)([,\.](\d{1,3}))?)\s*-->(\s*(((\d{1,2})(:))?(\d\d):(\d\d)([,\.](\d{1,3}))?))?\s*(?:\n|$)/,
      m = block.match( re );

    if ( !m )
      return undefined;

    // Consume the following blocks until the next timecode block
    // (or until there are no blocks left).
    var inner = [];
    while ( next.length ) {
      if ( next[0].match( re ) ) { break; }

      inner.push( next.shift() );
    }

    var begin = [ "span", {"property": "aa:begin"}, m[1] ];
    var end = [ "span", {"property": "aa:end"}, m[10] ];

    // Wrap the consumed blocks in a <section>, parsing them as a nested tree.
    return [ [ "section",
      {"typeof": "aa:annotation", "data-begin": m[1], "data-end": m[10]},
      begin, " \u2192 ", end,
      this.toTree( inner, [ "div", {"property": "aa:content"} ] )
    ] ];
  };
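The handler above relies on capture groups 1 and 10 of the regex holding the begin and end times. A small check that runs without markdown-js can confirm this; `parseTimecode` is a hypothetical helper name used only for this sketch and is not part of the library or of the dialect.

```javascript
// Same regex as in the handler above.
var re = /^\s{0,3}(((\d{1,2})(:))?(\d\d):(\d\d)([,\.](\d{1,3}))?)\s*-->(\s*(((\d{1,2})(:))?(\d\d):(\d\d)([,\.](\d{1,3}))?))?\s*(?:\n|$)/;

// Hypothetical helper: returns the begin/end strings,
// or null when the line is not a timecode line.
function parseTimecode( line ) {
  var m = line.match( re );
  if ( !m ) return null;
  // `end` is undefined when the line stops after "-->".
  return { begin: m[1], end: m[10] };
}

console.log( parseTimecode( "00:00:00 --> 00:00:05" ) );
console.log( parseTimecode( "00:00:01,500 --> 00:00:02.250" ) );
console.log( parseTimecode( "01:02 -->" ) );
console.log( parseTimecode( "some subtitles" ) );
```

Because the end time is optional in the regex, a caller that needs both values should check `end` before using it.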