donpark / html2jade

Converts HTML to Jade template. Not perfect but useful enough for non-daily conversions.
MIT License

Strip server language tags #103

Closed: jonscottclark closed this issue 8 years ago

jonscottclark commented 8 years ago

Hey @donpark,

This is going to make my life a lot easier, thank you for developing this!

I'm converting some HTML that has ASP-style tags throughout: <% ... %>. The script appears to choke on these, and output stops completely at that point (any preceding code still gets converted). Not a surprise, as it's not really within the scope of what Jade is meant to handle.

When trying the same thing with PHP tags (<? or <?php), the content within the tag simply gets stripped. Not sure if it's luck or whether it's part of the library (couldn't find "<?" in the script).

Would it be possible to strip any content in ASP-style tags as well?

Ideally, the content could be preserved so that when compiling from Jade back to HTML, any PHP/ASP could get re-inserted, but I think that would be way too unreliable considering all the possibilities out there for spaghetti code.

For instance, as a simple experiment, consider converting this (horrible) PHP, which nobody should ever write, to Jade...

<p>
  <?
  if ($wat) {
    render();
    ?>
    <span>Yeah!</span>
    <a href="sup.html">How are you?</a>
    <?
  }
  ?>
</p>

It would need to come out of html2jade looking like this to convert back to the same HTML:

p.
  <?
  if ($wat) {
    render();
    ?>
    #[span Yeah!]
    #[a(href='sup.html') How are you?]
    <?
  }
  ?>

It looks bad... And while it works, this is a trivial example; handling this in general would surely be too much overhead, out of scope, and a complete nightmare to develop. So stripping these non-HTML tags seems like legitimate and expected behaviour. Besides <?, <?php and <%, are there any other tags that should be stripped?

donpark commented 8 years ago

It's happening because html2jade converts HTML at the DOM level, so it's the HTML parser html2jade uses (htmlparser2, via jsdom-little) that is barfing on ASP and PHP server-side tags.

I think it's best to filter server-side tags out before they reach html2jade, because building the functionality in doesn't offer any value over filtering externally.
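A minimal sketch of the external filtering suggested above, written in TypeScript to stay close to html2jade's Node environment. `stripServerTags` and `SERVER_TAG` are hypothetical names, not part of html2jade; the pattern assumes every server block is closed by `%>` or `?>`.

```typescript
// Hypothetical pre-filter: remove server-side code blocks before the HTML
// is handed to html2jade. Covers ASP-style <% ... %> (including <%= ... %>)
// and PHP-style <? ... ?> (including <?php and <?=).
// [\s\S]*? is a lazy "any character, newlines included", so multi-line
// blocks match up to the FIRST closing delimiter after the opener.
const SERVER_TAG = /<%[\s\S]*?%>|<\?[\s\S]*?\?>/g;

function stripServerTags(html: string): string {
  // Replace every server block with nothing; surrounding HTML is untouched.
  return html.replace(SERVER_TAG, "");
}

const input = '<p><% if (x) { %><span>Yeah!</span><% } %></p>';
console.log(stripServerTags(input)); // <p><span>Yeah!</span></p>
```

Trade-offs of this simple approach: a `%>` or `?>` inside a string literal in the server code ends the match early, a PHP file that omits its final `?>` keeps its last block, and `<?xml ...?>` declarations are removed as well.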

jonscottclark commented 8 years ago

Well put, thanks for the explanation. It seems a simple regular expression, (<%.*?%>), with the replacement <!-- $1 --> will wrap any of those tags in comments, which satisfies my requirements.
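One caveat with `(<%.*?%>)`: in JavaScript (and most regex flavours) `.` does not match a newline unless the `s` flag is set, so an ASP block spanning several lines is left unwrapped and can still trip the parser. A sketch of the same wrapping with a newline-safe pattern; `commentOutAspTags` is a hypothetical helper name, and it assumes the server code itself never contains `-->`, which would close the comment early.

```typescript
// Wrap each ASP-style <% ... %> block in an HTML comment so the parser
// sees an ordinary comment instead of an unknown tag.
// Unlike (<%.*?%>), [\s\S]*? also crosses line breaks, so blocks that
// span several lines are wrapped too.
const ASP_TAG = /(<%[\s\S]*?%>)/g;

function commentOutAspTags(html: string): string {
  // "$1" in the replacement string is the captured server block.
  return html.replace(ASP_TAG, "<!-- $1 -->");
}

const page = "<p>\n<% if (x) {\n  y(); %>\n<b>hi</b>\n<% } %>\n</p>";
console.log(commentOutAspTags(page));
// The single-line pattern would only wrap the second block here:
console.log(page.replace(/(<%.*?%>)/g, "<!-- $1 -->"));
```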

On the other side of the workflow, I can just strip the comments before Jade compilation. Success :)
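If the server code should come back instead of being dropped (the re-insertion idea from the original post), the comments can also be handled on the HTML side after Jade compilation. A sketch, assuming each block survives html2jade and Jade compilation as a comment holding exactly one `<% ... %>` block, possibly with different whitespace inside the comment; the exact comment formatting after the round trip is an assumption, and `restoreAspTags`/`dropAspTags` are hypothetical names.

```typescript
// Matches a comment that wraps a single <% ... %> block. \s* absorbs any
// whitespace change inside the comment; ordinary comments such as
// <!-- note --> do not match because "<%" must follow the opener.
const WRAPPED_ASP_TAG = /<!--\s*(<%[\s\S]*?%>)\s*-->/g;

// Put the server code back in place of its comment.
function restoreAspTags(html: string): string {
  return html.replace(WRAPPED_ASP_TAG, "$1");
}

// Or drop the commented-out server code entirely.
function dropAspTags(html: string): string {
  return html.replace(WRAPPED_ASP_TAG, "");
}

const compiled = "<p>\n<!-- <% if (x) { %> -->\n<b>hi</b>\n<!--<% } %>-->\n</p>";
console.log(restoreAspTags(compiled));
console.log(dropAspTags(compiled));
```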

Thanks, @donpark