allcolor / YaHP-Converter

YaHP is a Java library that allows you to convert an HTML document into a PDF document.
GNU Lesser General Public License v2.1

HTML tags inside scripts cause breakage #5

Open · seanlavelle opened this issue 11 years ago

seanlavelle commented 11 years ago

If you run YaHP on an HTML document like this one:

```html
<html>
<script>
var foo = "</foo>"
</script>
</html>
```

then it fails with a `NullPointerException`. The cause is that JTidy fails on documents like this, so YaHP falls back to the original, untidied HTML, and that HTML violates YaHP's assumption that its input has already been cleaned up by JTidy. See this JTidy bug report: http://sourceforge.net/p/jtidy/discussion/41437/thread/408cffe8/

Right now YaHP strips out `<script>` elements, but only after the HTML has been passed through JTidy. I think a reasonable workaround would be to strip the scripts before calling JTidy, so that JTidy never sees them and does not fail. I am working on coding this fix.
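A minimal sketch of the proposed pre-JTidy step, assuming a simple regex pass is acceptable. The class `ScriptStripper` and method `stripScripts` are hypothetical names for illustration, not part of YaHP's API, and this is not necessarily how the eventual fix is implemented. The idea is that the stripped string, rather than the raw HTML, would then be handed to JTidy (for example via `org.w3c.tidy.Tidy#parseDOM`).

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes <script> elements before the HTML reaches JTidy.
public class ScriptStripper {

    // Matches <script ...> through the first </script>, case-insensitively and
    // across newlines (DOTALL). The reluctant .*? stops at the first closing tag,
    // so a string such as "</foo>" inside the script body is removed with it.
    // "\\b" keeps tags like <scripts> from matching.
    private static final Pattern SCRIPT_ELEMENT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the HTML with every matched <script>...</script> element deleted.
    public static String stripScripts(String html) {
        return SCRIPT_ELEMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The failing document from this issue.
        String html = "<html>\n<script>\nvar foo = \"</foo>\"\n</script>\n</html>";
        // The cleaned string is what would be passed on to JTidy.
        System.out.println(stripScripts(html));
    }
}
```

A regex is not a full HTML parser: it does not handle `<script>` tags inside HTML comments or self-closing `<script src="..."/>` tags, so treat it as a stopgap under those assumptions rather than a complete solution.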

Is this project still active enough for a pull request that fixes this to get merged?