If you run yahp on an HTML document that looks like this:
<html>
<script>
var foo = "</foo>"
</script>
</html>
then it fails with a NullPointerException. The cause is that jtidy fails on such documents, so yahp falls back to the original HTML, which violates its assumption that the input has already been tidied.
See this jtidy bug report: http://sourceforge.net/p/jtidy/discussion/41437/thread/408cffe8/
Right now yahp strips out <script> elements, but only after passing the HTML through jtidy. I think a reasonable workaround would be to strip out the scripts before calling jtidy, so that jtidy doesn't fail. I am working on coding this fix.
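To make the proposal concrete, here is a rough sketch of the pre-processing step I have in mind. The class and method names (ScriptStripper, stripScripts) are hypothetical and not part of yahp's current code, and a regex is only a crude stand-in for real parsing, so treat this as an illustration of the idea rather than the final patch:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an existing yahp class.
final class ScriptStripper {
    // (?i) makes the match case-insensitive; (?s) lets '.' span newlines.
    // The lazy '.*?' stops at the first closing </script> tag.
    private static final Pattern SCRIPT_ELEMENT =
            Pattern.compile("(?is)<script\\b[^>]*>.*?</script\\s*>");

    private ScriptStripper() {}

    // Returns the HTML with every <script>...</script> element removed,
    // intended to run before the document is handed to jtidy.
    static String stripScripts(String html) {
        return SCRIPT_ELEMENT.matcher(html).replaceAll("");
    }
}
```

With the document above, the whole script element is removed, so the "</foo>" string literal never reaches jtidy. Where exactly this call would go inside yahp is still something I need to work out.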
Is this project still active enough for a pull request that fixes this to get merged?