amohanta / google-caja

Automatically exported from code.google.com/p/google-caja

eof parsing anomaly #658

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
this anomaly is harmless.  I don't think it needs to be fixed.  I'm
reporting it because I wasn't sure it was safe, so I tracked down the cause.

1. if you cajole "<div\n", you get
  Fatal Exception: com.google.caja.lexer.ParseException:
    Unexpected end of input
which is fine.

2. if you cajole "<div" without a trailing newline, you get
   IMPORTS___.htmlEmitter___.pc('\074div');
which is strange, but harmless.

what's happening:

in case 2, when HtmlInputSplitter.parseToken sees eof, it converts the
pending token into a TEXT token.

in case 1, HtmlInputSplitter.parseToken sees the newline, returns a
TAGSTART token, and then DomParser.parseDom keeps requesting tokens until
it sees a TAGEND.  the tokens come from a TokenQueue<>, which throws the
exception at eof.
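
The two paths can be reproduced with a toy model. This is only a sketch of the behaviour described above, not Caja's code: `EofAnomalySketch`, `split`, `parse` and the `Token` class are hypothetical names, and the real HtmlInputSplitter, DomParser and TokenQueue are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two paths; none of these names are Caja's real API. */
final class EofAnomalySketch {
  enum Type { TAGSTART, TAGEND, TEXT }

  static final class Token {
    final Type type;
    final String text;
    Token(Type type, String text) { this.type = type; this.text = text; }
  }

  /** Hypothetical stand-in for the splitter. */
  static List<Token> split(String input) {
    List<Token> tokens = new ArrayList<>();
    int i = 0;
    int n = input.length();
    while (i < n) {
      int start = i;
      if (input.charAt(i) == '<') {
        i++;
        while (i < n && Character.isLetterOrDigit(input.charAt(i))) { i++; }
        if (i == n) {
          // Case 2: eof while the tag name is still pending,
          // so the pending token is reclassified as TEXT.
          tokens.add(new Token(Type.TEXT, input.substring(start)));
          break;
        }
        // Case 1: a delimiter (e.g. the newline) ends the tag name.
        tokens.add(new Token(Type.TAGSTART, input.substring(start, i)));
        while (i < n && input.charAt(i) != '>') { i++; }
        if (i < n) {
          tokens.add(new Token(Type.TAGEND, ">"));
          i++;
        }
      } else {
        while (i < n && input.charAt(i) != '<') { i++; }
        tokens.add(new Token(Type.TEXT, input.substring(start, i)));
      }
    }
    return tokens;
  }

  /** Hypothetical stand-in for the parser: after TAGSTART it demands a TAGEND. */
  static String parse(List<Token> tokens) {
    StringBuilder out = new StringBuilder();
    int pos = 0;
    while (pos < tokens.size()) {
      Token t = tokens.get(pos++);
      if (t.type == Type.TEXT) {
        out.append("TEXT(").append(t.text).append(")");
      } else if (t.type == Type.TAGSTART) {
        while (true) {
          if (pos >= tokens.size()) {
            // The queue is empty before a TAGEND was seen.
            throw new IllegalStateException("Unexpected end of input");
          }
          if (tokens.get(pos++).type == Type.TAGEND) { break; }
        }
        out.append("ELEMENT(").append(t.text.substring(1)).append(")");
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(parse(split("<div")));   // the partial tag comes out as text
    try {
      parse(split("<div\n"));
    } catch (IllegalStateException e) {
      System.out.println("error: " + e.getMessage());
    }
  }
}
```

In this model the only difference between the two inputs is whether the splitter sees a delimiter before eof, which matches the asymmetry reported above.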

Original issue reported on code.google.com by felix8a on 23 Jul 2008 at 6:08

GoogleCodeExporter commented 9 years ago
If DomProcessingEvents is capable of producing output that has partial tags,
then that's a vulnerability.

Original comment by mikesamuel@gmail.com on 7 Aug 2008 at 3:52

GoogleCodeExporter commented 9 years ago

Original comment by davidsar...@googlemail.com on 8 Aug 2008 at 2:29

GoogleCodeExporter commented 9 years ago
Made <div and <div\n consistent.

This was never a problem with the generated HtmlEmitter code since that is
generated from an AST, not from a token stream.
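
To illustrate why rendering from an AST avoids partial tags, here is a minimal sketch. It is not the HtmlEmitter generator: `AstRenderSketch`, `Node`, `Text` and `Element` are hypothetical names chosen for the example. The point is that an element node always writes both its open and close tags, and a text node is always escaped, so a stray "<div" can only ever appear as text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical AST renderer; not Caja's actual classes. */
final class AstRenderSketch {
  interface Node {
    void render(StringBuilder out);
  }

  static final class Text implements Node {
    final String value;
    Text(String value) { this.value = value; }

    @Override
    public void render(StringBuilder out) {
      // Text is always escaped, so it cannot open a tag in the output.
      out.append(value.replace("&", "&amp;")
                      .replace("<", "&lt;")
                      .replace(">", "&gt;"));
    }
  }

  static final class Element implements Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Element(String name) { this.name = name; }

    @Override
    public void render(StringBuilder out) {
      // Open and close tags are written together, never one without the other.
      out.append('<').append(name).append('>');
      for (Node child : children) {
        child.render(out);
      }
      out.append("</").append(name).append('>');
    }
  }

  static String render(Node root) {
    StringBuilder sb = new StringBuilder();
    root.render(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    Element div = new Element("div");
    div.children.add(new Text("<div"));
    System.out.println(render(div)); // prints <div>&lt;div</div>
  }
}
```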

Original comment by mikesamuel@gmail.com on 19 Aug 2008 at 3:22

GoogleCodeExporter commented 9 years ago

Original comment by mikesamuel@gmail.com on 20 Aug 2008 at 12:41