eof parsing anomaly

GoogleCodeExporter commented 9 years ago

This anomaly is harmless, and I don't think it needs to be fixed. I'm
reporting it because I wasn't sure it was safe, so I tracked down the cause.

1. If you cajole "<div\n", you get
  Fatal Exception: com.google.caja.lexer.ParseException:
    Unexpected end of input
which is fine.

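2. If you cajole "<div" without a trailing newline, you get
   IMPORTS___.htmlEmitter___.pc('\074div');
which is strange, but harmless: '\074' is the octal escape for '<', so the
leftover "<div" is passed to the emitter as text rather than as a tag.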

What's happening:

In case 2, when HtmlInputSplitter.parseToken hits EOF, it converts the
pending token into a TEXT token.

In case 1, HtmlInputSplitter.parseToken sees the newline and returns a
TAGSTART token, and DomParser.parseDom then keeps requesting tokens until
it sees a TAGEND. The tokens come from a TokenQueue<>, which throws the
exception at EOF.
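The two code paths above can be modeled with a small, self-contained sketch. Everything in it (ToyEofDemo, split, parse, nextToken, Token, ToyParseException) is hypothetical and far simpler than Caja's real HtmlInputSplitter, DomParser, and TokenQueue; it only assumes the EOF rule described in the report: a tag name still pending at EOF is demoted to TEXT, while a tag name followed by another character (such as a newline) is committed as TAGSTART, after which the parser fails once the queue runs dry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the EOF behaviour in this issue; not Caja's actual classes. */
public class ToyEofDemo {
  enum Kind { TEXT, TAGSTART, ATTRNAME, TAGEND }

  record Token(Kind kind, String text) {}

  static class ToyParseException extends Exception {
    ToyParseException(String msg) { super(msg); }
  }

  /** Hypothetical splitter producing TEXT / TAGSTART / ATTRNAME / TAGEND tokens. */
  static List<Token> split(String in) {
    List<Token> out = new ArrayList<>();
    int i = 0;
    boolean inTag = false;
    while (i < in.length()) {
      char c = in.charAt(i);
      if (inTag) {
        if (Character.isWhitespace(c)) { i++; continue; }
        if (c == '>') { out.add(new Token(Kind.TAGEND, ">")); inTag = false; i++; continue; }
        int attrStart = i;
        while (i < in.length() && !Character.isWhitespace(in.charAt(i)) && in.charAt(i) != '>') i++;
        out.add(new Token(Kind.ATTRNAME, in.substring(attrStart, i)));
        continue;
      }
      if (c == '<' && i + 1 < in.length() && Character.isLetter(in.charAt(i + 1))) {
        int tagStart = i++;  // skip '<'
        while (i < in.length() && Character.isLetterOrDigit(in.charAt(i))) i++;
        String pending = in.substring(tagStart, i);
        if (i == in.length()) {
          // Case 2: EOF while the tag name is still pending, so demote it to TEXT.
          out.add(new Token(Kind.TEXT, pending));
        } else {
          // Case 1: another character (e.g. '\n') follows, so commit to TAGSTART.
          out.add(new Token(Kind.TAGSTART, pending));
          inTag = true;
        }
        continue;
      }
      // Plain text (or a '<' that cannot start a tag) runs up to the next '<'.
      int textStart = i++;
      while (i < in.length() && in.charAt(i) != '<') i++;
      out.add(new Token(Kind.TEXT, in.substring(textStart, i)));
    }
    return out;
  }

  /** Hypothetical parser: after a TAGSTART, keeps requesting tokens until TAGEND. */
  static List<String> parse(List<Token> tokens) throws ToyParseException {
    Deque<Token> queue = new ArrayDeque<>(tokens);
    List<String> nodes = new ArrayList<>();
    while (!queue.isEmpty()) {
      Token t = queue.poll();
      if (t.kind() == Kind.TEXT) {
        nodes.add("text:" + t.text());
      } else if (t.kind() == Kind.TAGSTART) {
        while (nextToken(queue).kind() != Kind.TAGEND) {
          // Skip attribute tokens; a real parser would record them.
        }
        nodes.add("element:" + t.text().substring(1));
      }
    }
    return nodes;
  }

  /** Like the TokenQueue in the report: throws instead of returning null at EOF. */
  static Token nextToken(Deque<Token> queue) throws ToyParseException {
    Token t = queue.poll();
    if (t == null) throw new ToyParseException("Unexpected end of input");
    return t;
  }

  public static void main(String[] args) {
    for (String input : new String[] {"<div\n", "<div"}) {
      List<Token> tokens = split(input);
      try {
        System.out.println(tokens + " -> " + parse(tokens));
      } catch (ToyParseException e) {
        System.out.println(tokens + " -> ParseException: " + e.getMessage());
      }
    }
  }
}
```

Running main shows "<div\n" failing with "Unexpected end of input" and "<div" producing a single text node, mirroring cases 1 and 2.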

Original issue reported on code.google.com by felix8a on 23 Jul 2008 at 6:08

GoogleCodeExporter commented 9 years ago

If DomProcessingEvents is capable of producing output that has partial tags,
then that's a vulnerability.
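To see why partial tags in output would matter, here is a minimal sketch, assuming a hypothetical emitter that leaked the raw prefix "<div" and then wrote attacker-controlled text through an ordinary escaper (PartialTagDemo, escapeText, and emit are illustrative names, not Caja APIs). Escaping &, <, >, and " keeps text from opening a new tag, but it does nothing for text that lands inside a tag left open by earlier output.

```java
/** Hypothetical emitter fragments illustrating the danger of a partial tag. */
public class PartialTagDemo {
  /** Minimal escaper for HTML character data: &, <, >, and double quote. */
  static String escapeText(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
      switch (c) {
        case '&' -> sb.append("&amp;");
        case '<' -> sb.append("&lt;");
        case '>' -> sb.append("&gt;");
        case '"' -> sb.append("&quot;");
        default -> sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Concatenates a raw fragment that leaked out unescaped, escaped text, then later markup. */
  static String emit(String leakedRaw, String untrustedText, String laterMarkup) {
    return leakedRaw + escapeText(untrustedText) + laterMarkup;
  }

  public static void main(String[] args) {
    // The untrusted text contains no character the escaper touches, so it passes
    // through unchanged and ends up inside the unfinished "<div" start tag.
    String html = emit("<div", " onmouseover=alert(1) x", "<b>safe</b>");
    System.out.println(html);
  }
}
```

The printed result is "<div onmouseover=alert(1) x<b>safe</b>". An HTML parser would typically read onmouseover=alert(1) as an attribute of the div, since the start tag is only closed by the > of the later, legitimate markup.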

Original comment by mikesamuel@gmail.com on 7 Aug 2008 at 3:52

Changed state: Accepted
Added labels: Priority-Medium
Removed labels: Priority-Low

GoogleCodeExporter commented 9 years ago

Original comment by davidsar...@googlemail.com on 8 Aug 2008 at 2:29

Added labels: Security

GoogleCodeExporter commented 9 years ago

Made "<div" and "<div\n" consistent.

This was never a problem with the generated HtmlEmitter code, since that is
generated from an AST, not from a token stream.
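The distinction drawn here can be illustrated with a minimal sketch, assuming a toy AST with only element and text nodes. When output is produced by walking a tree rather than by replaying tokens, every start tag is written whole and paired with its end tag, and a stray "<" can only arrive inside a text node, where it is escaped. Node, Element, Text, render, and toHtml are hypothetical names, not Caja's generated HtmlEmitter code.

```java
import java.util.List;

/** Toy AST emitter; the types and methods here are hypothetical, not Caja's HtmlEmitter. */
public class AstEmitterDemo {
  interface Node {}
  record Element(String name, List<Node> children) implements Node {}
  record Text(String value) implements Node {}

  /** Escapes character data so text can never open or close a tag. */
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  /** Element names are assumed to be pre-validated identifiers (not checked here). */
  static void render(Node n, StringBuilder out) {
    if (n instanceof Text t) {
      out.append(escape(t.value()));
      return;
    }
    Element e = (Element) n;
    // The start tag is written whole, and the end tag always follows the children.
    out.append('<').append(e.name()).append('>');
    for (Node child : e.children()) {
      render(child, out);
    }
    out.append("</").append(e.name()).append('>');
  }

  static String toHtml(Node root) {
    StringBuilder out = new StringBuilder();
    render(root, out);
    return out.toString();
  }

  public static void main(String[] args) {
    // In this toy, a leftover "<div" like case 2's can only be a text node.
    System.out.println(toHtml(new Text("<div")));
    System.out.println(toHtml(new Element("div", List.of(new Text("a<b")))));
  }
}
```

Because tags are emitted only from Element nodes, this design has no way to express an unfinished tag; the first line printed is "&lt;div", not a partial start tag.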

Original comment by mikesamuel@gmail.com on 19 Aug 2008 at 3:22

Changed state: Pending

GoogleCodeExporter commented 9 years ago

Original comment by mikesamuel@gmail.com on 20 Aug 2008 at 12:41

Changed state: Fixed

Issue #658 in amohanta/google-caja.