dqw / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy

Scanner appends break lines #10

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
Steps to reproduce:

Call the "scan" method of antisamy using the following text:
"javascript <b>java</b> python" 
and any policy that accepts the "b" tag.

Example:
CleanResults cs = new AntiSamy().scan(text_, policy);
text_ = cs.getCleanHTML();

The "getCleanHTML" method then returns the following string:
"javascript <b>java</b>\n python\n".

It inserts two line breaks: one after the closing "b" tag and another at the
end of the string.

I expect the output to be identical to the input. Since the "b" tag is
allowed, no line breaks should have been introduced.

This is caused by the "scan" method of AntiSamyDOMScanner. When creating
the OutputFormat, it sets indenting to true with an indent value of 2. Why is
the indenting needed? Setting indenting to false and removing the
"setIndent(2)" call gives the right result. AntiSamy shouldn't attempt to
indent or pretty-print the input.
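
To illustrate the effect of the indenting switch, here is a minimal sketch. It does not use AntiSamy or the OutputFormat class mentioned above; as an assumption for illustration it uses the JDK's built-in javax.xml.transform serializer, and the class name IndentDemo and the wrapping "p" element (needed because XML requires a single root) are hypothetical. With indenting off, the markup is written back without added whitespace; with indenting on, the serializer is free to insert line breaks, and the exact result depends on the JDK version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; a stand-in for the serializer AntiSamy configures.
public class IndentDemo {

    // Parses the given XML string and serializes it again,
    // with pretty-printing switched on or off.
    static String serialize(String xml, boolean indent) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");

        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<p>javascript <b>java</b> python</p>";
        // Indenting off: the serializer adds no whitespace of its own.
        System.out.println(serialize(input, false));
        // Indenting on: the serializer may add line breaks and indentation.
        System.out.println(serialize(input, true));
    }
}
```

Comparing the two printed lines shows whether the serializer, rather than the sanitizing step, is the source of any extra line breaks.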

Original issue reported on code.google.com by carlos.a...@gmail.com on 30 May 2008 at 1:45

GoogleCodeExporter commented 8 years ago
It does this simply for readability. Otherwise, if you had 11 tags, they would
all show up on the same line. I'll send an email to the mailing list to see
what most users would prefer.

I can definitely add it as a directive (no formatting).

Original comment by arshan.d...@gmail.com on 2 Jun 2008 at 1:37

GoogleCodeExporter commented 8 years ago
The directive sounds good to me, thanks for the prompt response!

Original comment by carlos.a...@gmail.com on 2 Jun 2008 at 2:21

GoogleCodeExporter commented 8 years ago
I second this bug, and I second the directive as a solution.

Original comment by thedownw...@gmail.com on 6 Jun 2008 at 10:39

GoogleCodeExporter commented 8 years ago
I third this bug, and I third the directive as a solution

Original comment by atsouk...@gmail.com on 9 Jun 2008 at 2:31

GoogleCodeExporter commented 8 years ago
I added a change in 1.2 that prevents any OutputFormat changes from taking
place. However, the CSS serialization is quite separate. Does everyone find
this agreeable?

Original comment by arshan.d...@gmail.com on 17 Jun 2008 at 6:00

GoogleCodeExporter commented 8 years ago
Sounds good to me, I'll try it out!

Original comment by carlos.a...@gmail.com on 19 Jun 2008 at 2:56

GoogleCodeExporter commented 8 years ago
Shouldn't the "formatOutput" directive only control the "lineWidth" and
"indenting" settings in lines 170-172 of AntiSamyDOMScanner? Lines 174 through
177 should be outside the if statement that checks whether "formatOutput" is
set to true (line 169).

I just tried the 1.2 release, and if "formatOutput" is null or false, it skips
the "omitXmlDeclaration" and "omitDoctypeDeclaration" settings. I saw that
after getting the "finalCleanHTML" in line 199 you strip out the XML
declaration; however, nothing is done for "omitDoctypeDeclaration". So when I
run the example mentioned above, I get this back:

javascript <!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/WD-html-in-xml/DTD/xhtml1-strict.dtd">
java python

This is because I'm not setting "formatOutput". But I'd still expect AntiSamy
to just return:

javascript java python
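
The control flow proposed above could be sketched roughly as follows. This is not the AntiSamyDOMScanner source: the class FormatOutputSketch, the method buildSettings, and the use of a plain map are hypothetical, and the map keys simply mirror the setting names quoted in this comment. The indent value of 2 comes from the original report; the line width of 80 is a placeholder assumption. The point is only that the two "omit" settings are applied unconditionally, while the pretty-printing settings sit inside the "formatOutput" check.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed logic, not the real scanner code.
public class FormatOutputSketch {

    // formatOutput may be null when the directive is not set in the policy.
    static Map<String, Object> buildSettings(Boolean formatOutput) {
        Map<String, Object> settings = new LinkedHashMap<>();

        // Only the pretty-printing options depend on the directive.
        boolean pretty = Boolean.TRUE.equals(formatOutput);
        settings.put("indenting", pretty);
        if (pretty) {
            settings.put("indent", 2);      // value taken from the original report
            settings.put("lineWidth", 80);  // placeholder value, an assumption
        }

        // Applied regardless of the directive, so neither declaration
        // leaks into the cleaned output.
        settings.put("omitXmlDeclaration", true);
        settings.put("omitDoctypeDeclaration", true);
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(buildSettings(null));
        System.out.println(buildSettings(Boolean.TRUE));
    }
}
```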

Original comment by carlos.a...@gmail.com on 19 Jun 2008 at 5:43

GoogleCodeExporter commented 8 years ago
Yes, my logic was screwed up. I have fixed it for the next release.

Original comment by arshan.d...@gmail.com on 10 Jul 2008 at 12:46