Closed GoogleCodeExporter closed 9 years ago
Can you look at the hex output of the AntiSamy getCleanHTML() call and confirm
this? I just made a test case and it doesn't display this behavior. Here's the
test:
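// as (AntiSamy), policy (Policy) and nl (the newline string) are assumed to be defined elsewhere in the test class
StringBuilder sb = new StringBuilder();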
String header = "<h1>Header</h1>";
String para = "<p>Paragraph</p>";
sb.append(header);
sb.append(nl);
sb.append(para);
String crDom = as.scan(sb.toString(), policy, AntiSamy.DOM).getCleanHTML();
String crSax = as.scan(sb.toString(), policy, AntiSamy.SAX).getCleanHTML();
/* Make sure only 1 newline appears */
assertTrue(crDom.lastIndexOf(nl) == crDom.indexOf(nl));
assertTrue(crSax.lastIndexOf(nl) == crSax.indexOf(nl));
int expectedLoc = header.length() + 1;
int actualLoc = crSax.indexOf(nl);
assertTrue(expectedLoc == actualLoc);
actualLoc = crDom.indexOf(nl);
// account for line separator length difference
assertTrue(expectedLoc == actualLoc || expectedLoc == actualLoc+1);
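For the hex check requested above, a small helper along these lines might be enough. This is only a sketch: the class name HexDump and the sample string are made up for illustration, and in practice the string returned by getCleanHTML() would be passed in instead.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    /** Returns the UTF-8 bytes of s as space-separated two-digit hex values. */
    public static String toHex(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Mask to 0..255 so bytes >= 0x80 are not printed as negative ints
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the output of getCleanHTML()
        String cleaned = "</h1>\r\n<p>";
        // prints "3c 2f 68 31 3e 0d 0a 3c 70 3e"
        System.out.println(toHex(cleaned));
    }
}
```

In the dump, 0a is a line feed and 0d is a carriage return, so a doubled newline shows up as "0a 0a" or "0d 0a 0d 0a" between the closing and opening tags.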
Original comment by arshan.d...@gmail.com
on 7 Jun 2011 at 9:08
Arshan,
I have a couple of things to mention regarding this issue. First off, I am
assuming you are trying with 1.4.4, and not with the current trunk, right?
Assuming we are still testing 1.4.4, I noticed a couple of things. I think the
people who have experienced this issue have been using the default scan method
which uses DOM. Running your test, I notice that the SAX implementation (at
least for me) does not add the extra newline characters.
One slight difference from what the issue reporter describes is that I only
see the second additional newline when there is text after the second set of
HTML tags. For example, "<h1>Header</h1><p>Paragraph</p>test" produces two
newlines; without the "test" on the end, just one. That brings up another
observation: I see the newlines added regardless of whether newlines existed
before the cleaning.
One more thing to mention: your last two tests (DOM and SAX) don't really
tell us much. Even if a second newline were being added, you would *expect* to
see the first newline in the same place that you added it, unless those tests
were created to show that the newlines weren't being removed by the cleaning.
Original comment by tad...@gmail.com
on 10 Jun 2011 at 2:27
I have finally gotten a chance to get back to this, and I am seeing the same
behavior as the previous commenter. I get extra newlines when I do not specify
the implementation. When I use the SAX implementation, my problems go away.
Unlike the previous commenter, I see this behavior whether or not there is
extra text after the HTML. Curiously, AntiSamy does not add an extra newline to
the very last line, so I was incorrect in my original post. It goes from being
(I'll add '\n' to make it clearer)
"<h1>Header</h1>\n
<p>Paragraph</p>"
to
"<h1>Header</h1>\n
\n
<p>Paragraph</p>"
I also noticed that if there are two tags on one line, AntiSamy puts a newline
between them. For example, if I have "<h1>Header</h1><p>Paragraph</p>" then
AntiSamy gives me
"<h1>Header</h1>\n
<p>Paragraph</p>"
Original comment by samjones...@gmail.com
on 15 Jun 2011 at 6:00
An update to this issue...
It appears that adding the line '<directive name="formatOutput"
value="false"/>' to my policy prevents the newlines from being added. The
directive is turned on by default, but turning it off seems to do what I want.
Arshan, were you using the defaults when trying to reproduce?
Sam, perhaps you can try this as well and see if it is an acceptable
workaround. Let me know if you see any issues with this (I haven't tested it
much).
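For reference, the directive would sit in the policy file roughly as sketched here. This assumes the stock policy layout, where directives are grouped under a <directives> element; the other directives are omitted.

```xml
<!-- Sketch of the directives section of an AntiSamy policy file; other directives omitted -->
<directives>
    <!-- Turn off output formatting so the cleaner does not insert newlines between tags -->
    <directive name="formatOutput" value="false"/>
</directives>
```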
-Troy
Original comment by tad...@gmail.com
on 13 Jul 2011 at 9:59
Indeed, choosing to have AntiSamy format your output will result in whitespace
modification. I did notice discrepancies in the newline behavior, so I have
made some changes to HEAD.
Original comment by arshan.d...@gmail.com
on 15 Sep 2011 at 8:13
Original issue reported on code.google.com by
samjones...@gmail.com
on 27 Apr 2011 at 5:15