Closed GoogleCodeExporter closed 8 years ago
"It seems like if it is valid Html, then the output of this method should equal
the input exactly"
This isn't actually true. That's why it's recommended to use the cleaned
results instead of using the number of errors as an indicator of whether the
input is dirty or safe. Not only will the output be changed when malicious
content is entered, but various encoding and formatting are also performed.
If you stick that string in an HTML context, it displays properly.
Original comment by tad...@gmail.com
on 2 Apr 2012 at 7:21
"If" is the operative word. I agree with you that you have stated how it
works, but I don't understand why "It seems like if it is valid HTML, then the
output of this method should equal the input exactly" should not be true, or be
very close to true. I guess I don't really care about formatting, but, at the
very least, I don't think non-malicious content should be modified. I would expect
unaccepted HTML to be stripped out, but would not expect the content to be
encoded. In my opinion encoding should occur on output, not on input, because
you have no idea the context of the output. We store it in a database, and
after that, the final destination is not necessarily known. It could be output
into HTML, into JavaScript, into XML, ... If it has been encoded into HTML
format, that is rather problematic for other output contexts.
Original comment by mattmwhe...@gmail.com
on 2 Apr 2012 at 8:07
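To make the point about output contexts concrete, here is a minimal sketch in Python using only the standard library. It is not AntiSamy and does not reproduce its behavior; the helper names (`for_html`, `for_js_string`, `for_xml_text`) and the sample string are hypothetical, chosen only to show that the same stored value needs a different encoding for each destination.

```python
import html
import json
from xml.sax.saxutils import escape as xml_escape

# Hypothetical sample value, stored raw (unencoded) in the database.
raw = 'Tom & "Jerry" <3'


def for_html(value: str) -> str:
    """Encode for an HTML text or attribute context."""
    return html.escape(value, quote=True)


def for_js_string(value: str) -> str:
    """Encode as a quoted JavaScript string literal.

    json.dumps handles quotes and backslashes; the extra replacements are an
    assumption for inline <script> blocks, where "<", ">" and "&" are best
    kept out of the literal.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def for_xml_text(value: str) -> str:
    """Encode for XML character data (escapes &, < and >)."""
    return xml_escape(value)


# Encoding on input and then again on output double-encodes the value:
stored_html_encoded = for_html(raw)
double_encoded = for_html(stored_html_encoded)
```

If the value had been HTML-encoded before storage, as `stored_html_encoded` is here, every non-HTML consumer would first have to decode it, and an HTML consumer that encodes again would display `&amp;` literally.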
Sorry it took me so long to respond.
As for modifying what you sent to AS, sometimes it is necessary. It will
add valid close tags if you specify XHTML output, it will add whitespace
formatting if you choose to have it format, and it will HTML encode special
characters. It probably does some other manipulation to the original input,
but those are some of the more common ones.
Now you might not agree with the last one I listed. You state that "because
you have no idea the context of the output". Indeed you do, because AS will
only validate input that will eventually live in an HTML context. If you are
sticking this in a JS context, then your app is currently vulnerable. For
example, if you send "alert('blah')" into AS, this will validate as OK,
since it appears to just be a string. In other words, it *is* safe for that
string to live in an HTML context, but more malicious JS strings would *not* be
safe in a JS context.
As for input that should live in a context other than HTML, you should
*always* encode that input according to the context in which it will live.
There's no reason to clean it if you encode it. If you want to validate it, go
right ahead, but don't use AS (unless it will live in HTML). Again, take a
whitelist approach and only allow a specific subset of characters.
I hope this answers your question and helps with your usage of AS in the future.
If I have said anything incorrect, would one of the AS devs please correct
me/clarify?
Original comment by tad...@gmail.com
on 25 Apr 2012 at 1:50
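The distinction drawn in the comment above can be sketched as follows. This is an illustrative Python example using only the standard library, not AntiSamy itself; the `ALLOWED` pattern and the `is_allowed` helper are hypothetical and stand in for whatever whitelist suits the field being validated.

```python
import html
import re

payload = "alert('blah')"

# In an HTML text context the encoded string is rendered as inert text.
html_fragment = "<p>" + html.escape(payload) + "</p>"

# Concatenating the same string into a script block turns it into code.
# This is the vulnerable pattern and is shown only for contrast.
unsafe_script = "<script>var x = " + payload + ";</script>"

# Whitelist validation for a non-HTML destination: accept only a known-good
# character set (an assumed example set) and reject everything else.
ALLOWED = re.compile(r"[A-Za-z0-9 _.-]{1,64}")


def is_allowed(value: str) -> bool:
    """Return True only if the whole value matches the whitelist."""
    return ALLOWED.fullmatch(value) is not None
```

With this whitelist, `is_allowed("blah")` is true while `is_allowed(payload)` is false, because the parentheses and quotes fall outside the allowed set.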
tad9ab is correct. We only work within HTML contexts - the problem becomes
unmanageable without that restriction in place.
Original comment by arshan.d...@gmail.com
on 24 Jun 2012 at 5:19
Original issue reported on code.google.com by
mattmwhe...@gmail.com
on 13 Mar 2012 at 2:48