Looking at antisamy.xml, SVN revision 137:
<regexp name="htmlClass" value="[a-zA-Z0-9\s,-_]+"/>
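The intent of this regex appears to be to allow a comma, a hyphen, and an
underscore. However, inside the character class the hyphen in ,-_ acts as
the range operator, so the class allows every character from comma (0x2C)
through underscore (0x5F). Besides digits and uppercase letters, that range
includes . / : ; < = > ? @ [ \ ] and ^. I'm not sure that's the intent.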
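A minimal Java sketch of the problem, for illustration only. The class and method names are hypothetical, the escaped variant is just one possible fix, and the sketch assumes the value is compiled with java.util.regex and matched against the whole input, which may not be exactly how AntiSamy applies it:

```java
import java.util.regex.Pattern;

class HyphenRangeDemo {
    // htmlClass value as quoted above: the hyphen between ',' and '_' forms a range.
    static final Pattern ORIGINAL = Pattern.compile("[a-zA-Z0-9\\s,-_]+");

    // Hypothetical fix (an assumption, not taken from antisamy.xml):
    // escape the hyphen so it is a literal character.
    static final Pattern ESCAPED = Pattern.compile("[a-zA-Z0-9\\s,\\-_]+");

    static boolean allowedByOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean allowedByEscaped(String s) {
        return ESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The first two inputs are ordinary class values; the rest are
        // characters that fall between ',' (0x2C) and '_' (0x5F).
        String[] samples = {"my-class", "a,b_c", "<", ">", "=", ";", "a.b"};
        for (String s : samples) {
            System.out.println(s
                    + " original=" + allowedByOriginal(s)
                    + " escaped=" + allowedByEscaped(s));
        }
    }
}
```

Under these assumptions, "<" and ">" are accepted by the original pattern and rejected once the hyphen is escaped, while "my-class" and "a,b_c" are accepted by both.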
Other examples where the hyphen should probably be escaped. In some of
these cases it doesn't really make a difference, but escaping it would
make it clearer that the hyphen is meant as a literal, not as the range
operator:
<regexp name="htmlId" value="[a-zA-Z0-9-_]+"/>
<regexp name="htmlTitle" value="[\p{L}\p{N}\s-_',:\[\]!\./\\\(\)]*"/>
<regexp name="cssCommentText"
value="[\p{L}\p{N}-_,\/\\\.\s\(\)!\?\=\$#%\^&:"']+"/>
<regexp value="[a-zA-Z0-9-_\$]+"/> (inside <attribute name="name">)
<regexp name="cssIdentifier" value="[a-zA-Z][a-zA-Z0-9-]+"/>
Original issue reported on code.google.com by danr...@gmail.com on 23 Dec 2009 at 8:08