Laplandia / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy
0 stars 0 forks source link

td not in list of tags that are allowed to be empty #80

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Filter a table with one empty cell (<table><tr><td></td><td>hi 
mom</td></tr></table>) and that td is removed 

What is the expected output? What do you see instead?

I would expect an empty td to be left so that all following td's in the same 
row fall in the correct column.  Instead it is removed and all subsequent cells 
in that row are shifted one column to the left.

What version of the product are you using? On what operating system?

1.4, though the code to 1.4.1 still has the issue in Constants.java

Please provide any additional information below.

Constants.java has a list of tags (String []) that are allowed to be empty.  
That list does not contain td, which corrupts tables that have empty cells by 
shifting remaining cells to the left.

Also, empty spans and divs should be allowable as they are commonly rendered 
empty as a target for ajax updates.  (The response of an ajax call is written 
to an empty div or span that is acting as a placeholder on the page.)

The underlying issue is that the String [] in Constants is not configurable.  
Ultimately, I would like to be able to configure whatever empty tags I want in 
my policy file.

I'm not able to compile on the machine I'm working on, so I haven't tested 
this, but the attached version of Constants.java should resolve the issues I've 
described.

Thanks, 

  Jacob

Original issue reported on code.google.com by jacob.co...@gmail.com on 2 Jul 2010 at 7:05

Attachments:

GoogleCodeExporter commented 9 years ago
If I add code to put the "allowedEmptyTags" in the policy file instead of in 
Constants.java so that it is configurable, would you make it part of the 
project? 

I have to compile my own version anyway.  If you are interested in the change, 
I'll do it right and make it configurable in the policy file.  If not, I'll 
just hard-code the changes I need in my own Constants.java.  I don't want to go 
through the effort if it isn't something you are interested in.  Please let me 
know.

Thanks,

  Jacob Coulter

Jacob.Coulter@gmail.com

Original comment by jacob.co...@gmail.com on 13 Jul 2010 at 5:46

GoogleCodeExporter commented 9 years ago
I went ahead and built my own version with the Constants.java above and it 
works as expected.  That is the simplest fix for anyone else with the same 
problem.  

I am still willing to code it right and have the allowable empty tags be in the 
policy file if someone will give me an indication that such a change will be 
accepted.  I don't want to do it if it is just a waste of my time.

thanks,

  Jacob.Coulter@gmail.com

Original comment by jacob.co...@gmail.com on 19 Jul 2010 at 4:13

GoogleCodeExporter commented 9 years ago

Original comment by arshan.d...@gmail.com on 3 Aug 2010 at 12:09

GoogleCodeExporter commented 9 years ago
Here is a patch that takes the configuration out of the Constants.java and 
places it into the policy file.  As I understand it, the philosophy behind the 
Policy file is that anything not explicitly stated is disallowed.  In keeping 
with that philosophy, all policy files will need to include the new 
<allowed-empty-tags> section like this:

    <allowed-empty-tags>
        <literal-list>
            <literal value="br"/>
            <literal value="hr"/>
            <literal value="a"/>
            <literal value="img"/>
            <literal value="link"/>
            <literal value="iframe"/>
            <literal value="script"/>
            <literal value="object"/>
            <literal value="applet"/>
            <literal value="frame"/>
            <literal value="base"/>
            <literal value="param"/>
            <literal value="meta"/>
            <literal value="input"/>
            <literal value="textarea"/>
            <literal value="embed"/>
            <literal value="basefont"/>
            <literal value="col"/>
        </literal-list>
    </allowed-empty-tags>

I'm available to discuss any issues or concerns.  

Thanks,

Jacob Coulter & Mark Oberhaus

Jacob.Coulter@gmail.com

Original comment by jacob.co...@gmail.com on 11 Jan 2011 at 8:02

Attachments:

GoogleCodeExporter commented 9 years ago
I think we'll integrate this and leave the Constants in place in case the 
user's policy XML has not been updated to contain the new section.

Original comment by arshan.d...@gmail.com on 3 Feb 2011 at 8:07

GoogleCodeExporter commented 9 years ago
I've struggled with this bug for the better part of the last few hours. 

The inspirational problem case is this: a self-enclosed tag like <i/> makes the 
rest of the page italicized.

The way I see it, this should be at least made into <i></i>, or better yet 
removed. As of now, in both versions, tags in the whitelist like (<br> and 
<hr/>) should stay as they are found. Tags that are not in the whitelist and 
found to be truly empty should be removed. 

There is a slight bug in the NekoHTML SAX parser that means sometimes we can't 
tell the difference between <empty/> and <empty></empty>. I tried several ways 
of getting around it and have given up. I don't think this will be a big 
problem in practice, but I've been wrong before and am always happy to accept 
patches.

The behavioral inexactitude is difficult for me to accept, but I don't have 
time to dig into it any more. Check out AntiSamyTest.java (search for "issue 
#80") if you want to see how I decide if I'm happy (enough) with the behavior 
or sync up to head and play with it as it stands.

By the way, this all applies when the useXhtml directive is set to FALSE. If 
you want XHTML, you'll get the problematic self-enclosed tags by design.

Thanks for your patch Jacob/Mark.

Original comment by arshan.d...@gmail.com on 4 Feb 2011 at 6:10

GoogleCodeExporter commented 9 years ago
By the way, the fix has been checked into HEAD and will be included in our next 
minor release.

Original comment by arshan.d...@gmail.com on 4 Feb 2011 at 7:36

GoogleCodeExporter commented 9 years ago
I'm not sure what happens - environmental weirdness, maybe - but my revision of 
your patch got lost and never made it into HEAD. I'll have to re-create.

Original comment by arshan.d...@gmail.com on 7 Jun 2011 at 6:19

GoogleCodeExporter commented 9 years ago
I think the code might have been lost for your AntiSamyTest.java example of 
what you want for this issue.  

Also, it's interesting that you are running into a NekoHTML issue.  The version 
that in this code is blowing up Intellij when you try to parse any schema with 
this on your classpath.  It is an out of date xml parser. 

Original comment by jacob.co...@gmail.com on 12 Sep 2011 at 4:41

GoogleCodeExporter commented 9 years ago
Arshan,

Do you plan to release a minor version with this change?

Thanks,
Giriraj.

Original comment by girira...@gmail.com on 4 Nov 2011 at 11:31

GoogleCodeExporter commented 9 years ago
For the impatient folks (like me ;)) here's a workaround that does not require 
you to compile your own version of the Constants.java. So using the following 
code inside a static initializer of one of your classes that use antisamy the 
"td" gets added to the list of allowed empty tags:

        if (org.owasp.validator.html.scan.Constants.allowedEmptyTags != null) {
            List<String> allowedEmptyTagsList = new ArrayList<String>(Arrays.asList(org.owasp.validator.html.scan.Constants.allowedEmptyTags));
            if (!allowedEmptyTagsList.contains("td")) {
                allowedEmptyTagsList.add("td");
                org.owasp.validator.html.scan.Constants.allowedEmptyTags = allowedEmptyTagsList.toArray(new String[0]);
            }
        }

Best Regards,
Christian 

Original comment by m...@Christian-Schneider.net on 10 Nov 2011 at 8:51

GoogleCodeExporter commented 9 years ago
I'm glad it works for you, but that's not a fix I could live with.  

I'd like to get in line requesting a minor release.  I'd really like to get off 
of my forked version.  

Original comment by jacob.co...@gmail.com on 10 Nov 2011 at 7:22