fusioncop / owasp-esapi-java

Automatically exported from code.google.com/p/owasp-esapi-java

non-BMP characters incorrectly encoded #294

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the test case below.

What is the expected output? What do you see instead?
The test case encodes a non-BMP character, which Java represents internally 
as two chars (a surrogate pair) but which must be serialized as a single HTML 
numeric character reference.
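
To make the expectation concrete, here is a minimal sketch that is independent of ESAPI. The class and method names are hypothetical and exist only for illustration. It contrasts a naive per-char loop with a loop that walks the string by code point; the per-char variant is shown only for contrast and is not a claim about what ESAPI 2.0.1 actually emits.

```java
public class CodePointEncodeSketch {

    // Hypothetical helper (not ESAPI): treats every Java char as a character,
    // so a surrogate pair is written out as two separate references.
    public static String encodePerChar(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            sb.append("&#x").append(Integer.toHexString(c)).append(';');
        }
        return sb.toString();
    }

    // Hypothetical helper (not ESAPI): walks the string by Unicode code point,
    // so a surrogate pair is written out as one reference.
    public static String encodePerCodePoint(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            i += Character.charCount(cp); // 2 for supplementary code points, else 1
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(new int[]{0x2f804}, 0, 1);
        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(encodePerChar(s));                // &#xd87e;&#xdc04;
        System.out.println(encodePerCodePoint(s));           // &#x2f804;
    }
}
```

Unlike a real HTML encoder, both helpers encode every character, including ASCII letters, to keep the sketch short; the only point is the difference between iterating by char and iterating by code point.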

What version of the product are you using? On what operating system?
ESAPI 2.0.1, Windows

Please provide any additional information below.
Test:

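    import org.owasp.esapi.ESAPI;

    public class NonBmpEncodeTest {
        public static void main(String[] args) {
            // U+2F804 lies outside the BMP, so Java stores it as a surrogate pair.
            String test = new String(new int[]{0x2f804}, 0, 1);
            System.out.println(test + " " + test.length());
            System.out.println(ESAPI.encoder().encodeForHTML(test));
        }
    }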

Note: this problem was already reported over two years ago at 
http://ainthek.blogspot.de/2010/09/orgowaspesapicodecshtmlentitycodecjava.html 
but apparently has not been fixed.

Original issue reported on code.google.com by julian.r...@googlemail.com on 1 Mar 2013 at 4:36

GoogleCodeExporter commented 9 years ago
Test cases:

        public void testHtmlEncodeStrSurrogatePair()
        {
                String inStr = new String (new int[]{0x2f804}, 0, 1);
                String expected = "&#x2f804;";
                String result;

                result = htmlCodec.encode(EMPTY_CHAR_ARRAY, inStr);
                assertEquals(expected, result);
        }

and

        public void testHtmlDecodeHexEntititesSurrogatePair()
        {
                String expected = new String (new int[]{0x2f804}, 0, 1);
                assertEquals( expected, htmlCodec.decode("&#194564;") );
                assertEquals( expected, htmlCodec.decode("&#x2f804;") );
        }
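
For the decoding direction, the following is a minimal sketch of what the tests ask for, again independent of ESAPI. The class and method names are hypothetical, and it handles only a single, complete numeric reference. The key step is `Character.toChars`, which turns a supplementary code point into a surrogate pair instead of truncating it to one char.

```java
public class NumericRefDecodeSketch {

    // Hypothetical helper (not ESAPI): decodes one "&#NNN;" or "&#xHHH;" reference.
    public static String decodeNumericRef(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a numeric character reference: " + ref);
        }
        String body = ref.substring(2, ref.length() - 1);
        boolean hex = body.startsWith("x") || body.startsWith("X");
        int cp = hex ? Integer.parseInt(body.substring(1), 16)
                     : Integer.parseInt(body, 10);
        if (!Character.isValidCodePoint(cp)) {
            throw new IllegalArgumentException("code point out of range: " + cp);
        }
        // toChars yields one char for BMP code points and two for supplementary ones.
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        String expected = new String(new int[]{0x2f804}, 0, 1);
        System.out.println(expected.equals(decodeNumericRef("&#x2f804;"))); // true
        System.out.println(expected.equals(decodeNumericRef("&#194564;"))); // true
    }
}
```

Casting the parsed value to `char` instead would silently drop the upper bits of any code point above U+FFFF, which is why the sketch goes through `Character.toChars`.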

Original comment by julian.r...@googlemail.com on 4 Mar 2013 at 1:38