Closed ren-zhijun-oracle closed 15 years ago
@javaserverfaces Commented Reported by imario
@javaserverfaces Commented @rlubke said: This looks similar to issue 727 [1]. However, 727 only covers the case where a decimal ref would be written and not the html entities.
If you'd like to submit a patch that builds off 727, that would be great.
[1] https://javaserverfaces.dev.java.net/issues/show_bug.cgi?id=727
@javaserverfaces Commented @rlubke said: I'm no expert in this area, but I wonder if it's possible to get rid of the configuration options altogether and base the behavior on the current response character set?
@javaserverfaces Commented imario said:
I'm no expert in this area, but I wonder if it's possible to get rid of the configuration options altogether and base the behavior on the current response character set?
I am not sure about that. The internal representation of a Java String will always be UTF-16 (hopefully), which means that a String can hold (nearly?) any character available in the IT world.
AFAIK, in UTF-16 any character <= 0xFF maps to the ISO-8859-1 charset. Now, when it comes to rendering the string using a page encoding different from UTF or ISO-8859-1, it might not be possible to map the character accordingly, which means we have to HTML-entity-encode it again.
In other words:
1) we can avoid any HTML entity encoding if the output is UTF
2) we can avoid any ISO-8859-1-mapped HTML entity encoding if the output is UTF-8 or ISO-8859-1 (have to look at ISO-8859-15, the euro-character charset, in more detail)
3) else we have to stick with the current HTML entity encoding
I think that makes sense.
I'll try to create a patch following the above rules.
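The three rules above could be sketched roughly like this (a minimal illustration; the class, method names, and charset checks are assumptions for the sketch, not the actual Mojarra HtmlUtils API):

```java
// Sketch of the proposed escaping rules. Names are illustrative only;
// Mojarra's real logic lives in HtmlUtils and the ResponseWriter.
public class EscapingRules {

    static boolean isUtf(String cs) {
        String u = cs.toUpperCase();
        return u.equals("UTF-8") || u.equals("UTF-16")
                || u.equals("UTF-16BE") || u.equals("UTF-16LE");
    }

    static boolean isIso8859_1(String cs) {
        return cs.equalsIgnoreCase("ISO-8859-1");
    }

    // Rule 1: general HTML entity encoding can be skipped for UTF output.
    static boolean escapeUnicode(String responseCharset) {
        return !isUtf(responseCharset);
    }

    // Rule 2: ISO-8859-1 entity encoding can be skipped for UTF or
    // ISO-8859-1 output. Rule 3: otherwise everything stays escaped.
    static boolean escapeIso(String responseCharset) {
        return !isUtf(responseCharset) && !isIso8859_1(responseCharset);
    }

    public static void main(String[] args) {
        // escapeUnicode/escapeIso per charset
        System.out.println("UTF-8:      " + escapeUnicode("UTF-8") + "/" + escapeIso("UTF-8"));
        System.out.println("ISO-8859-1: " + escapeUnicode("ISO-8859-1") + "/" + escapeIso("ISO-8859-1"));
        System.out.println("US-ASCII:   " + escapeUnicode("US-ASCII") + "/" + escapeIso("US-ASCII"));
    }
}
```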
@javaserverfaces Commented imario said: Shall I prepare a patch which keeps BUG727, or can I prepare a new one which replaces BUG727 with the logic outlined above? That would mean the "com.sun.faces.disableUnicodeEscaping" configuration option could be removed again.
@javaserverfaces Commented @rlubke said: I believe we added the config option because there were backwards-compatibility concerns. However, thinking more about this, if the rules you've outlined are sufficient, then it would be nice to get by without yet another config option.
Adding Mike to the CC list as he may have input.
@javaserverfaces Commented youngm said: One thing to be aware of while writing this patch: it can be very difficult to auto-detect the response charset. There are any number of places between the browser and the ResponseWriter that can mess up the encoding of the response. For example, some filter may be creating a response buffer set to encode to ISO-8859-1. If the faces response encoding is set to UTF-8 and the ResponseWriter assumes it doesn't need to encode anything, we may run into trouble. The original switch was put in there to reward people who put forth the effort to ensure that everything between the browser and the ResponseWriter uses the same encoding.
So if you want to include the ISO8859_1_Entities in there, that's great, but keep in mind it should be overridable, and I think it should default to always HTML-encode as the "always works" scenario.
One possible patch might be to check the disableUnicodeEscaping flag. If it is true, then you can check the response encoding to determine whether you should HTML-encode the ISO8859_1_Entities.
Just a thought.
Mike
@javaserverfaces Commented imario said: OK, I have a patched version running here now which respects the configuration flag and does its processing only if disableUnicodeEscaping=true.
What we have now is:
0) disableUnicodeEscaping=false, so encode everything
a) Allow "pass through" mode if output is set to UTF-8
b) Allow "pass through" of ISO-8859-1 chars if output is set to UTF-8
c) HTML-encode if none of the above match
But this now means that if one has output set to ISO-8859-15, s/he gets the same output as with disableUnicodeEscaping=false, even if Java would be able to convert the UTF-16 char to a corresponding ISO-8859-15 byte.
The question is, do we want to support scenario d?
d) Allow "pass through" even if the output encoding might not be able to represent the character.
This then results in "?" being rendered if Java cannot convert the character.
While I think it would be nice to support d, I ask myself whether it shouldn't be common to use a UTF charset for any web-page rendering anyway. You can configure that in Java without having to touch the application, so it is very easy. I think the most common use cases are 0 and a.
After a code-cleanup I should be able to attach a patch for review in the next 48 hours.
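The "?" behavior described for scenario d can be checked directly with the JDK's charset API (a self-contained demo, not part of the patch):

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CanEncodeDemo {
    public static void main(String[] args) {
        // A German umlaut is not representable in US-ASCII.
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(ascii.canEncode('\u00E4'));   // false

        // String.getBytes() silently substitutes the charset's replacement
        // byte ('?' for US-ASCII) for unmappable characters -- exactly the
        // "?" rendering described above.
        byte[] bytes = "\u00E4".getBytes(StandardCharsets.US_ASCII);
        System.out.println((char) bytes[0]);             // ?
    }
}
```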
@javaserverfaces Commented imario said: Sorry, b should read:
b) Allow "pass through" of ISO-8859-1 chars if output is set to ISO-8859-1; HTML-encode any Unicode character.
@javaserverfaces Commented youngm said: Sounds good.
I think the only way that we could support an option "d" would be to add another configuration flag to tell JSF to never encode ISO-8859-1 entities which I believe is unnecessary.
I think that the case of someone using something other than ISO-8859-1 or UTF-8 is going to be rare. If that case comes up they will just have to live with html encoded ISO-8859-1 characters.
Mike
@javaserverfaces Commented imario said: Created an attachment (id=658) patch to avoid unicode/iso escaping if possible (and allowed through configuration)
@javaserverfaces Commented imario said: JCA/SCA sent to JDIC_JCA_submit@sun.com
@javaserverfaces Commented @rlubke said: r=rlubke. Mike - any comments?
@javaserverfaces Commented youngm said: I like the ISO escaping you added.
However, it looks like even if the user specifies "disableUnicodeEscaping", we still check whether we are really using a Unicode charset. I'm not much of a charset expert, but would there ever be a situation where an application was not using a Unicode charset but wanted to disable escaping anyway?
The configuration option in 727 kind of gave the power to the user to determine that they are using a charset that doesn't need to be escaped. This patch limits that power by only letting them disable escaping if they are using a unicode charset that is in our list.
But then again I'm not a charset expert so maybe there is never a case to not escape one of the charsets we specify?
If that is not the case then I would propose this code snippet instead:
if (disableUnicodeEscaping) {
    escapeUnicode = false;
    if (HtmlUtils.isISO8859_1encoding(charsetName)
            || HtmlUtils.isUTFencoding(charsetName)) {
        escapeIso = false;
    }
}
Just a thought. Otherwise: r=youngm
Mike
@javaserverfaces Commented imario said:
If the faces response encoding is set to UTF-8 and the ResponseWriter assumes it doesn't need to encode anything we may run into trouble
From this comment I was under the impression that we should try hard to provide correct HTML output in any case. That means, even if the developer misconfigured the system, we should output HTML code which always renders the correct character on the client side.
Probably I got that wrong.
The current solution allows correct output with German umlauts etc. even if you use US-ASCII, which is nice, I think.
But to be sure: I've created the list of UTF charsets out of the output of Charset.listAvailable (or so) on a Windows XP machine, so I also do not know if it is complete. But wouldn't it be very unlikely to use anything other than UTF-8 or UTF-16?
If we change the escape* detection as you outlined, then why bind the ISO escaping to the charset at all, and not simply skip encoding it too? We can argue that the developer wanted it that way. For sure, that might lead to "?" being rendered if the response was not configured correctly.
What about keeping my proposed code, but allowing "disableUnicodeEscaping" to be "true|false|auto"?
The documentation then will be:
"disableUnicodeEscaping" allows controlling whether Unicode characters not renderable by the response stream get encoded. When set to:
What do you think?
@javaserverfaces Commented imario said: sorry, I mixed "true" and "false" in the above doc.
true: encodeIso=false, encodeUnicode=false
false: encodeIso=true, encodeUnicode=true
auto: encodeIso=(response stream is not Unicode and not ISO-8859-1), encodeUnicode=(response stream is not Unicode)
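The true|false|auto semantics could be sketched like this (the option values follow the proposal above; the helper class and its return convention are illustrative assumptions, not the actual patch code):

```java
// Maps the proposed disableUnicodeEscaping option to the two escape flags.
// Returns { escapeUnicode, escapeIso }. Illustrative sketch only.
public class DisableUnicodeEscaping {

    static boolean[] flags(String option, String responseCharset) {
        boolean utf = responseCharset.toUpperCase().startsWith("UTF");
        boolean iso1 = responseCharset.equalsIgnoreCase("ISO-8859-1");
        switch (option) {
            case "true":   // trust the developer: never entity-encode
                return new boolean[] { false, false };
            case "false":  // "always works" default: always entity-encode
                return new boolean[] { true, true };
            default:       // "auto": decide from the response charset
                return new boolean[] { !utf, !utf && !iso1 };
        }
    }

    public static void main(String[] args) {
        boolean[] auto = flags("auto", "ISO-8859-1");
        // Unicode chars still get escaped, ISO-8859-1 chars pass through.
        System.out.println(auto[0] + "/" + auto[1]);  // true/false
    }
}
```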
@javaserverfaces Commented youngm said: Again, I don't know much about charsets and when one would be used over another. If you think it would be very rare for something other than a UTF charset to be used, then I think we're good with the way the patch is.
Mike
@javaserverfaces Commented imario said: Created an attachment (id=660) updated patch with enhanced configuration option disableUnicodeEscaping=true|false|auto
@javaserverfaces Commented youngm said: r=youngm
@javaserverfaces Commented @rlubke said: Changes applied (svn rev r5033). Thanks for the patch.
@javaserverfaces Commented @manfredriem said: Closing issue out
@javaserverfaces Commented File: escaping.patch Attached By: imario
@javaserverfaces Commented File: escaping.patch Attached By: imario
@javaserverfaces Commented Was assigned to rlubke
@javaserverfaces Commented This issue was imported from java.net JIRA JAVASERVERFACES-751
@javaserverfaces Commented Marked as fixed on Sunday, July 13th 2008, 9:44:09 am
Hi!
Mojarra encodes any non-ASCII character using HTML entity encoding or the &#xx; numeric HTML encoding with h:outputText.
I'd like to know what you think about an enhancement so that one is able to configure this behavior. It should be sufficient to encode the "dangerous" characters like <, >, etc.; other characters could be passed through and written in the charset the developer has chosen for the page. That is, a simple "write through" (if configured) in HtmlUtils.writeText should do the trick, no?
I can try and attach a patch if wanted.
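The "write through" idea could look roughly like this (a hypothetical standalone sketch; the real change would go into HtmlUtils.writeText, and the method below is not the actual Mojarra API):

```java
// Escapes only the markup-significant characters and passes everything
// else through in the page's charset. Illustrative sketch only.
public class WriteThrough {

    static String writeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);  // pass through, incl. non-ASCII
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Umlauts pass through untouched; markup characters are escaped.
        System.out.println(writeText("Gr\u00FC\u00DFe <b> & \"quotes\""));
    }
}
```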
Environment
Operating System: All Platform: All
Affected Versions
[1.2_08]