Closed GoogleCodeExporter closed 9 years ago
Do I have to change everything to UTF-8, including the character encoding in
the gss source? And should I follow this guide for JBoss?
1. Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP
   Connector, AJP Connector.
2. Use a character encoding filter with the default encoding set to UTF-8.
3. Change all your JSPs to include charset name in their contentType.
   For example, use <%@page contentType="text/html; charset=UTF-8" %> for the
   usual JSP pages and
   <jsp:directive.page contentType="text/html; charset=UTF-8" />
   for the pages in XML syntax (aka JSP Documents).
4. Change all your servlets to set the content type for responses and to
   include charset name in the content type to be UTF-8.
   Use response.setContentType("text/html; charset=UTF-8") or
   response.setCharacterEncoding("UTF-8").
5. Change any content-generation libraries you use (Velocity, Freemarker,
   etc.) to use UTF-8 and to specify UTF-8 in the content type of the
   responses that they generate.
6. Disable any valves or filters that may read request parameters before your
   character encoding filter or jsp page has a chance to set the encoding to
   UTF-8. For more information see
   http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.
Regards,
Nikola
Original comment by ngara...@gmail.com
on 17 Nov 2010 at 3:01
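A minimal sketch of why the URIEncoding step in the guide above matters. It percent-encodes a folder name as UTF-8 and decodes it back twice, once as UTF-8 and once as ISO-8859-1. The class and method names are made up for illustration, the charset argument only stands in for the connector's URIEncoding setting (this is not how JBoss decodes internally), and the `Charset` overloads used here need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of gss or JBoss.
class UriEncodingDemo {
    // Percent-encode a name as UTF-8, roughly what a client puts in the URL.
    static String encodeUtf8(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // Decode with the given charset; the charset plays the role of URIEncoding.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Five Croatian letters: s-caron, c-caron, c-acute, d-stroke, z-caron.
        String folder = "\u0161\u010D\u0107\u0111\u017E";
        String onTheWire = encodeUtf8(folder);

        System.out.println(onTheWire);
        // prints %C5%A1%C4%8D%C4%87%C4%91%C5%BE

        System.out.println(decode(onTheWire, StandardCharsets.UTF_8).equals(folder));
        // prints true

        System.out.println(decode(onTheWire, StandardCharsets.ISO_8859_1).equals(folder));
        // prints false: each of the ten bytes becomes its own character
    }
}
```

If both connectors already decode as UTF-8, as reported later in the thread, this step is not where the name gets damaged.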
Does your <Connector> in server.xml have a URIEncoding="UTF-8" attribute? If
not, you should add it and restart JBoss. I cannot reproduce the problem in our
installations (yes, Croatian letters work perfectly here :-)), so make the
change and let us know.
Original comment by chstath
on 17 Nov 2010 at 3:26
Yes, both the AJP and HTTP connectors in JBoss are already set to UTF-8 (out
of the box, I think, since I don't remember changing it). So, you can create a
folder with Croatian letters? I can create one, but I am not able to fetch the
folder later on.
I get a 502 Bad Gateway error when trying to list it.
I don't understand how files with Croatian letters work without problems while
folders don't.
Regards,
Nikola
Original comment by ngara...@gmail.com
on 17 Nov 2010 at 9:20
The 502 response probably comes from Apache. Try to use one of the
JBoss servers directly to see what error is returned.
Original comment by chstath
on 19 Nov 2010 at 2:58
After closer inspection, here is what I got from the haproxy discussion group:
>> echo "show errors" | socat stdio unix-connect:/var/run/haproxy.sock
> >
> > # echo "show errors" | socat stdio unix-connect:/var/run/haproxy.sock
> >
> > [19/Nov/2010:15:01:56.646] backend www (#1) : invalid response
> > src aaa.bbb.ccc.ddd, session #645, frontend www (#1), server
> > backend-srv1 (#1)
> > response length 857 bytes, error at position 268:
> >
> > 00000 HTTP/1.1 200 OK\r\n
> > 00017 Date: Fri, 19 Nov 2010 14:01:56 GMT\r\n
> > 00054 Server: Apache/2.2.3 (CentOS)\r\n
> > 00085 X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1\r\n
> > 00136 Expires: -1\r\n
> > 00149 X-GSS-Metadata: {"creationDate":1290002859579,"createdBy":"ngarafol@sr
> > 00219+ ce.hr","modifiedBy":"username@domain","name":"a\r\x07\x11~","owner":"
> > 00282+ username@domain","modificationDate":1290002859579,"deleted":false}\r
> > 00350+ \n
> > 00351 Content-Length: 418\r\n
> > 00372 Connection: close\r\n
> > 00391 Content-Type: application/json;charset=UTF-8\r\n
> > 00437 \r\n
> > 00439 {"files":[],"creationDate":1290002859579,"createdBy":"username@domain
> > 00509+ ","modifiedBy":"username@domain","readForAll":false,"name":"\xC5\xA1
> > 00572+ \xC4\x8D\xC4\x87\xC4\x91\xC5\xBE","permissions":[{"modifyACL":true,"wr
> > 00618+ ite":true,"read":true,"user":"username@domain"}],"owner":"username@domain
> > 00688+ ce.hr","parent":{"name":"User User","uri":"http://server/p
> > 00758+ ithos/rest/username@domain/files/"},"folders":[],"modificationDate":1
> > 00828+ 290002859579,"deleted":false}
Excellent, we have it now.
> > 00149 X-GSS-Metadata: {"creationDate":1290002859579,"createdBy":"ngarafol@sr
> > 00219+ ce.hr","modifiedBy":"username@domain","name":"a\r\x07\x11~","owner":"
> > 00282+ username@domain","modificationDate":1290002859579,"deleted":false}\r
> > 00350+ \n
See position 268 above? It's the \x07 just after the \r on the second
line. The issue is not related to UTF-8 at all; those are just forbidden
characters, possibly resulting from corrupted memory. The "\r" begins an
end of header and may only be followed by a "\n".
From RFC2616:
    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>
    token          = 1*<any CHAR except CTLs or separators>
    quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext         = <any TEXT except <">>
    quoted-pair    = "\" CHAR
    TEXT           = <any OCTET except CTLs,
                     but including LWS>
    separators     = "(" | ")" | "<" | ">" | "@"
                   | "," | ";" | ":" | "\" | <">
                   | "/" | "[" | "]" | "?" | "="
                   | "{" | "}" | SP | HT
    CHAR           = <any US-ASCII character (octets 0 - 127)>
    CTL            = <any US-ASCII control character
                     (octets 0 - 31) and DEL (127)>
So as you can see, CTL characters cannot appear anywhere unescaped
(an HTTPBIS spec refines that further by clearly insisting on the
fact that those chars may not even be escaped). So clearly those
0x0D 0x07 0x11 characters at position 268 are forbidden here and
break the parsing of the line.
What I suspect is that the characters were UTF-8 encoded in the
database, but the application server stripped the 8th bit before
putting them on the wire, which resulted in what you have. That's
just a pure guess, of course. Another possibility is that those bytes
represent an integer value that was accidentally output with a "%c"
format instead of a "%d".
We can't even let that pass with "option accept-invalid-http-response"
because the issue will be even worse for characters that are returned
as 0x0D 0x0A, which will end the line and start a new header with the
remaining data.
The only solution right here is to try to see where it breaks in the
application (maybe it's a memory corruption issue after all) and to
fix it ASAP.
Original comment by ngara...@gmail.com
on 20 Nov 2010 at 12:36
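The CTL rule quoted from RFC 2616 above can be checked mechanically. The following is a small sketch, not haproxy's parser: the class and method names are hypothetical, and it flags the bare "\r" itself, whereas haproxy reported the byte that follows it. HT (octet 9) is tolerated because the grammar allows it as part of LWS.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class HeaderValueCheck {
    // Returns the index of the first CTL octet (0-31 except HT, or 127)
    // in a header field value, or -1 if the value contains none.
    static int firstControlOctet(byte[] value) {
        for (int i = 0; i < value.length; i++) {
            int b = value[i] & 0xFF;
            if ((b < 32 && b != '\t') || b == 127) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Fragment of the X-GSS-Metadata value from the haproxy dump.
        byte[] bad = "\"name\":\"a\r\u0007\u0011~\""
                .getBytes(StandardCharsets.ISO_8859_1);
        // A value with only printable ASCII for comparison.
        byte[] good = "\"name\":\"folder\""
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(firstControlOctet(bad));  // prints 9
        System.out.println(firstControlOctet(good)); // prints -1
    }
}
```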
I'm not sure I understand. Is it a problem with haproxy or with gss? Or
something between the two?
Original comment by chstath
on 23 Nov 2010 at 9:49
The problem is not with haproxy but with the X-GSS-Metadata HTTP header, which
does not conform to the RFC when a folder or file name contains non-ASCII
characters, in my case the Croatian letters I mentioned earlier.
X-GSS-Metadata (like any other header field) is not allowed to carry UTF-8
characters, only ISO-8859-1. It's defined in the RFC.
Take a look at the name field in X-GSS-Metadata:
"name":"a\r\x07\x11~"
and later in the response body (sent with Content-Type
application/json;charset=UTF-8):
"name":"\xC5\xA1\xC4\x8D\xC4\x87\xC4\x91\xC5\xBE"
It's obvious that whoever puts X-GSS-Metadata in the headers does it wrong,
because the body is encoded as it should be. All you have to do is eliminate
the UTF-8 characters from the header or encode them the way they are encoded
in the body.
Do you understand?
Original comment by ngara...@gmail.com
on 23 Nov 2010 at 10:48
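The two "name" values compared above are related in a simple way. The body bytes C5 A1 C4 8D C4 87 C4 91 C5 BE are the UTF-8 encoding of the code points U+0161, U+010D, U+0107, U+0111 and U+017E, and keeping only the low 8 bits of each of those code points gives exactly the header bytes 61 0D 07 11 7E, that is "a\r\x07\x11~". The sketch below reproduces that arithmetic. It is consistent with a header writer that narrows each Java char to one byte, but the thread does not show the gss or JBoss code, so treat that cause as an assumption. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from gss.
class LowByteTruncation {
    // Keep only the low 8 bits of each UTF-16 code unit.
    static byte[] truncateToLowBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Folder name decoded from the UTF-8 bytes in the response body.
        String name = "\u0161\u010D\u0107\u0111\u017E";

        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));
        // prints C5 A1 C4 8D C4 87 C4 91 C5 BE (matches the body)

        System.out.println(hex(truncateToLowBytes(name)));
        // prints 61 0D 07 11 7E (matches the header)
    }
}
```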
OK. I found the problem in the code. I have to check first whether the fix
breaks anything in the other clients (e.g. the desktop client), and if not, I'll
do the patch. I believe that tomorrow we'll have something to test.
Original comment by chstath
on 23 Nov 2010 at 12:43
Original comment by chstath
on 29 Nov 2010 at 10:00
I made a fix for this. It is in both the default branch and the solr1.4 branch.
Check if it is ok and let me know so that I can close the issue.
Original comment by chstath
on 29 Nov 2010 at 3:41
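The thread does not show what the committed fix looks like. One possible approach, sketched here purely as an assumption, is to keep the header value in printable ASCII by writing every other character of a JSON string value as a backslash-u escape with four hex digits, which JSON parsers decode back to the original character. The class and method names are hypothetical, and the helper is meant for the content of a string value such as the folder name, not for a whole JSON document.

```java
// Hypothetical helper; not the actual gss patch.
class AsciiJsonEscape {
    // Replace every character outside printable ASCII (0x20-0x7E) with a
    // JSON unicode escape so the result is safe inside an HTTP header.
    // Quotes and backslashes are left alone; a real JSON writer must
    // still escape those separately.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u0161\u010D\u0107\u0111\u017E";
        // Prints five escapes, one per letter, each six ASCII characters long.
        System.out.println(escapeNonAscii(name));
        // Plain ASCII passes through unchanged.
        System.out.println(escapeNonAscii("folder"));
    }
}
```

Percent-encoding the value would be another header-safe option. Either way, clients that read X-GSS-Metadata have to decode it the same way, which is presumably why the fix was checked against the desktop client first.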
Tested updated source, seems to work. No errors visible. Everything looks ok.
Thanks for fixing it.
Regards,
Nikola
Original comment by ngara...@gmail.com
on 30 Nov 2010 at 11:10
Original comment by chstath
on 30 Nov 2010 at 11:19
Original issue reported on code.google.com by
ngara...@gmail.com
on 17 Nov 2010 at 2:18