Closed lawrencedesign closed 8 years ago
First, make sure that core PHP and mbstring both use the right encoding:
For mbstring, run: echo mb_internal_encoding(); For core PHP, check default_charset (PHP >= 5.6 uses UTF-8 by default).
Make sure both report UTF-8.
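A quick way to run both checks from the command line might look like this. It is only a minimal sketch: the helper name reportEncodings is made up for illustration, and the mbstring check is skipped when the extension is not loaded.

```php
<?php
// Hypothetical helper: collects the two encoding settings mentioned above.
function reportEncodings()
{
    $report = array(
        // Core PHP setting; UTF-8 by default since PHP 5.6.
        'default_charset' => ini_get('default_charset'),
    );
    if (extension_loaded('mbstring')) {
        // Encoding that mb_* functions assume when none is passed explicitly.
        $report['mb_internal_encoding'] = mb_internal_encoding();
    }
    return $report;
}

foreach (reportEncodings() as $name => $value) {
    echo $name, ' = ', $value, PHP_EOL;
}
```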
That is strange: the mb_strlen value should be lower, since those characters use more than one byte each and mb_strlen returns the number of characters, whereas strlen returns the number of bytes.
strlen is better suited to the job, just as with socket_write: what you send is bytes, not characters.
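To illustrate the point about bytes, here is a small sketch that uses an in-memory stream as a stand-in for the socket. That is an assumption made so the example runs without a network connection; the real server uses socket_write. The write call reports the number of bytes written, which matches strlen rather than the character count.

```php
<?php
$text = "\xC3\xA9"; // "é" encoded as UTF-8: one character, two bytes

// php://memory is a stand-in for the socket so the sketch needs no network.
$stream = fopen('php://memory', 'w+');
$written = fwrite($stream, $text);
fclose($stream);

echo 'bytes written: ', $written, PHP_EOL;
echo 'strlen:        ', strlen($text), PHP_EOL;
```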
Second, if the first step isn't enough: since mbstring is only used in one place, I suggest that you replace
if (extension_loaded('mbstring')) {
    if ($headers['length'] > mb_strlen($this->applyMask($headers,$payload))) {
        $user->handlingPartialPacket = true;
        $user->partialBuffer = $message;
        return false;
    }
}
else {
    if ($headers['length'] > strlen($this->applyMask($headers,$payload))) {
        $user->handlingPartialPacket = true;
        $user->partialBuffer = $message;
        return false;
    }
}
with
if ($headers['length'] > strlen($this->applyMask($headers,$payload))) {
    $user->handlingPartialPacket = true;
    $user->partialBuffer = $message;
    return false;
}
while waiting for an official fix by @ghedipunk
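For reference, the byte-based check can be sketched as a standalone function. The names byteLength and isPartialPayload are hypothetical and not part of PHP-Websockets. The '8bit' branch is an assumption that only matters on old PHP setups where mbstring function overloading makes strlen count characters.

```php
<?php
// Hypothetical helper: length of $data in bytes, regardless of its encoding.
function byteLength($data)
{
    // Assumption: on old PHP versions mbstring.func_overload (bit 2) can make
    // strlen() count characters; mb_strlen() with '8bit' always counts bytes.
    if (extension_loaded('mbstring') && ((int) ini_get('mbstring.func_overload') & 2)) {
        return mb_strlen($data, '8bit');
    }
    return strlen($data);
}

// Hypothetical stand-in for the check in the server: the frame header declares
// the payload length in bytes, so a shorter buffer means more data is coming.
function isPartialPayload($declaredLength, $payload)
{
    return $declaredLength > byteLength($payload);
}

$payload = "\xC3\xA9\xC3\xB6\xC3\xAE"; // "éöî" as UTF-8: 3 characters, 6 bytes
var_dump(isPartialPayload(6, $payload)); // all 6 declared bytes are present
var_dump(isPartialPayload(9, $payload)); // 3 declared bytes are still missing
```

Comparing against the byte count keeps the check in the same unit as the length field in the frame header.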
Hello!
I'm using the right encoding, set with mb_internal_encoding().
I have UTF-8, and my files are UTF-8 too.
I tried with strlen. It works, but it's strange.
On 18 September 2015 at 07:03, Xaraknid wrote: [quoted copy of the first reply in this thread]
Oh, I was wondering why this only supported English and no other languages. I started building a chat last week and was actually going to code the server portion myself, until I realized it wasn't as simple as I originally thought, since I don't know much about handling binary data. So I found this PHP-Websockets class.
One of the first things I noticed after setting up the JavaScript was that it was only receiving ASCII text. At first I thought it was the JavaScript portion and that I might have needed to encode something, but the JavaScript appeared to be sending the messages to the server just fine.
So I tried the original tutorial about programming WebSocket servers (there is a download link at the bottom - http://www.sanwebe.com/2013/05/chat-using-websocket-php-socket) and its basic server worked just fine.
Looking back at the tutorial code, it looks so simple that I would almost prefer to use that instead. But I doubt it properly implements the whole protocol like this script does.
Anyway, here is a log of what's happening, so that hopefully this script can be fixed.
Chatlog
00:23:36 - People in chat:
00:23:36 - Guest_89395 has connected
00:23:36 - You are now the owner of the room
00:23:55 - Guest_89395: Hello - This is in English
00:24:08 - Disconnected
After my first message I also sent the following, but my chat only shows received messages, even for messages you sent yourself:
こんにちは - これは日本語であります
Server Log
Server started
Listening on: 0.0.0.0:9000
Master socket: Resource id #6
Client connected. Resource id #7
$headers["length"]
int(39)
mb_strlen($this->applyMask($headers,$payload))
int(39)
strlen($this->applyMask($headers,$payload))
int(39)
mb_internal_encoding()
string(5) "UTF-8"
ini_get("default_charset")
string(5) "UTF-8"
$headers
array(8) {
["fin"]=>
string(1) "?"
["rsv1"]=>
string(1) "\u0000"
["rsv2"]=>
string(1) "\u0000"
["rsv3"]=>
string(1) "\u0000"
["opcode"]=>
int(1)
["hasmask"]=>
string(1) "?"
["length"]=>
int(39)
["mask"]=>
string(4) "X+1^"
}
$this->applyMask($headers,payload)
string(39) "{"cmd":"nick","username":"Guest_89395"}"
Received:
object(stdClass)#3 (2) {
["cmd"]=>
string(4) "nick"
["username"]=>
string(11) "Guest_89395"
}
Sending:
string(39) "{"cmd":"nick","username":"Guest_89395"}"
$headers["length"]
int(49)
mb_strlen($this->applyMask($headers,$payload))
int(49)
strlen($this->applyMask($headers,$payload))
int(49)
mb_internal_encoding()
string(5) "UTF-8"
ini_get("default_charset")
string(5) "UTF-8"
$headers
array(8) {
["fin"]=>
string(1) "?"
["rsv1"]=>
string(1) "\u0000"
["rsv2"]=>
string(1) "\u0000"
["rsv3"]=>
string(1) "\u0000"
["opcode"]=>
int(1)
["hasmask"]=>
string(1) "?"
["length"]=>
int(49)
["mask"]=>
string(4) "?q??"
}
$this->applyMask($headers,payload)
string(49) "{"cmd":"say","text":"Hello - This is in English"}"
Received:
object(stdClass)#3 (2) {
["cmd"]=>
string(3) "say"
["text"]=>
string(26) "Hello - This is in English"
}
Sending:
string(70) "{"cmd":"say","text":"Hello - This is in English","from":"Guest_89395"}"
$headers["length"]
int(74)
mb_strlen($this->applyMask($headers,$payload))
int(42)
strlen($this->applyMask($headers,$payload))
int(74)
mb_internal_encoding()
string(5) "UTF-8"
ini_get("default_charset")
string(5) "UTF-8"
$headers
array(8) {
["fin"]=>
string(1) "?"
["rsv1"]=>
string(1) "\u0000"
["rsv2"]=>
string(1) "\u0000"
["rsv3"]=>
string(1) "\u0000"
["opcode"]=>
int(1)
["hasmask"]=>
string(1) "?"
["length"]=>
int(74)
["mask"]=>
string(4) "\u001D?f\u0014"
}
$this->applyMask($headers,payload)
string(74) "{"cmd":"say","text":"こんにちは - これは日本語であります"}"
I replaced unprintable characters with Unicode escapes.
Note: I have various other debugging info mixed in from my class that extends WebSocketServer: a var_dump(json_decode($message)) in the process() function, and the json_encode()d message being sent back out. Since it looks like the code thinks the Japanese text is a partial message, it does not even count as received by the server yet, so process() is never called.
Actually, apply the fix I described in my first reply.
The characters in "こんにちは - これは日本語であります" are like é ö î: they are formed from multiple bytes. Most English characters are formed from 1 byte.
strlen counts the number of bytes in a string.
mb_strlen counts the number of characters in a string.
As long as you use English, the number of characters and the number of bytes in a string are equal. When you use those special characters, the ratio changes: bytes > characters.
What the server receives is bytes, not characters.
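The numbers from the server log above can be reproduced with a short script. This is a sketch that assumes the file is saved as UTF-8; the variable names are made up for illustration, and the mb_strlen lines are skipped when mbstring is not loaded.

```php
<?php
// The JSON message from the server log, written as a UTF-8 literal
// (assumption: this source file itself is saved as UTF-8).
$message = '{"cmd":"say","text":"こんにちは - これは日本語であります"}';
$declaredLength = 74; // $headers['length'] from the log

echo 'strlen:    ', strlen($message), PHP_EOL; // bytes
if (extension_loaded('mbstring')) {
    echo 'mb_strlen: ', mb_strlen($message, 'UTF-8'), PHP_EOL; // characters
    // Character-based check: true, so the frame is wrongly treated as partial.
    var_dump($declaredLength > mb_strlen($message, 'UTF-8'));
}
// Byte-based check: false, so the frame is treated as complete.
var_dump($declaredLength > strlen($message));
```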
Yeah, I already applied your fix. It works just fine.
I kinda just changed the first if to comment out "mb_":
if ($headers['length'] > /*mb_*/strlen($this->applyMask($headers,$payload))) {
I'm not totally sure what the purpose of the mb_ function is here. I'm guessing they thought the header's length would be the size in characters, perhaps.
Either way, it is working fine.
I was glad to see someone figured it out.
I was totally gonna dump this class if it didn't support UTF-8, as it woulda been a pain for me to figure out where it was messing up. In fact, I started trying to find out on my own by putting a lotta var_dumps in. Guess I didn't get deep enough.
I added so many features to my webchat in the last 2 days that I now need to clean up my JavaScript for a while.
Here is what I see on my modified version of the server, in verbose mode, when I send those characters:
Listening on: 0.0.0.0:8080
Master socket: Resource id #6
array(1) {
["/echobot"]=>
object(echobot)#3 (0) {
}
}
RUNNING with libev method BACKEND : Epoll
Client #1 connected. Resource id #7
read callback
array(14) {
["get"]=>
string(8) "/echobot"
["http"]=>
float(1.1)
["host"]=>
string(16) "xxxxxxx.com:8080"
["connection"]=>
string(7) "Upgrade"
["pragma"]=>
string(8) "no-cache"
["cache-control"]=>
string(8) "no-cache"
["upgrade"]=>
string(9) "websocket"
["origin"]=>
string(7) "file://"
["sec-websocket-version"]=>
string(2) "13"
["user-agent"]=>
string(108) "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
["accept-encoding"]=>
string(19) "gzip, deflate, sdch"
["accept-language"]=>
string(35) "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4"
["sec-websocket-key"]=>
string(24) "mAW6xrPlkhFsa1+nCO2A6g=="
["sec-websocket-extensions"]=>
string(44) "permessage-deflate; client_max_window_bits; "
}
array(1) {
["/echobot"]=>
object(echobot)#3 (0) {
}
}
<< read time 0.0018730163574219 -- 533 prps
mem total : 396296 diff : 6400
read callback
packet size : 57
frame #1 position : 0 msglen : 51 offset : 6 = framesize of 57
object(msg_data)#10 (3) {
["opcode"]=>
int(1)
["users"]=>
object(WebSocketUser)#7 (12) {
["socket"]=>
resource(8) of type (Socket)
["id"]=>
string(14) "u563db3e4702c9"
["watcher"]=>
NULL
["headers"]=>
array(2) {
["get"]=>
string(8) "/echobot"
["host"]=>
string(16) "xxxxxxx.com:8080"
}
["readystate"]=>
int(1)
["handlingPartialPacket"]=>
bool(false)
["readBuffer"]=>
string(0) ""
["writeNeeded"]=>
bool(false)
["writeBuffer"]=>
string(0) ""
["partialMessage"]=>
string(0) ""
["hasSentClose"]=>
bool(false)
}
["message"]=>
string(51) "こんにちは - これは日本語であります"
}
######## PACKET END #########
<< read time 0.00049710273742676 -- 2011 prps
mem total : 398720 diff : 8824
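The numbers in the log line "msglen : 51 offset : 6 = framesize of 57" are consistent with the WebSocket framing rules in RFC 6455: a payload of up to 125 bytes needs a 2-byte header, and frames sent by a client carry a 4-byte masking key. A rough sketch of that arithmetic follows; the function name frameHeaderSize is made up for illustration and is not taken from the modified server.

```php
<?php
// Hypothetical helper: size in bytes of a WebSocket frame header (RFC 6455)
// for a given payload length in bytes.
function frameHeaderSize($payloadLength, $masked)
{
    if ($payloadLength <= 125) {
        $size = 2;      // length fits in the 7-bit field
    } elseif ($payloadLength <= 65535) {
        $size = 2 + 2;  // 16-bit extended length
    } else {
        $size = 2 + 8;  // 64-bit extended length
    }
    return $masked ? $size + 4 : $size; // client frames add a 4-byte mask
}

$payloadLength = 51; // bytes in the Japanese test message
$offset = frameHeaderSize($payloadLength, true);
echo 'offset: ', $offset, ', frame size: ', $offset + $payloadLength, PHP_EOL;
```

All of these sizes are in bytes, which is why the 16 Japanese characters plus 3 ASCII characters show up as a 51-byte message.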
Today I downloaded this code. I tried it with client.html and the testserver, but when I send messages with special characters, like "é, ő, ű" and so on, $headers['length'] and mb_strlen($this->applyMask($headers,$payload)) differ. $headers['length'] shows the correct length, but mb_strlen shows a greater one! And then it returns false, so the message is not received by the server :(