ghedipunk / PHP-Websockets

A Websockets server written in PHP.
BSD 3-Clause "New" or "Revised" License
913 stars, 375 forks

MB_STRLEN and headers['length'] #53

Closed: lawrencedesign closed this issue 8 years ago

lawrencedesign commented 8 years ago

Today I downloaded this code. I tried it with client.html and the test server, but when I send messages containing special characters such as "é, ő, ű", $headers['length'] and mb_strlen($this->applyMask($headers,$payload)) differ. $headers['length'] shows the correct length, but mb_strlen shows a greater one! Then the function returns false, so the message is never received by the server :(

Xaraknid commented 8 years ago

First, make sure core PHP and mbstring use the right encoding:

For mbstring, check the output of mb_internal_encoding(). For core PHP, check default_charset (PHP >= 5.6 uses UTF-8 by default).

Make sure both are UTF-8.
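A minimal sketch of that check as a stand-alone script. reportEncodings() is a hypothetical name; it only reads the two settings mentioned above and assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: read the two encoding settings mentioned above.
function reportEncodings(): array
{
    return [
        // null when the mbstring extension is not loaded at all
        'mb_internal_encoding' => extension_loaded('mbstring') ? mb_internal_encoding() : null,
        'default_charset'      => ini_get('default_charset'),
    ];
}

foreach (reportEncodings() as $setting => $value) {
    echo $setting, ': ', var_export($value, true), PHP_EOL;
}
```

Both values should read 'UTF-8'.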

That's strange: mb_strlen should return the lower value, since those characters use more than one byte and mb_strlen returns the number of characters, whereas strlen returns the number of bytes.

strlen is better suited for the job, since with socket_write you send bytes, not characters.
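To illustrate the difference, a short sketch using the characters from the original report. They are written as \u{...} escapes (PHP 7+) so the result doesn't depend on how the file is saved:

```php
<?php
// Byte length vs character length of the same UTF-8 text.
$text = "\u{e9}, \u{151}, \u{171}";   // "é, ő, ű" from the original report

$bytes = strlen($text);               // bytes: what socket_write() and the frame header count
$chars = mb_strlen($text, 'UTF-8');   // characters: what a reader would count

echo "strlen:    $bytes\n";   // 10 (each accented letter is 2 bytes)
echo "mb_strlen: $chars\n";   // 7
```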

Second, if the first step isn't enough: since mbstring is only used in one place, I suggest you replace

    if (extension_loaded('mbstring')) {
      if ($headers['length'] > mb_strlen($this->applyMask($headers,$payload))) {
        $user->handlingPartialPacket = true;
        $user->partialBuffer = $message;
        return false;
      }
    } 
    else {
      if ($headers['length'] > strlen($this->applyMask($headers,$payload))) {
        $user->handlingPartialPacket = true;
        $user->partialBuffer = $message;
        return false;
      }
    }

with

    if ($headers['length'] > strlen($this->applyMask($headers,$payload))) {
      $user->handlingPartialPacket = true;
      $user->partialBuffer = $message;
      return false;
    }

while you wait for an official fix from @ghedipunk.
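To see why the mb_strlen branch misfires, here is the check isolated into two hypothetical functions (isPartialFrame() and isPartialFrameMb() are illustrative names, not part of the library). $declaredLength stands in for $headers['length'], which the WebSocket protocol (RFC 6455) counts in bytes:

```php
<?php
// Hypothetical helpers isolating the partial-frame check discussed above.

function isPartialFrame(int $declaredLength, string $payload): bool
{
    // Suggested fix: compare bytes with bytes.
    return $declaredLength > strlen($payload);
}

function isPartialFrameMb(int $declaredLength, string $payload): bool
{
    // Equivalent of the mbstring branch (with UTF-8 internal encoding):
    // compares a byte count with a character count.
    return $declaredLength > mb_strlen($payload, 'UTF-8');
}

$payload = "\u{e9}, \u{151}, \u{171}";   // "é, ő, ű": 10 bytes, 7 characters

var_dump(isPartialFrame(10, $payload));                 // bool(false): frame is complete
var_dump(isPartialFrameMb(10, $payload));               // bool(true): wrongly treated as partial
var_dump(isPartialFrame(10, substr($payload, 0, 6)));   // bool(true): genuinely incomplete
```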

lawrencedesign commented 8 years ago

Hello!

I'm using the right encoding, set with mb_internal_encoding();

It's UTF-8, and my files are UTF-8 too.

I tried strlen. It works, but it's strange.

(Sent by email in reply to Xaraknid's comment of 18 September 2015, 07:03.)

drone540 commented 8 years ago

Oh, I was wondering why this only supported English and no other languages. I started building a chat last week and was going to code the server portion myself, until I realized it wasn't as simple as I had thought (I don't know much about handling binary data), so I found this PHP-Websockets class.

One of the first things I noticed after setting up the JavaScript was that it was only receiving ASCII text. At first I thought it was the JavaScript side and that I might need to encode something, but the JavaScript appeared to be sending the messages to the server just fine.

So I tried the original tutorial about programming WebSocket servers (there is a download link at the bottom: http://www.sanwebe.com/2013/05/chat-using-websocket-php-socket), and its basic server worked just fine.

Looking back at the tutorial code, it's so simple that I would almost prefer to use it instead. But I doubt it implements the whole protocol properly the way this script does.

Anyway, here is a log of what's happening, so that hopefully this script can be fixed.

Chat log

00:23:36 - People in chat:
00:23:36 - Guest_89395 has connected
00:23:36 - You are now the owner of the room
00:23:55 - Guest_89395: Hello - This is in English
00:24:08 - Disconnected

After my first message I also sent the following, which doesn't show up above because my chat only displays messages received from the server, even your own:

こんにちは - これは日本語であります (Japanese for "Hello - This is in Japanese")

Server Log

Server started
Listening on: 0.0.0.0:9000
Master socket: Resource id #6
Client connected. Resource id #7

$headers["length"]
int(39)

mb_strlen($this->applyMask($headers,$payload))
int(39)

strlen($this->applyMask($headers,$payload))
int(39)

mb_internal_encoding()
string(5) "UTF-8"

ini_get("default_charset")
string(5) "UTF-8"

$headers
array(8) {
  ["fin"]=>
  string(1) "?"
  ["rsv1"]=>
  string(1) "\u0000"
  ["rsv2"]=>
  string(1) "\u0000"
  ["rsv3"]=>
  string(1) "\u0000"
  ["opcode"]=>
  int(1)
  ["hasmask"]=>
  string(1) "?"
  ["length"]=>
  int(39)
  ["mask"]=>
  string(4) "X+1^"
}

$this->applyMask($headers,$payload)
string(39) "{"cmd":"nick","username":"Guest_89395"}"

Received: 
object(stdClass)#3 (2) {
  ["cmd"]=>
  string(4) "nick"
  ["username"]=>
  string(11) "Guest_89395"
}

Sending:
string(39) "{"cmd":"nick","username":"Guest_89395"}"

$headers["length"]
int(49)

mb_strlen($this->applyMask($headers,$payload))
int(49)

strlen($this->applyMask($headers,$payload))
int(49)

mb_internal_encoding()
string(5) "UTF-8"

ini_get("default_charset")
string(5) "UTF-8"

$headers
array(8) {
  ["fin"]=>
  string(1) "?"
  ["rsv1"]=>
  string(1) "\u0000"
  ["rsv2"]=>
  string(1) "\u0000"
  ["rsv3"]=>
  string(1) "\u0000"
  ["opcode"]=>
  int(1)
  ["hasmask"]=>
  string(1) "?"
  ["length"]=>
  int(49)
  ["mask"]=>
  string(4) "?q??"
}

$this->applyMask($headers,$payload)
string(49) "{"cmd":"say","text":"Hello - This is in English"}"

Received:
object(stdClass)#3 (2) {
  ["cmd"]=>
  string(3) "say"
  ["text"]=>
  string(26) "Hello - This is in English"
}

Sending:
string(70) "{"cmd":"say","text":"Hello - This is in English","from":"Guest_89395"}"

$headers["length"]
int(74)

mb_strlen($this->applyMask($headers,$payload))
int(42)

strlen($this->applyMask($headers,$payload))
int(74)

mb_internal_encoding()
string(5) "UTF-8"

ini_get("default_charset")
string(5) "UTF-8"

$headers
array(8) {
  ["fin"]=>
  string(1) "?"
  ["rsv1"]=>
  string(1) "\u0000"
  ["rsv2"]=>
  string(1) "\u0000"
  ["rsv3"]=>
  string(1) "\u0000"
  ["opcode"]=>
  int(1)
  ["hasmask"]=>
  string(1) "?"
  ["length"]=>
  int(74)
  ["mask"]=>
  string(4) "\u001D?f\u0014"
}

$this->applyMask($headers,$payload)
string(74) "{"cmd":"say","text":"こんにちは - これは日本語であります"}"

I replaced unprintable characters with Unicode escapes.

Note: there is some other debugging info mixed in from my class that extends WebSocketServer: just a var_dump(json_decode($message)) in process(), and a dump of the json_encode()d message being sent back out. Since the code seems to think the Japanese text is a partial message, the server never finishes receiving it, so process() is never called.
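For context, applyMask() works on raw bytes: RFC 6455 masking XORs payload byte i with byte i % 4 of the 4-byte mask key, so the unmasked payload has exactly as many bytes as $headers['length'] says. A minimal sketch of that operation (xorMask() is a hypothetical stand-alone function, not the library's applyMask(), and the mask key is arbitrary):

```php
<?php
// Sketch of RFC 6455 payload (un)masking: byte i is XORed with mask byte i % 4.
// Hypothetical helper; not the library's own applyMask().
function xorMask(string $payload, string $mask): string
{
    $out = '';
    $len = strlen($payload);   // iterate over bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        $out .= $payload[$i] ^ $mask[$i % 4];   // XOR of two 1-byte strings
    }
    return $out;
}

$mask   = "\x1D\x8F\x66\x14";                      // arbitrary 4-byte example key
$text   = '{"text":"' . "\u{3053}\u{3093}" . '"}'; // contains two 3-byte characters
$masked = xorMask($text, $mask);

var_dump(strlen($masked) === strlen($text));   // masking preserves the byte length
var_dump(xorMask($masked, $mask) === $text);   // applying the same mask again restores it
```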

Xaraknid commented 8 years ago

Actually, apply the fix I described in my first reply.

Those characters, "こんにちは - これは日本語であります", are like é, ö and î: each is formed from multiple bytes. Most English characters are formed from 1 byte.

strlen counts the number of bytes in a string.

mb_strlen counts the number of characters in a string.

As long as you use English, the number of characters and the number of bytes in a string are equal. When you use those special characters, that changes: bytes > characters.

What the server receives is bytes, not characters.
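A small sketch of that byte/character ratio. byteWidths() is a hypothetical helper; the JSON string is the Japanese message from drone540's log above, the file is assumed to be saved as UTF-8, and PHP 7.1+ is assumed for the array destructuring:

```php
<?php
// Hypothetical helper: list each UTF-8 character with its size in bytes.
function byteWidths(string $s): array
{
    $widths = [];
    // '//u' splits a UTF-8 string into individual characters
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $widths[] = [$char, strlen($char)];
    }
    return $widths;
}

foreach (byteWidths("a\u{e9}\u{3053}") as [$char, $bytes]) {
    echo $char, ' => ', $bytes, " byte(s)\n";   // a => 1, é => 2, こ => 3
}

// The Japanese message from the server log above.
$json = '{"cmd":"say","text":"こんにちは - これは日本語であります"}';
echo strlen($json), ' bytes, ', mb_strlen($json, 'UTF-8'), " characters\n";   // 74 bytes, 42 characters
```

Those two numbers match the $headers["length"] and mb_strlen values in the log.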

drone540 commented 8 years ago

Yeah, I already applied your fix. It works just fine.

I just changed the first if to comment out "mb_":

    if ($headers['length'] > /*mb_*/strlen($this->applyMask($headers,$payload))) {

I'm not totally sure what the purpose of mb_ is here. I'm guessing they thought the header length would be the size in characters.

Either way it is working fine.

I was glad to see someone figured it out.

I was totally going to dump this class if it didn't support UTF-8, as it would have been a pain for me to figure out where it was going wrong. In fact, I started trying to find out on my own by putting in a lot of var_dumps. I guess I didn't dig deep enough.

I added so many features to my web chat in the last 2 days that I now need to spend a while cleaning up my JavaScript.

Xaraknid commented 8 years ago

Here is what I see on my modified version of the server, in verbose mode, when I send those characters:

Listening on: 0.0.0.0:8080
Master socket: Resource id #6
array(1) {
  ["/echobot"]=>
  object(echobot)#3 (0) {
  }
}
RUNNING with libev method BACKEND : Epoll
Client #1 connected. Resource id #7
read callback
array(14) {
  ["get"]=>
  string(8) "/echobot"
  ["http"]=>
  float(1.1)
  ["host"]=>
  string(16) "xxxxxxx.com:8080"
  ["connection"]=>
  string(7) "Upgrade"
  ["pragma"]=>
  string(8) "no-cache"
  ["cache-control"]=>
  string(8) "no-cache"
  ["upgrade"]=>
  string(9) "websocket"
  ["origin"]=>
  string(7) "file://"
  ["sec-websocket-version"]=>
  string(2) "13"
  ["user-agent"]=>
  string(108) "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
  ["accept-encoding"]=>
  string(19) "gzip, deflate, sdch"
  ["accept-language"]=>
  string(35) "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4"
  ["sec-websocket-key"]=>
  string(24) "mAW6xrPlkhFsa1+nCO2A6g=="
  ["sec-websocket-extensions"]=>
  string(44) "permessage-deflate; client_max_window_bits; "
}
array(1) {
  ["/echobot"]=>
  object(echobot)#3 (0) {
  }
}
<< read time 0.0018730163574219 -- 533 prps
mem total : 396296 diff : 6400
read callback
packet size : 57 
frame #1 position : 0 msglen : 51 offset : 6 = framesize of 57
object(msg_data)#10 (3) {
  ["opcode"]=>
  int(1)
  ["users"]=>
  object(WebSocketUser)#7 (12) {
    ["socket"]=>
    resource(8) of type (Socket)
    ["id"]=>
    string(14) "u563db3e4702c9"
    ["watcher"]=>
    NULL
    ["headers"]=>
    array(2) {
      ["get"]=>
      string(8) "/echobot"
      ["host"]=>
      string(16) "xxxxxxx.com:8080"
    }
    ["readystate"]=>
    int(1)
    ["handlingPartialPacket"]=>
    bool(false)
    ["readBuffer"]=>
    string(0) ""
    ["writeNeeded"]=>
    bool(false)
    ["writeBuffer"]=>
    string(0) ""
    ["partialMessage"]=>
    string(0) ""
    ["hasSentClose"]=>
    bool(false)
  }
  ["message"]=>
  string(51) "こんにちは - これは日本語であります"
}
########    PACKET END         #########
<< read time 0.00049710273742676 -- 2011 prps
mem total : 398720 diff : 8824
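The frame line in that log (msglen 51, offset 6, framesize 57) appears consistent with RFC 6455 header sizing: 2 fixed bytes, no extended length for payloads of 125 bytes or less, plus a 4-byte mask key on client-to-server frames. A sketch of that arithmetic (frameHeaderSize() is a hypothetical helper; the file is assumed to be saved as UTF-8):

```php
<?php
// Hypothetical helper: WebSocket frame header size per RFC 6455, section 5.2.
function frameHeaderSize(int $payloadLength, bool $masked): int
{
    $size = 2;            // FIN/RSV/opcode byte + MASK/payload-length byte
    if ($payloadLength > 0xFFFF) {
        $size += 8;       // length marker 127 + 64-bit extended length
    } elseif ($payloadLength > 125) {
        $size += 2;       // length marker 126 + 16-bit extended length
    }
    if ($masked) {
        $size += 4;       // masking key, present on client-to-server frames
    }
    return $size;
}

$payloadBytes = strlen("こんにちは - これは日本語であります");   // 51 bytes (16 x 3 + 3)
$header = frameHeaderSize($payloadBytes, true);
echo "msglen $payloadBytes + offset $header = framesize ", $payloadBytes + $header, "\n";
```

The payload length here is again a byte count, which is why strlen, not mb_strlen, is the right comparison.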