Fix protocol mis-read with utf8 (issue #8) - Githubissues

englercj / node-esl

FreeSWITCH ESL implementation for Node.js; implements the full Event Socket Library specified in: http://wiki.freeswitch.org/wiki/Esl

http://englercj.github.com/node-esl/

MIT License

171 stars 109 forks source link

Fix protocol mis-read with utf8 (issue #8) #12

Closed cxreg closed 11 years ago

cxreg commented 11 years ago

This addresses the issue I observed where multi-byte utf8 data in the JSON payload would cause the parser to get into a bad state

By reading the stream as utf8, but then splitting it using byte-lengths via Buffer, we keep the data as intended but read the correct length off the wire, and avoid short or long reads

cxreg commented 11 years ago

Chad, any thoughts or questions about this patch?

englercj commented 11 years ago

Out of town at the moment, will look at it sometime next week

englercj commented 11 years ago

only things I want to say is maybe use the static Buffer.bytelength() method for the conditional on line 79, then only create an actual buffer instance afterwards. also use new Buffer instead of just Buffer in line 149. finally, add a test case for this and I'll merge it in . thanks!!

cxreg commented 11 years ago

I looked into making a test case but I didn't understand the test framework you have in place, and when I added a header with the multi-byte data in it, the test didn't fail.

I guess I'll take another look at it

englercj commented 11 years ago

I will publish this version to npm when I get home tonight

cxreg commented 11 years ago

On further discovery, this patch worked great for valid utf-8 characters, but invalid ones still caused a parser error. I received a call that had a 0x90 in the caller-id and it resulted in a short read. The reason is that the socket.setEncoding actually modifies the data, and in this case it added bytes (I believe it utf-8 encoded the funky character). So after doing so, the Content-Length was "too short" and left data in the buffer

I have a new patch that takes a different approach, and which safely handles both cases. It still ends up with a corrupted/broken caller-id, but at least the ESL parser works for all of ascii/utf8/whatever. This new version uses a Buffer all the way up to the body data extraction before then calling toString

englercj commented 11 years ago

@cxreg Is this a bug in freeswitch that should be reported?

cxreg commented 11 years ago

I'm not so sure. This is the actual data that arrives over SIP. Freeswitch is simply passing it as-is in the event headers. I guess maybe it could identify bytes that are not valid utf8 and send them as '\xXX' strings..

englercj commented 11 years ago

@cxreg I see what you are saying, not much you can do about that. SIP is supposed to be UTF8 (see RFC3261), but if it isn't there isn't much freeswitch can do about it.

cxreg commented 11 years ago

I started a conversation in #freeswitch and it seems reasonable that it should be emitting these as '\xXX' or \u00XX' strings, but until it does so (if ever) it would be nice if this library didn't crash when it sends the data

Should I push my patch?

englercj commented 11 years ago

For sure, open a new PR for it and lets at least look at it