Fix for JSIP-363: read bytes in UTF8 from input buffer

RestComm / jain-sip

Disclaimer: This repository is a git-svn mirror of the project found at http://java.net/projects/jsip whose original repository is developed collaboratively by the Advanced Networking Technologies Division at the National Institute of Standards and Technology (NIST) - an agency of the United States Department of Commerce and by a community of individual and enterprise contributors. TeleStax, Inc. will perform some productization work, new features experimentation branches, etc for its TelScale jSIP product that doesn't concern the community from the main repository hence this git repository.

http://www.restcomm.com/


Fix for JSIP-363: read bytes in UTF8 from input buffer #108

Closed fre42 closed 8 years ago

fre42 commented 8 years ago

UTF-8 characters are not handled correctly when a SIP message is received via TCP or TLS under MS Windows 7. The problem is caused by the code in the run() method of PipelinedMsgParser.java, near line 462: the bytes are read from the buffer with inputBuffer.toString().getBytes(), without specifying the charset, which needs to be UTF-8. This works correctly only when the local machine is configured to use UTF-8 as its default charset, which is not the case on MS Windows machines.
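To illustrate the effect, here is a minimal sketch, not taken from the jain-sip sources. It compares the bytes produced by two explicit charsets, using ISO-8859-1 as a stand-in for a non-UTF-8 platform default (an assumption for the demo; the actual default depends on the machine). The class name, the `sameBytes` helper, and the sample header are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DefaultCharsetDemo {

    // Hypothetical helper: true when both charsets encode the text to identical bytes.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        // Sample header with two non-ASCII characters (u-umlaut and sharp s),
        // written as Unicode escapes so the source file encoding does not matter.
        String header = "Subject: Gr\u00fc\u00dfe";

        // The no-argument getBytes() uses whatever this prints.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        // ISO-8859-1 stands in for a non-UTF-8 default (assumption for this demo).
        System.out.println("ASCII-only text encodes identically: "
                + sameBytes("Subject: Hello", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        System.out.println("Non-ASCII text encodes identically: "
                + sameBytes(header, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII content encodes the same either way, which is probably why the problem only shows up once a message contains non-ASCII characters.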

I suggest changing the call to inputBuffer.toString().getBytes("UTF-8") and adding some error handling for the (strange) case that no UTF-8 charset is available.
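A rough sketch of what that change could look like, assuming the buffer content is already available as a String. The class and the `toUtf8Bytes` helper are hypothetical names for illustration and are not the actual code in PipelinedMsgParser.java.

```java
import java.io.UnsupportedEncodingException;

final class Utf8BytesSketch {

    // Hypothetical helper: encodes the buffered message text as UTF-8,
    // falling back to the platform default charset if UTF-8 is unavailable.
    static byte[] toUtf8Bytes(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform implementation is required to support UTF-8,
            // so this branch should be unreachable in practice.
            return text.getBytes();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the parser's input buffer (assumption for this demo).
        StringBuilder inputBuffer = new StringBuilder("Subject: Gr\u00fc\u00dfe\r\n");
        byte[] bytes = toUtf8Bytes(inputBuffer.toString());
        System.out.println("Encoded length in bytes: " + bytes.length);
    }
}
```

On Java 7 and later, `text.getBytes(StandardCharsets.UTF_8)` would avoid the checked exception entirely; whether that is an option depends on the Java version the project targets.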

The issue was first reported in https://java.net/jira/browse/JSIP-363. In usnistgov/jsip it has been fixed in a similar way.

fre42 commented 8 years ago

Yes, OK. If UTF-8 really is missing on a misconfigured system, the fallback code after the catch would work. But in that case you would see the WARN message for every received packet, which is why I was not sure about placing logging here.

jaimecasero commented 8 years ago

@fre42 printing a WARN message for every packet is heavy, but a system running with an incorrect encoding configuration is quite unpredictable anyway. If the alternative is to go ahead silently, then I prefer printing every time.

Another solution would be to test for encoding on stack creation and print the warning there. Would you like to contribute that along with this PR?

fre42 commented 8 years ago

I've added some code at stack creation that tests whether "UTF-8" is available and logs a WARN if it is not.
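For readers following along, a check of this kind might look roughly like the sketch below. It is not the code from this PR: the class and method names are hypothetical, and `java.util.logging` stands in for whatever logger the stack actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.logging.Logger;

final class CharsetStartupCheck {

    // java.util.logging is an assumption here; the real stack has its own logger.
    private static final Logger LOG = Logger.getLogger(CharsetStartupCheck.class.getName());

    // Hypothetical one-time check intended to run when the stack is created.
    // Returns true if the named charset is available, and logs a warning otherwise.
    static boolean checkCharsetAvailable(String charsetName) {
        boolean available;
        try {
            available = Charset.isSupported(charsetName);
        } catch (IllegalCharsetNameException e) {
            // The name itself is malformed, so treat the charset as unavailable.
            available = false;
        }
        if (!available) {
            LOG.warning("Charset " + charsetName
                    + " is not available; message bytes may be decoded incorrectly.");
        }
        return available;
    }

    public static void main(String[] args) {
        // Warn once at startup instead of once per received packet.
        checkCharsetAvailable("UTF-8");
    }
}
```

Running the check once at stack creation keeps the warning visible without repeating it for every received packet.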