Vonage / vonage-php-sdk-core

Vonage REST API client for PHP. API support for SMS, Voice, Text-to-Speech, Numbers, Verify (2FA) and more.
https://developer.vonage.com/
Apache License 2.0
916 stars 180 forks source link

SMS::isGsm7() incorrectly detects "ç" (U+00E7) as GSM7 #395

Closed Starfox64 closed 1 year ago

Starfox64 commented 1 year ago

Expected Behavior

A message containing the character Latin Small Letter C with Cedilla (U+00E7) should not be detected as being GSM7 compilent.

Current Behavior

SMS::isGsm7() returns true when Latin Small Letter C with Cedilla (U+00E7) is present.

Possible Solution

I've fixed a similar issue last year #318 but it looks like the new regex uses the correct character (capital instead of lowercase) so my guess is that there is something wrong with the regex itself. My suggestion would be to replace the regex implementation with the EncodingDetector logic from 3.x.

Steps to Reproduce (for bugs)

$message = new Vonage\SMS\Message\SMS('+33600000000', 'TEST', 'Hello there ça va ?');
Vonage\SMS\Message\SMS::isGsm7($message->getMessage()); // Returns true despite ç being present

Context

We are migrating to the 4.x release where the Message API was removed, since the AutoDetect constructor is missing, we've tried to replace it with this line after creating the SMS:

$message->setType(Vonage\SMS\Message\SMS::isGsm7($message->getMessage()) ? 'text' : 'unicode');

The results are wrong however.

Your Environment

SecondeJK commented 1 year ago

This will be because of mb_string functions and PHP generally being awful with Unicode. I'll look at a workaround for this - the original encoding detector was binned because of edge cases such as this

SecondeJK commented 1 year ago

Fixed by #403 merge, pending release.