PHP-Polyfills / mb_trim

PHP: Provides a user-land polyfill for `mb_trim`, `mb_ltrim`, and `mb_rtrim` functions added in PHP 8.4.
https://php.watch/versions/8.4/mb_trim-mb_ltrim-mb_rtrim
MIT License
4 stars 2 forks source link

Type of return value mb_trim() is incompatible with possible return values of preg_replace() #2

Open utilmind opened 1 month ago

utilmind commented 1 month ago

Hello! Here is the one issue I noticed...

If input string has non-UTF8 encoding (e.g. Windows-1251 aka cp1251) and no encoding is explicitly specified upon call of mb_trim(), preg_replace() will return NULL, thus it will trigger PHP Fatal error because “Return value must be of type string, null returned”.

Possible solutions are either, explicitly specify the (string) type for return value: – return (string)preg_replace("/^[\s\0]+|[\s\0]+$/uD", '', $string);, or – don't use type hints for the function parameters and return value.

I would prefer this way, so this polypill will be compatible with even earlier PHP versions, earlier than current PHP 8.1 support. (BTW, I still supporting legacy website written on PHP5 and interested using this polyfill there.)

P.S. I'm not really sure that this is a real issue. Maybe it's good that it triggers fatal error in case if input string has wrong encoding.

utilmind commented 1 month ago

Additionally I don't understand the purpose of the following heavy construction:

    $charMap = array_map(static fn(string $char): string => preg_quote($char, '/'), mb_str_split($characters));
    $regexClass = implode('', $charMap);
    $regex = '/^[' . $regexClass . ']+|[' . $regexClass . ']+$/uD';

Why not make it simple?

    $characters = preg_quote($characters, '/');
    $regex = '/^[' . $characters . ']+|[' . $characters . ']+$/uD';

I'm not preparing PR yet. Perhaps you have serious reasons for acting this way that are not obvious to me. Please let me know what do you think? These 2 code blocks producing equal result or not? BTW, shorter code is compatible with PHP5 PHP7.

UPD. I performed some tests and I don't see any difference between the original and improved block. The only difference that the first one escaping characters one by one, mine is doing this with entire string. But result is always the same, the goal, to literally escape each character of the string achieving in both ways. But mine is shorter and faster.

If you approve my another pull request, please let me know, I will prepare another one with this improvement.

utilmind commented 1 month ago

Here is a polyfill for PHP5. Not exact polyfill, PHP5 doesn't supports u{xxxx} characters, but it still works in most cases:

        if (PHP_VERSION_ID < 70000) {
            $chars = '\\s|\\xC2\\xA0'; // \\xC2\\xA0 is the same as [\t\x{00A0}]/u

            if ($characters = str_replace(UTF8_SPACES, '', $characters)) {
                $chars .= '|['.preg_quote($characters, '/').']';
            }

            return preg_replace("/^($chars)+|($chars)+$/", '', $str);
        }