algo26-matthias / idna-convert

Pure PHP IDNA Converter
http://idnaconv.net
GNU Lesser General Public License v2.1
66 stars 23 forks source link

Endless loops in `ToIdn::convert` #35

Closed sb-onoffice-de closed 1 year ago

sb-onoffice-de commented 1 year ago

Describe the bug I found two unicode points that lead to an endless loop in ToIdn::convert, see below.

To Reproduce (new \Algo26\IdnaConvert\ToIdn(2003))->convert("\u{37a}"); or (new \Algo26\IdnaConvert\ToIdn(2003))->convert("\u{33c7}");

Reproducible with v3.0.5 and dev-master. PHP version is PHP 8.1.12.

algo26-matthias commented 1 year ago

Hi @sb-onoffice-de,

thank you for taking the time to report the issue. I will look into it and get back to you ASAP.

Regards Matthias

algo26-matthias commented 1 year ago

I've found the reason for this behaviour. Not all ASCII characters were copied to the output (only letters, numbers and the hyphen), which is problematic for the examples you gave above. Those Unicode characters are expanded into strings, which themselves contain ASCII characters outside the aforementioned list of "allowed" characters. This behaviour is probably an oversight from the very beginning of the class - and never came to light, since probably those Unicode characters from your examples are not allowed in domain names anyway.

Our fix will now copy all ASCII characters regardless of them being legal for domain names or not, which fixes the issue. Potentially this might break use cases of a few people, so this will be a new minor version, v3.1.0.

The fixed version will be available soon.

sb-onoffice-de commented 1 year ago

Thank you!

Wouldn't it be possible to throw an Algo26\IdnaConvert\Exception\InvalidCharacterException?

I found those 2 code points with a script like this:

<?php

use Algo26\IdnaConvert\Exception\InvalidCharacterException;

__di(PhpRuntime::class)->header('Content-Type: text/plain; charset=utf-8');
__di(PhpRuntime::class)->set_time_limit(0);

$skip = [0x37a, 0x33c7];
$invalid = [];
$utf8Tools = __di(Utf8Tools::class);
$idnaConverter = __di(IdnaConvertWrapper::class);

for ($i = 0x80; $i <= 0x10000; $i++) {
    if (in_array($i, $skip, true)) {
        continue;
    }

    try {
        $idnaConverter->encode($utf8Tools->encodePoint($i));
    } catch (InvalidCharacterException $exception) {
        $invalid [] = $i;

        echo '[FAIL] '.dechex($i).' ('.$utf8Tools->encodePoint($i).')'.LF;
    }
}
sb-onoffice-de commented 1 year ago

Of course, the __di(PhpRuntime::class) stuff is specific to our code base, but I think you get the idea

algo26-matthias commented 1 year ago

There's actually no need for an exception, because the behaviour as of 3.1.0 is as per the RFCs. What we did back in 2003 worked well for the intended original use case. But given the wider context of Punycode itself, I'm confident that the fix actually improves the implementation.

BTW.: Would you mind turning your little script into some meaningful test case? That would help the project a lot!

sb-onoffice-de commented 1 year ago

OK, great, thank you!

As for the test case: I'll try to find time for it.

Thanks again! :-)