Closed sb-onoffice-de closed 1 year ago
Hi @sb-onoffice-de,
thank you for taking the time to report the issue. I will look into it and get back to you ASAP.
Regards Matthias
I've found the reason for this behaviour. Not all ASCII characters were copied to the output (only letters, numbers and the hyphen), which is problematic for the examples you gave above. Those Unicode characters are expanded into strings, which themselves contain ASCII characters outside the aforementioned list of "allowed" characters. This behaviour is probably an oversight from the very beginning of the class - and never came to light, since probably those Unicode characters from your examples are not allowed in domain names anyway.
Our fix will now copy all ASCII characters regardless of them being legal for domain names or not, which fixes the issue. Potentially this might break use cases of a few people, so this will be a new minor version, v3.1.0.
The fixed version will be available soon.
Thank you!
Wouldn't it be possible to throw an Algo26\IdnaConvert\Exception\InvalidCharacterException?
I found those 2 code points with a script like this:
<?php
use Algo26\IdnaConvert\Exception\InvalidCharacterException;
__di(PhpRuntime::class)->header('Content-Type: text/plain; charset=utf-8');
__di(PhpRuntime::class)->set_time_limit(0);
$skip = [0x37a, 0x33c7];
$invalid = [];
$utf8Tools = __di(Utf8Tools::class);
$idnaConverter = __di(IdnaConvertWrapper::class);
for ($i = 0x80; $i <= 0x10000; $i++) {
if (in_array($i, $skip, true)) {
continue;
}
try {
$idnaConverter->encode($utf8Tools->encodePoint($i));
} catch (InvalidCharacterException $exception) {
$invalid [] = $i;
echo '[FAIL] '.dechex($i).' ('.$utf8Tools->encodePoint($i).')'.LF;
}
}
Of course, the __di(PhpRuntime::class)
stuff is specific to our code base, but I think you get the idea
There's actually no need for an exception, because the behaviour as of 3.1.0 is as per the RFCs. What we did back in 2003 worked well for the intended original use case. But given the wider context of Punycode itself, I'm confident that the fix actually improves the implementation.
BTW.: Would you mind turning your little script into some meaningful test case? That would help the project a lot!
OK, great, thank you!
As for the test case: I'll try to find time for it.
Thanks again! :-)
Describe the bug I found two unicode points that lead to an endless loop in
ToIdn::convert
, see below.To Reproduce
(new \Algo26\IdnaConvert\ToIdn(2003))->convert("\u{37a}");
or(new \Algo26\IdnaConvert\ToIdn(2003))->convert("\u{33c7}");
Reproducible with v3.0.5 and dev-master. PHP version is PHP 8.1.12.