This program:
perl -MUnicode::UCD=charscript -wle 'print charscript(chr(0x6237)) // "undef"'
should print "Han", but instead it prints "undef". The same behavior occurs on two different machines, with 5.18.1 and 5.14.2.
The applicable line of the Unicode data file http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt is:
4E00..9FCC ; Han # Lo [20941] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FCC
The problem has been pointed out to me: charscript, despite its name, wants a codepoint number, not an actual character. This bug can be closed.
P.S.: Is it considered general knowledge that our bug tracker totally sucks? I'm told that this isn't because RT itself sucks, but because nobody on our side bothered to configure it properly. If someone wanted to fix this, I would be glad to put in thirty or forty minutes and come up with a long list of complaints.
On Sat Dec 14 08:14:20 2013, mjd@plover.com wrote:
The problem has been pointed out to me: charscript, despite its name, wants a codepoint number, not an actual character. This bug can be closed.
Closing per request from OP.
The RT System itself - Status changed from 'new' to 'open'
@jkeenan - Status changed from 'open' to 'rejected'
On 12/14/2013 09:13 AM, Mark Dominus wrote:
The problem has been pointed out to me: charscript, despite its name, wants a codepoint number, not an actual character. This bug can be closed.
Suppose charscript() and friends raised a warning if the code point argument passed to them is invalid, instead of just returning undef (or the empty list as it currently does)? We could perhaps even suppress said warning unless the argument also had the utf8 flag set. That has the potential of breaking less code, I think.
Karl Williamson wrote:
Suppose charscript() and friends raised a warning if the code point argument passed to them is invalid,
Sounds good. Specifically, you want a warning iff the argument would generate a warning if used in a numeric context. Because this is a numeric context.
suppress said warning unless the argument also had the utf8 flag set.
That does not sound like a good idea. "foo" is just as numerically invalid as "\x{2603}". If the user's passing a single character, it'll sometimes be in the Latin-1 range, in which case it could be represented either way. We want to make behaviour *less* dependent on the internal encoding of strings, not more.
-zefram
Zefram <zefram@fysh.org>:
Karl Williamson wrote:
Suppose charscript() and friends raised a warning if the code point argument passed to them is invalid,
Sounds good. Specifically, you want a warning iff the argument would generate a warning if used in a numeric context. Because this is a numeric context.
I'm not sure it makes sense to slow down every call to charscript() just to prevent what was actually an RTFM error.
On 12/15/2013 05:39 PM, Mark Dominus wrote:
Zefram <zefram@fysh.org>:
Karl Williamson wrote:
Suppose charscript() and friends raised a warning if the code point argument passed to them is invalid,
Sounds good. Specifically, you want a warning iff the argument would generate a warning if used in a numeric context. Because this is a numeric context.
I'm not sure it makes sense to slow down every call to charscript() just to prevent what was actually an RTFM error.
This would slow down only error cases. As you pointed out, the name of the function is misleading. It seems to me that it would be a reasonable thing for us to do to help users cope with that. It would also save this list time by keeping unwarranted bug reports from being filed.
One thing to note, though, is the best place to put the warning is in a common function used by all the functions in the module to do code point argument processing, so the warning would be raised for all such functions.
On 12/15/2013 10:25 PM, Karl Williamson wrote:
I looked at the code of Unicode::UCD. It turns out that most of the functions in it croak when they get this type of illegal parameter. And all but two of the rest call carp. This means that the only two that are silent are charblock() and charscript().
And, the context isn't numeric. The parameter for these two functions can be either a number, or the name of a script or block. If it doesn't look like a number, it assumes it is a name, and if there is no such name, it returns undef.
It is a trivial matter to add a warning here, which would not add CPU time to the success cases. But I'd like to get more of a consensus on whether doing so is advisable.
Based on the discussion\, I'm reopening this ticket to fix it instead of rejecting it -- Karl Williamson
@khwilliamson - Status changed from 'rejected' to 'open'
Fixed in bc37b130604215b78ec3e03d73b81cb08cfa741e
Thanks for reporting the problem
-- Karl Williamson
@khwilliamson - Status changed from 'open' to 'pending release'
Thank you for submitting this report. You have helped make Perl better.
With the release of Perl 5.24.0 on May 9, 2016, this and 149 other issues have been resolved.
Perl 5.24.0 may be downloaded via https://metacpan.org/release/RJBS/perl-5.24.0
@khwilliamson - Status changed from 'pending release' to 'resolved'
Migrated from rt.perl.org#120790 (status was 'resolved')
Searchable as RT120790$