benkasminbullock / Lingua-JA-Moji

Handle many kinds of Japanese characters
https://metacpan.org/release/Lingua-JA-Moji

Bleadperl v5.29.8-151-g765e6ecf32 breaks BKB/Lingua-JA-Moji-0.56.tar.gz #24

Closed andk closed 5 years ago

andk commented 5 years ago

As per subject. Link to bleadperl commit: https://perl5.git.perl.org/perl.git/commit/765e6ecf32

Sample fail report: http://www.cpantesters.org/cpan/report/e0548520-4eb9-11e9-b882-34ceec8ecb8d

I will also open an issue on perlbug and post the link here.

andk commented 5 years ago

Here is a link to the perlbug issue. I hope it helps: https://rt.perl.org/Ticket/Display.html?id=133968

Regards,

benkasminbullock commented 5 years ago

I'm fairly confident that this is a bug in Perl, not in my module. There is a blog post here:

http://blogs.perl.org/users/ben_bullock/2019/03/what-to-do-with-doubly-broken-utf-8.html

With Perl 5.29, the first match is "ック", but the regex here:

https://metacpan.org/source/BKB/Lingua-JA-Moji-0.56/lib/Lingua/JA/Moji.pm#L1404

should match the initial "ソー". I haven't inspected the code carefully, but the most likely suspect is the "オ-モ" range in the regex, which should match every Unicode character from オ to モ. Maybe KW's commit doesn't handle that case.
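To make the expectation concrete, here is a minimal sketch of the kind of check I mean. The character class below is a hypothetical stand-in for the "オ-モ" range, not the actual regex from Moji.pm, and the sub name `first_char_in_range` is made up for illustration. It reports whether the first character of a few sample strings falls inside the range.

```perl
use strict;
use warnings;
use utf8;                               # the literals below are UTF-8 characters
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for the range under discussion; NOT the regex
# used in Lingua::JA::Moji.
my $range = qr/[オ-モ]/;

# Return 1 if the first character of $text is inside the range, else 0.
sub first_char_in_range {
    my ($text) = @_;
    return $text =~ /^$range/ ? 1 : 0;
}

# Print the code point of the first character and whether it matched.
for my $word ('ソー', 'ック', 'ア') {
    printf "%s: U+%04X %s\n", $word, ord ($word),
        first_char_in_range ($word) ? 'in range' : 'not in range';
}
```

Since ソ is U+30BD, which lies between オ (U+30AA) and モ (U+30E2), a Perl without the regression should report it as in range. If an affected blead reports otherwise, that would point at the range handling rather than at the module.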

khwilliamson commented 5 years ago

This bug was already discovered by Slaven Rezic and reported here: https://rt.perl.org/Ticket/Display.html?id=133942#txn-1621522. It has already been fixed in blead, as further commentary in that ticket indicates.

Andreas, there have been several fixes to blead recently. It would be best to test with that instead of an older version.

benkasminbullock commented 5 years ago

I don't think the 0.56 test failures are a bug in my module, so I'll mark this issue as closed. If it turns out to be a bug in Lingua-JA-Moji, please reopen.