alx-tools / Betty

Holberton-style C code checker written in Perl
GNU General Public License v3.0

Unicode characters count as 2 characters. #34

Open: opened by 0100-0100 2 years ago


When a line contains Unicode characters, each one is counted as 2 characters instead of one.

After researching Unicode support in Perl, I believe this happens because Perl is counting bytes rather than characters: depending on its code point, a Unicode character takes 2 (or more) bytes to encode, so it is counted as more than one character.

─────────────── How to reproduce the problem ───────────────

Below are a sample C comment exactly 81 characters long and a comment of the same length that contains one Unicode character.

ASCII comment:
/* Line # 14 A long commentary of exactly 81 characters. */

Unicode comment:
/* Line # 16, with a Unicode character ----> ─ <---- */

──────────────────── Sample image ────────────────────

The attached image shows the output given by Betty.


──────────────────── Where to look ────────────────────

From reading the Perl code, I believe the problem arises around line 2813, in the conditional statement that raises the warning message, shown here: https://github.com/holbertonschool/Betty/blob/438f97cb63fa6ee8d6a8092a4f2fb529e238d1c9/betty-style.pl#L2813