boostorg / spirit

Boost.org spirit module
http://boost.org/libs/spirit

make x3 integer parser less dependent on fundamental character types #762

Open wanghan02 opened 1 year ago

wanghan02 commented 1 year ago

Many Spirit X3 parsers naturally work with user-defined character types. It would be very helpful if X3's integer parser were one of them. This can be achieved with a more canonical implementation of radix_traits::is_valid and radix_traits::digit.

In radix_traits::is_valid, the input character ch of type Char is compared against several ASCII characters of type char. The destination type of the static_cast there should therefore remain the ASCII character type char, not the input character type Char. Comparing ch with a char value is the more canonical way to do the range check, consistent with the existing comparisons between ch and the ASCII char literals ('0', '9', 'a', 'A'). With this change, the range check still works for a user-defined character type that does not support implicit conversion to or from the promoted integer type.
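The range check described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `radix_traits_sketch`, `my_char`, and `is_valid_demo` are hypothetical names, and `my_char` is assumed to offer only comparisons against plain `char`, with no implicit conversion to or from integers.

```cpp
#include <cassert>

// Hypothetical user-defined character type. It cannot be implicitly
// converted to or from an integer; it can only be compared with char.
struct my_char
{
    char32_t value;

    friend bool operator>=(my_char lhs, char rhs)
    {
        return lhs.value >= static_cast<unsigned char>(rhs);
    }
    friend bool operator<=(my_char lhs, char rhs)
    {
        return lhs.value <= static_cast<unsigned char>(rhs);
    }
};

// Sketch of a radix range check whose upper bounds are cast to char
// rather than to the input character type Char.
template <unsigned Radix>
struct radix_traits_sketch
{
    template <typename Char>
    static bool is_valid(Char ch)
    {
        if (Radix <= 10)
            return ch >= '0' && ch <= static_cast<char>('0' + Radix - 1);
        return (ch >= '0' && ch <= '9')
            || (ch >= 'a' && ch <= static_cast<char>('a' + Radix - 10 - 1))
            || (ch >= 'A' && ch <= static_cast<char>('A' + Radix - 10 - 1));
    }
};

// Usage: the same check accepts both char and the user-defined type.
inline void is_valid_demo()
{
    assert(radix_traits_sketch<10>::is_valid('7'));
    assert(!radix_traits_sketch<8>::is_valid('8'));
    assert(radix_traits_sketch<16>::is_valid(my_char{U'f'}));
    assert(!radix_traits_sketch<16>::is_valid(my_char{U'g'}));
}
```

Because every bound is a `char`, the only operations required of `Char` are `>=` and `<=` against `char`. Casting the bounds to `Char` instead would additionally require `Char` to be constructible from an integer expression.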

In the old implementation of radix_traits::digit, the Radix check is unnecessary and incomplete:

            return (Radix <= 10 || (ch >= '0' && ch <= '9'))
                ? ch - '0'
                : char_encoding::ascii::tolower(ch) - 'a' + 10;

Since radix_traits::digit is only called after the range check in radix_traits::is_valid, it does not need to do any range checking of its own. The input character ch is guaranteed to be in one of three ranges: '0' to '9', 'a' to 'a' + Radix - 10 - 1, or 'A' to 'A' + Radix - 10 - 1. There is also no need to call char_encoding::ascii::tolower(ch), which adds another layer of unnecessary dependency. Given these valid ranges, the simpler implementation below is clear and efficient, and it does not require an implicit conversion from the input character type to an integer type either:

            return ch < 'A'
                ? ch - '0'
                : ch < 'a'
                    ? ch - 'A' + 10
                    : ch - 'a' + 10;
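To show the expression above working without implicit integer conversion, here is a small self-contained sketch. The names `digit_sketch`, `my_digit_char`, and `digit_demo` are hypothetical, and `my_digit_char` is assumed to provide only `operator<` and `operator-` against `char`. As in the proposal, the function assumes its input has already passed the range check.

```cpp
#include <cassert>

// Hypothetical user-defined character type with no implicit conversion
// to an integer. It supports ordering against char and subtracting a char.
struct my_digit_char
{
    char32_t value;

    friend bool operator<(my_digit_char lhs, char rhs)
    {
        return lhs.value < static_cast<unsigned char>(rhs);
    }
    friend int operator-(my_digit_char lhs, char rhs)
    {
        return static_cast<int>(lhs.value) - static_cast<unsigned char>(rhs);
    }
};

// Sketch of the digit conversion. Precondition: ch is a valid digit for
// the radix in use, so it lies in '0'..'9', 'A'..'Z' or 'a'..'z'.
// Relies on the ASCII ordering '9' < 'A' < 'a'.
template <typename Char>
unsigned digit_sketch(Char ch)
{
    return ch < 'A'
        ? ch - '0'
        : ch < 'a'
            ? ch - 'A' + 10
            : ch - 'a' + 10;
}

// Usage: works for char and for the user-defined type alike.
inline void digit_demo()
{
    assert(digit_sketch('7') == 7u);
    assert(digit_sketch('F') == 15u);
    assert(digit_sketch(my_digit_char{U'f'}) == 15u);
}
```

The two comparisons select the range, so no `tolower` call and no Radix test are needed.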

Both the old and the new implementation assume that char literals use the ASCII table.
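If that assumption ever needs to be made explicit, a compile-time check along these lines would do it. This is only an illustration and not part of the proposed change; `literals_follow_ascii` is a hypothetical helper.

```cpp
// Returns true when the character literals used by the digit code have
// their ASCII values. On an encoding such as EBCDIC, where the letters
// are not contiguous, this would be false.
constexpr bool literals_follow_ascii()
{
    return '0' == 48 && '9' == 57
        && 'A' == 65 && 'Z' == 90
        && 'a' == 97 && 'z' == 122;
}

static_assert(literals_follow_ascii(),
              "radix digit conversion assumes ASCII character literals");
```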