alandefreitas / matplotplusplus

Matplot++: A C++ Graphics Library for Data Visualization 📊🗾
https://alandefreitas.github.io/matplotplusplus/
MIT License
4.1k stars 311 forks source link

Add C++20 compatibility #355

Closed j0hnnybash closed 1 year ago

j0hnnybash commented 1 year ago

Beginning with C++20 u8"" literals use char8_t as underlying types making them incompatible with char based std::string. Adding reinterpret_casts to char is among the remediation approaches suggested by 1. There are only 4 uses of u8 literals in the codebase, also this is the simplest backwards compatible solution to this issue (the alternative would be to use ASCII escape codes in regular string literals). Long term it could make sense to replace uses of std::string with std::u8string.

This fixes #287.

alandefreitas commented 1 year ago

Can't we just remove u8 from those literals? We represent them with binary literals if we have to. The impression I have is reinterpret_cast is inappropriate or even UB here because char8_t is always unsigned. So it's either static_cast or UB. We should have a proper matrix here testing all compilers and standards with at least one factor setting -funsigned for GCC to make sure it works. What's ironic is I have implemented the tools just for that (https://github.com/alandefreitas/cpp-actions) but haven't had the time to implement it. Here's (https://github.com/boostorg/url/blob/develop/.github/workflows/ci.yml) a repository where I did integrate such a matrix.

j0hnnybash commented 1 year ago

Yes you are correct, i missed that just using non u8 literals works because it is save to assume the execution and source encoding will be UTF-8 (at least the latter was already assumed by using the character itself and not a unicode codepoint escape sequence).

Side note: as per https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html the reinterpret cast isn't UB because char (unsigned or not) can always be used to access any other type (https://eel.is/c++draft/expr#basic.lval-11).

Side note2: these u8 literals are really unintuitive...

alandefreitas commented 1 year ago

Side note: as per https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html the reinterpret cast isn't UB because char (unsigned or not) can always be used to access any other type (https://eel.is/c++draft/expr#basic.lval-11).

Yes. You're right. It's not UB because char and unsigned char could be representing any byte sequence. I think it's still "improper" (UB is not the correct term) to reinterpret_cast though. If there's a divergence between the types being used, which is happen in most cases as char8_t is always unsigned and char is typically signed, then the reinterpreted output might represent the wrong unicode codepoint.

Side note2: these u8 literals are really unintuitive

Yes. It sucks. I had problems with it in other libraries too. The signed/unsigned relationship is problematic. And char without char8_t is also problematic because char is typically signed while unicode assumes chars are unsigned.

j0hnnybash commented 1 year ago

FYI: I updated the commit to simply remove the u8 prefix on the literals, instead of using reinterpret_casts.

alandefreitas commented 1 year ago

Yes. That works. Thanks!