erthink / libmdbx

One of the fastest embeddable key-value ACID databases without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features and performance.
https://erthink.github.io/libmdbx/

[mdbx::byte] better forgo char8_t and always use unsigned char #263

Open yperbasis opened 2 years ago

yperbasis commented 2 years ago

Currently mdbx::byte is defined as either char8_t or unsigned char. However, char8_t is much narrower in scope by design and is intended only to help with UTF-8 encoding: https://stackoverflow.com/a/57453713. I suggest always defining mdbx::byte as unsigned char.

P.S. In Silkworm we use uint8_t (= unsigned char) as our byte type, so making mdbx::byte always equal to unsigned char would allow us to use slice::byte_ptr() instead of slice::data() plus a cast.
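To make the two options concrete, here is a minimal sketch. It assumes a hypothetical `byte` alias selected with the standard `__cpp_char8_t` feature-test macro (this is not the actual libmdbx header), plus a hypothetical caller-side function `count_zero_bytes` that works in `uint8_t`, as Silkworm does. It also assumes `uint8_t` is `unsigned char`, which holds on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical alias mirroring the two definitions discussed in the issue;
// not copied from the libmdbx headers.
#if defined(__cpp_char8_t)
using byte = char8_t;        // C++20: distinct type whose underlying type is unsigned char
#else
using byte = unsigned char;  // pre-C++20 fallback
#endif

// True when byte and uint8_t are the same type, so byte pointers convert implicitly.
constexpr bool byte_is_uint8 = std::is_same_v<byte, std::uint8_t>;

// Hypothetical caller-side function that works in terms of uint8_t.
std::size_t count_zero_bytes(const std::uint8_t* p, std::size_t n) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i) {
        zeros += (p[i] == 0) ? 1 : 0;
    }
    return zeros;
}

// Handing a byte pointer to the uint8_t-based function.
std::size_t count_zeros(const byte* p, std::size_t n) {
    // With byte == char8_t the pointer types are unrelated, so this cast is
    // mandatory; with byte == unsigned char it is a no-op conversion.
    return count_zero_bytes(reinterpret_cast<const std::uint8_t*>(p), n);
}
```

The `reinterpret_cast` in `count_zeros` corresponds to the "data() plus a cast" step: it disappears only when `byte` and `uint8_t` are the same type.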

erthink commented 2 years ago

This topic is somewhat confusing, but let me try to explain why libmdbx uses char8_t as mdbx::byte.

Essentially, I just need the unsigned char * restrict type in C99 terms, i.e. a non-aliasing pointer to unsigned char. However, C++ still doesn't have the restrict keyword, and char pointers are allowed to alias objects of any type.

On the other hand, wherever uint8_t is provided and CHAR_BIT == 8, a char8_t * actually acts the same as C99's unsigned char * restrict. Both CHAR_BIT == 8 and the availability of uint8_t are prerequisites for libmdbx, so I don't expect any issues, including deliberate exploitation of UB by "extra strict" compilers.
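A minimal sketch of the aliasing argument, assuming a C++20 compiler for the `char8_t` variant (it is guarded by `__cpp_char8_t`). The function names are hypothetical. Both functions compute the same result; the difference is only in what the compiler is allowed to assume when optimizing.

```cpp
#include <cstddef>

// A store through unsigned char* may legally modify *scale, because character
// types (char, unsigned char, std::byte) may alias objects of any type. The
// compiler must therefore reload *scale on every iteration unless it can prove
// that the pointers do not overlap.
void scale_uchar(unsigned char* out, const int* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<unsigned char>(out[i] * *scale);
    }
}

#if defined(__cpp_char8_t)
// char8_t is not one of the aliasing-exempt types, so a store through char8_t*
// cannot modify an int. The compiler may load *scale once, much as if `out`
// were declared `unsigned char * restrict` in C99.
void scale_char8(char8_t* out, const int* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<char8_t>(out[i] * *scale);
    }
}
#endif
```

The analogy covers aliasing with non-character types only: two `char8_t *` arguments may still overlap each other, which `restrict` would also exclude.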

At the same time, the approach of using char8_t has several advantages:

Nonetheless, I should consider switching to uint8_t * __restrict__ for byte pointers.
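A minimal sketch of the `uint8_t * __restrict__` alternative. `__restrict__` is a GCC/Clang extension, and `__restrict` is the MSVC spelling; neither is standard C++. The `BYTES_RESTRICT` macro and `scale_bytes` function are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical portability macro: MSVC spells the extension __restrict,
// GCC and Clang accept __restrict__.
#if defined(_MSC_VER)
#define BYTES_RESTRICT __restrict
#else
#define BYTES_RESTRICT __restrict__
#endif

// The caller promises that the bytes written through `out` are not accessed
// through any other pointer inside this function (in particular not through
// `scale`). The compiler may then hoist the load of *scale out of the loop,
// even though uint8_t is a character type that could otherwise alias an int.
void scale_bytes(std::uint8_t* BYTES_RESTRICT out,
                 const int* BYTES_RESTRICT scale,
                 std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(out[i] * *scale);
    }
}
```

Unlike the `char8_t` approach, the non-aliasing promise here is explicit and per-pointer, but it relies on a compiler extension rather than on the type system.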