Closed Flamefire closed 1 year ago
Let's start:
- `utf8_none`: the std backend effectively does nothing. It only supports the C locale and ASCII, so it is useless for UTF-8.
- `utf8_native`: the std backend works and supports proper UTF-8 functionality, but there is no wide character support.
- `utf8_native_with_wide`: UTF-8 is supported, but some wide character functionality can be used for a better implementation. An example is the GNU C library on Linux, which supports UTF-8 locales (to a certain extent) but lets you work around some core issues by using the wide locale.
- `utf8_from_wide`: a Unicode locale is supported, but for wide characters only (the classic MSVC case), so you can't use `char` locales but you can derive all your operations from `wchar_t`.
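The four levels above can be summarized in a small sketch. The `utf8_support` enum class name comes from this issue; the shortened enumerator names and the two helper functions are hypothetical and only restate the descriptions above, they are not taken from the Boost.Locale sources.

```cpp
// Sketch of the four support levels (enumerator names are assumed).
enum class utf8_support { none, native, native_with_wide, from_wide };

// Hypothetical helper: can a narrow (char) UTF-8 std::locale be used directly?
constexpr bool narrow_utf8_usable(utf8_support s)
{
    return s == utf8_support::native || s == utf8_support::native_with_wide;
}

// Hypothetical helper: can wide (wchar_t) facets be used, either as a
// workaround (native_with_wide) or as the only source (from_wide)?
constexpr bool wide_usable(utf8_support s)
{
    return s == utf8_support::native_with_wide || s == utf8_support::from_wide;
}
```

Read this way, `from_wide` is the only level where every narrow operation has to be derived from `wchar_t`.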
Now the selection between wide data and ordinary characters is done as follows:
`from_wide` (on Windows) specified `std::collate` is used

I must admit that virtually half of the std backend is designed to handle various issues and incompatibilities between implementations, but that is what Boost.Locale is designed for.
`utf8_native` seems to be unused, it is checked for but never set, so it can be removed, can't it?
`utf8_native` is not in use because, at this point, all standard libraries that support locales support wide characters as well.
Technically you can remove it, but it may cover some problematic cases where wide characters are not properly supported by `std::locale` on some systems in the future.
So you can mark it "for future use" or remove it. The question is what would happen if some day it is needed.
Any reason not to always use the `utf8_codecvt`, given that it should work for UTF-16 and UTF-32 `wchar_t`s?
It is an option indeed: the native codecvt should provide the same functionality when it is supported, but `utf8_codecvt` provides it as well.
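To illustrate what "works for UTF-16 and UTF-32 `wchar_t`s" involves, here is a minimal sketch of writing one code point into a `std::wstring` for either width. The function `append_code_point` is hypothetical and is not the actual `utf8_codecvt` implementation; it assumes the caller passes a valid Unicode code point.

```cpp
#include <string>

// Hypothetical helper: append one Unicode code point to a wide string.
// With a 2-byte wchar_t (UTF-16) code points above U+FFFF become a
// surrogate pair; with a 4-byte wchar_t (UTF-32) one unit is enough.
inline void append_code_point(std::wstring& out, char32_t cp)
{
    if(sizeof(wchar_t) == 2 && cp > 0xFFFF) {
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<wchar_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    } else {
        out.push_back(static_cast<wchar_t>(cp));
    }
}
```

The only platform-dependent part is the `sizeof(wchar_t)` check, which is why a single implementation can plausibly serve both Windows and POSIX systems.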
Same for `utf8_collator_from_wide`
The collator from wide is used only on Windows with MSVC, when no UTF-8 locale is supported. A native narrow character UTF-8 locale is much more efficient and usually works well.
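A rough sketch of why the "from wide" route costs more: each comparison first converts both UTF-8 strings to wide strings and only then calls the wide collation facet. The names `decode_utf8` and `compare_utf8_via_wide` are hypothetical, and the decoder assumes well-formed input whose code points each fit into one `wchar_t`; this is not the code of `utf8_collator_from_wide`.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Minimal UTF-8 decoder for illustration only: assumes well-formed input
// and never produces surrogate pairs.
inline std::wstring decode_utf8(const std::string& s)
{
    std::wstring out;
    for(std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // Number of continuation bytes implied by the lead byte
        const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Payload bits carried by the lead byte itself
        unsigned long cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for(int k = 1; k <= extra && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(static_cast<wchar_t>(cp));
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}

// Compare two UTF-8 strings using the wide collation facet of `loc`.
inline int compare_utf8_via_wide(const std::locale& loc, const std::string& a, const std::string& b)
{
    const std::wstring wa = decode_utf8(a); // conversion 1
    const std::wstring wb = decode_utf8(b); // conversion 2
    const std::collate<wchar_t>& coll = std::use_facet<std::collate<wchar_t>>(loc);
    return coll.compare(wa.data(), wa.data() + wa.size(), wb.data(), wb.data() + wb.size());
}
```

When a narrow UTF-8 locale is available, `std::collate<char>` can be called directly and both conversions disappear.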
In which case would `time_put_from_base`/`std::time_put_byname` fail that `utf8_time_put_from_wide` avoids?
Basically, if the std `char` locale supports time put, it is much more efficient than the from-wide variant, which is why `char` is preferred. From what I recall there were no failures, because time put is much better designed than num/money punct.
Separators in the numeric cases are defined by a `char` rather than a string, while for time formatting there are naturally strings that define months and week days, so it is preferred to use the native ones.
An example case where numput from wide is better is the NBSP character, which I can identify and substitute with a space.
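The NBSP case can be sketched as follows. In UTF-8, U+00A0 takes two bytes and so cannot be returned as a single `char` separator, whereas on the wide side it is one recognizable `wchar_t`. The helper `narrow_separator` and its `fallback` parameter are hypothetical; how other non-ASCII separators are handled is an assumption of this sketch.

```cpp
// Hypothetical helper: turn a wide separator into a single narrow char.
// NBSP (U+00A0) is replaced by a plain space, ASCII is passed through,
// and anything else falls back to a caller-supplied character.
inline char narrow_separator(wchar_t sep, char fallback)
{
    const unsigned long code = static_cast<unsigned long>(sep);
    if(code == 0xA0) // NO-BREAK SPACE
        return ' ';
    if(code < 0x80) // plain ASCII fits into one char
        return static_cast<char>(code);
    return fallback;
}
```

For example, `narrow_separator(L'\xA0', ',')` yields a space, while an ASCII separator such as `L'.'` is returned unchanged.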
Due to a recent issue related to the different handling of facet creation depending on `utf8_native_with_wide` etc. (now the enum class `utf8_support`), I wanted to ask for clarification:
- `utf8_native` seems to be unused, it is checked for but never set, so it can be removed, can't it?
- What is the intended difference between `utf8_native_with_wide` and `utf8_from_wide`?
  - `converter`: https://github.com/boostorg/locale/blob/boost-1.48.0/src/std/converter.cpp#LL113C27-L113C73
  - `numeric.cpp`: `from_wide` uses `utf8_time_put_from_wide` instead of `time_put_from_base`, where the latter looks like it could have been `std::time_put_byname`, similar to many other facets
  - `codecvt.cpp`: `from_wide` creates the `utf8_codecvt`: https://github.com/boostorg/locale/blob/boost-1.48.0/src/std/codecvt.cpp#L30-L32 / https://github.com/boostorg/locale/blame/boost-1.60.0/src/std/codecvt.cpp#L31
  - `collator.cpp`: `from_wide` creates the `utf8_collator_from_wide`: https://github.com/boostorg/locale/blob/boost-1.48.0/src/std/collate.cpp#L77-L81

The current logic seems to be that on Windows `utf8_from_wide` is used, otherwise `utf8_native_with_wide` is used, and when the requested encoding isn't UTF-8, `utf8_none` is used on all platforms.

However, I'm confused that in `numeric.cpp` the `utf8_*_from_wide` classes are used (except for `time_put`) while the collator uses the `*from_wide` variant only on Windows.

To me it looks like either the `utf8_*_from_wide` classes should always be used (which might be a performance issue due to the required 2 conversions) or the standard classes are enough already.

So questions (assuming a UTF-8 locale is requested):
- Any reason not to always use the `utf8_codecvt`, given that it should work for UTF-16 and UTF-32 `wchar_t`s?
- In which case would `time_put_from_base`/`std::time_put_byname` fail that `utf8_time_put_from_wide` avoids?
- Same for `utf8_collator_from_wide` vs `std::collate_byname`
And (generally): Can `time_put_from_base` be replaced by `std::time_put_byname`?

The only possible reasoning I can see is: `std::locale("foo.UTF-8")` fails but `std::locale("foo")` or `std::locale("Windows-name-of-foo")` works, i.e. the standard library does not support the UTF-8 encoding and that has to be emulated.

Is this correct? In that case we would need `utf8_support::none` (non-UTF-8 locale requested), and `utf8_support::native` and `utf8_support::from_wide` for when `std::locale("foo.UTF-8")` works and when it doesn't, respectively.