gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

test U_CHARSET_IS_UTF8 in configure when using pkg-build #314

Closed gagolews closed 6 years ago

gagolews commented 6 years ago

It seems that Manjaro's system ICU (and possibly that of other distributions too) was built with U_CHARSET_IS_UTF8 enabled.

./configure should check for that and, if this is the case, build ICU from sources.
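
A minimal sketch of what such a check could look like, assuming the test is done by compiling a small probe against the system ICU headers. The function name `icu_charset_is_utf8`, the `ICU_PROBE_MAIN` switch, and the exit-code convention are hypothetical and not taken from stringi's actual configure script. The `__has_include` guard is only there so the probe still compiles when no ICU headers are installed.

```c
#include <stdio.h>

/* Pull in the ICU headers only if the compiler can find them. */
#if defined(__has_include)
#  if __has_include(<unicode/utypes.h>)
#    include <unicode/utypes.h>
#    define PROBE_HAVE_ICU_HEADERS 1
#  endif
#endif
#ifndef PROBE_HAVE_ICU_HEADERS
#  define PROBE_HAVE_ICU_HEADERS 0
#endif

/* Hypothetical helper:
 *    1  the headers define U_CHARSET_IS_UTF8 to a non-zero value
 *    0  the headers were found and the macro is unset or zero
 *   -1  no ICU headers were found, so nothing can be said */
static int icu_charset_is_utf8(void)
{
#if !PROBE_HAVE_ICU_HEADERS
    return -1;
#elif defined(U_CHARSET_IS_UTF8) && U_CHARSET_IS_UTF8
    return 1;
#else
    return 0;
#endif
}

#ifdef ICU_PROBE_MAIN
/* Stand-alone mode: compile with -DICU_PROBE_MAIN.
 * Assumed convention: a non-zero exit status tells the calling
 * script that the system ICU forces UTF-8 as the default charset. */
int main(void)
{
    int r = icu_charset_is_utf8();
    printf("U_CHARSET_IS_UTF8: %s\n",
           r == 1 ? "yes" : (r == 0 ? "no" : "unknown (no ICU headers)"));
    return r == 1 ? 1 : 0;
}
#endif
```

A configure script could compile this with the flags reported for the system ICU and fall back to the bundled sources when the probe reports 1. As noted in the following comments, that alone does not help if libR itself is linked against the system ICU.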

gagolews commented 6 years ago

Building ICU from sources is not enough if libR is linked against the system ICU anyway. Should we enable renaming?

gagolews commented 6 years ago

Set U_LIB_SUFFIX_C_NAME?

QuLogic commented 6 years ago

Does this break something in particular? The change is not specific to Manjaro; ICU itself flipped the flag for all Linux systems as of version 61: https://github.com/unicode-org/icu/commit/d7482c9720b4f71dd9dad030ec7a1c10bf1ccec2

gagolews commented 6 years ago

With U_CHARSET_IS_UTF8, stri_enc_set() does not work, which makes stringi-based scripts non-portable across platforms. The value of stri_enc_get() affects how stringi interprets R's "unknown" encoding.