boostorg / locale

Boost.Locale
Boost Software License 1.0

locale::conv::from_utf freezing on macOS14 and iOS17 because iconv library #206

Closed daniboybye closed 6 months ago

daniboybye commented 9 months ago
Flamefire commented 9 months ago

This looks similar to #196 as it happens with IConv on macOS 14. As I don't have access to a Mac I can't test it there to see what the issue might be. It looks like Apple has changed the iconv library on macOS 14 to be incompatible with the GNU version, see e.g. https://developer.apple.com/forums/thread/739533

I'm unable to find documentation for libiconv on macOS 14 that specifies its behavior, so that it could be compared with other iconv implementations such as GNU's.

So I'd need your help here:

On GNU libiconv for your input we have in_left=3 before, in_left=0 after the conv call and res=0 & output_count=7. What are the values on your system?
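For anyone who wants to collect these numbers outside of Boost.Locale, here is a minimal sketch of a single iconv call that records them. The helper name probe_iconv and the struct ConvStep are hypothetical, and the target encoding is whatever the caller passes in; the 64-byte output buffer matches the size discussed in this thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Values observed around one iconv() call (field names mirror this thread).
struct ConvStep {
    std::size_t in_left_before;
    std::size_t in_left_after;
    std::size_t res;          // (size_t)-1 signals an error
    int err;                  // errno after a failed call, 0 otherwise
    std::size_t output_count; // bytes written to the output buffer
};

// Hypothetical helper: convert `input` with one iconv() call into a 64-byte buffer.
inline ConvStep probe_iconv(const std::string& input, const char* to, const char* from)
{
    assert(to != nullptr && from != nullptr);
    iconv_t cd = iconv_open(to, from);
    if(cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open failed");

    char out_buf[64];
    std::string in_copy = input; // iconv() takes a non-const char**
    char* in = &in_copy[0];
    char* out = out_buf;
    std::size_t in_left = in_copy.size();
    std::size_t out_left = sizeof(out_buf);

    ConvStep step{};
    step.in_left_before = in_left;
    errno = 0;
    step.res = iconv(cd, &in, &in_left, &out, &out_left);
    step.err = (step.res == static_cast<std::size_t>(-1)) ? errno : 0;
    step.in_left_after = in_left;
    step.output_count = sizeof(out_buf) - out_left;
    iconv_close(cd);
    return step;
}
```

Calling probe_iconv("\xE5\xAE\x9E", "UTF-16LE", "UTF-8") feeds the 3 UTF-8 bytes of the character from the report; UTF-16LE is only an assumed target for illustration, not the encoding from the original report.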

A workaround would be to not use iconv but ICU by disabling iconv which requires a change to the B2 build file: #207

daniboybye commented 9 months ago

In my systems:
on first pass: in_left=1 before, in_left=0 after, res=0, output_count=1
on second pass: in_left=0 before, in_left=0 after, res=0, output_count=0
on third pass: in_left=3 before, in_left=3 after, res=-1, output_count=0
and we can't read more after that position

Flamefire commented 9 months ago

Thanks for testing! I've noticed that I sent you the code position for the develop branch, not 1.82. I assume you tested 1.82? I'll use that below, but it should be trivial to adapt if not.

Let me think out loud:

on first pass: in_left=1 before, in_left=0 after, res=0, output_count=1

That looks wrong. If "实" were UTF-8 it would be 3 bytes, not one. So I assume you saved the file in some locale-specific encoding in which that character can be represented in 1 byte. In that case the input is not UTF-8, and calling from_utf will not produce the expected result in any case. However, it also shouldn't hang, so let's continue:
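As a quick check of the byte count: "实" is U+5B9E, and the UTF-8 length of a code point follows directly from its value. A minimal sketch (utf8_length is a hypothetical helper, and it does not reject surrogate code points):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Number of bytes UTF-8 needs to encode one Unicode code point.
// Hypothetical helper for illustration; surrogates are not rejected.
inline std::size_t utf8_length(char32_t cp)
{
    if(cp <= 0x7F)
        return 1; // ASCII
    if(cp <= 0x7FF)
        return 2;
    if(cp <= 0xFFFF)
        return 3; // rest of the Basic Multilingual Plane, includes U+5B9E
    if(cp <= 0x10FFFF)
        return 4;
    throw std::invalid_argument("not a Unicode code point");
}
```

utf8_length(0x5B9E) returns 3, which matches the bytes \E5\AE\9E and the expected in_left=3.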

on second pass: in_left=0 before, in_left=0 after, res=0, output_count=0

in_left=0 should set state = unshifting, and since res != -1 after the next call, we'll set state = done, which exits the loop.

on third pass: in_left=3 before, in_left=3 after, res=-1, output_count=0

First: How could a third pass happen if state=done? And why do we have in_left=3 now, which is what we should have had at the start? Maybe your "first pass" is on something else and the "third pass" is actually the first pass on your string? Can you verify this?

However, res=-1 should be checked here and, depending on err, the code should continue the loop after incrementing begin for EILSEQ or EINVAL, just continue for E2BIG, or exit the loop for an unknown errno.
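To spell out that decision, here is a minimal sketch of the errno handling as a plain function. The enum Action and the function action_for_errno are hypothetical names for illustration, not the identifiers used in Boost.Locale:

```cpp
#include <cassert>
#include <cerrno>

// What the conversion loop does after iconv() returned (size_t)-1.
// Hypothetical names; the mapping follows the description above.
enum class Action {
    skip_input_and_continue, // EILSEQ / EINVAL: increment begin, then continue
    continue_loop,           // E2BIG: just continue with the next iteration
    stop                     // unknown errno: exit the loop
};

inline Action action_for_errno(int err)
{
    assert(err != 0); // only meaningful after a failed call
    switch(err) {
        case EILSEQ: // invalid multibyte sequence in the input
        case EINVAL: // incomplete multibyte sequence at the end of the input
            return Action::skip_input_and_continue;
        case E2BIG: // not enough room in the output buffer
            return Action::continue_loop;
        default:
            return Action::stop;
    }
}
```

Note that the continue_loop branch assumes the call made some progress before running out of room, which is the assumption that matters for the hang discussed here.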

So what exactly did you mean by "causes a freeze" in the initial report or that last part?:

and we can't read more after that position

daniboybye commented 9 months ago

My mistake, you're right. First pass is in_left=3 before, in_left=3 after, res=-1, output_count=0. Err is 7(E2BIG) and state is normal.

Flamefire commented 9 months ago

Ah that explains the infinite loop: It continues the loop without consuming or producing anything so it will do the exact same thing over and over again. I can add a check for that (so it returns an empty string instead) but the system iconv implementation still looks broken: E2BIG should be raised if "There is not sufficient room at *outbuf."

But we pass a valid pointer as outbuf and a size of 64 as out_left. How would that be not enough room to consume even a single byte? Can you verify that at https://github.com/boostorg/locale/blob/boost-1.82.0/src/boost/locale/util/iconv.hpp#L77 out points to result and outsize to "64"?

Maybe also compare the iconv manpage on your system (run man 3 iconv in a shell) to the one linked above to see whether there are any differences that might provide hints.

If not, I can only document that "IConv on macOS 14+ is broken and should be disabled", which isn't a great solution, especially as we cannot easily detect this at build time because the interfaces are all there.

CC @artyom-beilis if he has any ideas left.

daniboybye commented 9 months ago

Adding a check for E2BIG is not a solution because there can be more characters after the Chinese one.

artyom-beilis commented 9 months ago

I suggest we make a trivial C example that reproduces the problem.

In my systems:
on first pass: in_left=1 before, in_left=0 after, res=0, output_count=1
on second pass: in_left=0 before, in_left=0 after, res=0, output_count=0
on third pass: in_left=3 before, in_left=3 after, res=-1, output_count=0
and we can't read more after that position

What I don't understand is why, after in_left = 0 and res = 0, the loop is not breaking: https://github.com/boostorg/locale/blob/c5314a857c5af029ced242820ef62deeec065b1d/src/boost/locale/encoding/iconv_converter.hpp#L78C17-L79C27

On the first call it consumes 3 bytes and outputs one char. The next call returns res = 0, which basically means we need to exit since we are in the unshifting state.

I really don't understand how we get there

Flamefire commented 9 months ago

Adding a check for E2BIG is not a solution because there can be more characters after the Chinese one.

I meant adding a check here for whether any progress was made, i.e. in_left changed or output_count > 0, and continuing only then. Otherwise either abort with an error saying the implementation is broken, or continue with the next input character like here.

What I don't understand is why, after in_left = 0 and res = 0, the loop is not breaking: https://github.com/boostorg/locale/blob/c5314a857c5af029ced242820ef62deeec065b1d/src/boost/locale/encoding/iconv_converter.hpp#L78C17-L79C27

On the first call it consumes 3 bytes and outputs one char. The next call returns res = 0, which basically means we need to exit since we are in the unshifting state.
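Roughly what I have in mind, as a minimal sketch with the iconv call abstracted into a callback so the guard itself is visible. StepResult, Step and convert_with_progress_check are hypothetical names, not Boost.Locale code, and this variant takes the "abort with an error" route:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Outcome of one conversion step (hypothetical shape for illustration).
struct StepResult {
    std::size_t consumed; // input bytes consumed, i.e. the change of in_left
    std::string produced; // output bytes produced, i.e. output_count bytes
};

// One step converts a prefix of the remaining input.
using Step = std::function<StepResult(const std::string& remaining)>;

// Run steps until the input is consumed, but refuse to spin forever:
// a step that neither consumes nor produces anything aborts the conversion.
inline std::string convert_with_progress_check(const std::string& input, const Step& step)
{
    assert(step); // an empty callback would be a usage error
    std::string result;
    std::size_t pos = 0;
    while(pos < input.size()) {
        const StepResult r = step(input.substr(pos));
        if(r.consumed == 0 && r.produced.empty())
            throw std::runtime_error("conversion made no progress, iconv implementation looks broken");
        result += r.produced;
        pos += r.consumed;
    }
    return result;
}
```

With a step that always returns {0, ""}, which is what the macOS 14 behaviour amounts to, this throws on the first iteration instead of looping.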

@artyom-beilis See https://github.com/boostorg/locale/issues/206#issuecomment-1822646950 The first 2 were something different.

But yes, a simple C program demonstrating the issue in iconv that can be sent to Apple support might be an idea. I wrote a simplified version: https://godbolt.org/z/oY6Th5Gh5 @daniboybye Can you try to compile and run that code on your system and see if it fails? If it does (which it should), please submit it to Apple support if you can.

daniboybye commented 9 months ago

I confirm that this example demonstrates the problem and will add it to my report to Apple.

Flamefire commented 9 months ago

Thank you. Please let us know if you get any new information.

I confirm that this example demonstrates the problem and will add it to my report to Apple.

Can you post the output or how you are sure it is exactly this problem? I.e. no input consumed (in_left == 3 && out_left == 64), res=-1 with errno=E2BIG?

It occurred to me that macOS might be using the FreeBSD version. Could you run https://godbolt.org/z/98aah1n51 (and optionally https://godbolt.org/z/14eaPefsW for the related issue) on your system and post the output please? It also contains more printed

daniboybye commented 9 months ago

My output for https://godbolt.org/z/98aah1n51

E2BIG=7 EILSEQ=92 EOPNOTSUPP=102 EINVAL=22

Original: \E5\AE\9E in_left: 3
res: 4294967295 errno: 7 in_left: 3 out_left: 64
line:33 Test FAILED: res == 0u
line:34 Test FAILED: errno == 0
line:35 Test FAILED: in_left == 0u
line:36 Test FAILED: out_left == 64u - 7u
res: 0 errno: 7 in_left: 3 out_left: 64
line:45 Test FAILED: errno == 0
line:46 Test FAILED: in_left == 0u
line:47 Test FAILED: out_left == 64u - 8u


https://godbolt.org/z/14eaPefsW

E2BIG=7 EILSEQ=92 EOPNOTSUPP=102 EINVAL=22

Original: \E2\80\A6\E2\80\A6 in_left: 6
res: 0 errno: 0 in_left: 0 out_left: 62
\3F\3F
line:36 Test FAILED: out_left == 64u - 4u
res: 0 errno: 0 in_left: 0 out_left: 62
\3F\3F
line:47 Test FAILED: out_left == 64u - 4u

U+0085: \C2\85\C2\85 in_left: 4
res: 4294967295 errno: 92 in_left: 4 out_left: 64
line:33 Test FAILED: res == 0u
line:34 Test FAILED: errno == 0
line:35 Test FAILED: in_left == 0u
line:36 Test FAILED: out_left == 64u - 4u
res: 0 errno: 92 in_left: 4 out_left: 64
line:45 Test FAILED: errno == 0
line:46 Test FAILED: in_left == 0u
line:47 Test FAILED: out_left == 64u - 4u

U+2026: \E2\80\A6\E2\80\A6 in_left: 6
res: 0 errno: 0 in_left: 0 out_left: 62
\3F\3F
line:36 Test FAILED: out_left == 64u - 4u
res: 0 errno: 0 in_left: 0 out_left: 62
\3F\3F
line:47 Test FAILED: out_left == 64u - 4u

Flamefire commented 7 months ago

It looks like this is indeed an issue with macOS 14/iOS 17 and fixed in 14.2/17.2 respectively as per https://github.com/d99kris/nmail/issues/150#issuecomment-1879929847

#218 will close this issue by throwing an exception instead of freezing when the issue is detected.