burner / bugzilla_migration_test


UTF-8 output to console is seriously broken #18

Open burner opened 17 years ago

burner commented 17 years ago

a.solovey reported this on 2007-08-28T22:51:06Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=1448

CC List

Description

If the Windows console code page is set to 65001 (UTF-8) and a program outputs non-ASCII characters in UTF-8 encoding, nothing is output after the first newline that follows the accented character. I believe the problem is in the underlying DMC stdio, but it is more disturbing with D, as D has good Unicode support and is very convenient for working with international texts. This problem has been reported in the newsgroup several times before; see, for example, http://www.digitalmars.com/d/archives/digitalmars/D/announce/openquran_v0.21_8492.html

Here is code to illustrate the problem:

////////
import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    // Codepoint 00e9 is "Latin small letter e with acute"
    puts( "Output utf-8 accented char \u00e9 ... and the rest is cut off! " );
}
/////////

If you run it, "... and the rest is cut off!" won't be displayed. Do not forget to set the console font to Lucida Console before trying this.
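The accented character is exactly where bytes and characters stop lining up: é (U+00E9) is a single code point but occupies two bytes in UTF-8. When chasing this kind of truncation it can help to look at the exact bytes a program hands to the C runtime. The following is a minimal, portable D sketch for that; the helper name hexBytes is hypothetical and not part of Phobos, and the sketch does not by itself explain why the console stops printing.

```d
import std.algorithm.iteration : map;
import std.array : join;
import std.format : format;
import std.range : walkLength;
import std.stdio : writeln;

/// Hypothetical helper: render every UTF-8 code unit (byte) of s as hex.
string hexBytes(string s)
{
    return (cast(immutable(ubyte)[]) s)
        .map!(b => format("%02X", b))
        .join(" ");
}

void main()
{
    enum text = "accented char \u00e9";
    writeln(hexBytes("\u00e9")); // prints "C3 A9"
    // .length counts UTF-8 code units; walkLength counts decoded code points.
    writeln(text.length, " bytes, ", text.walkLength, " code points"); // prints "16 bytes, 15 code points"
}
```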

!!!There are attachments in the bugzilla issue that have not been copied over!!!

burner commented 17 years ago

a.solovey commented on 2007-08-28T22:52:24Z

Created attachment 172: small test case for the same problem in DMC

burner commented 17 years ago

smjg commented on 2007-08-29T13:03:13Z

The problem doesn't show if I use the Windows API (either WriteConsole or WriteFile) to output. So the bug must be somewhere in DM's stdio implementation.
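smjg's observation suggests a workaround: skip the C runtime's stdio and hand the text straight to the Windows API. Below is a minimal sketch, assuming druntime's core.sys.windows bindings. It converts the UTF-8 string to UTF-16 and calls WriteConsoleW, so the console output code page should not matter. The function name consoleWrite is hypothetical; on non-Windows systems the sketch simply falls back to std.stdio.

```d
import std.conv : to;
import std.stdio : stdout;

version (Windows)
{
    import core.sys.windows.windows : DWORD, GetStdHandle, STD_OUTPUT_HANDLE,
        WriteConsoleW;
}

/// Hypothetical helper: write text to the console without going through C stdio.
/// Returns the number of UTF-16 code units reported as written.
size_t consoleWrite(string text)
{
    const wtext = text.to!wstring; // WriteConsoleW expects UTF-16
    version (Windows)
    {
        DWORD written;
        // WriteConsoleW fails (returns 0) if stdout is redirected to a file or pipe.
        if (!WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), wtext.ptr,
                cast(DWORD) wtext.length, &written, null))
            return 0;
        return written;
    }
    else
    {
        stdout.write(text); // other platforms: plain std.stdio output
        stdout.flush();
        return wtext.length;
    }
}

void main()
{
    consoleWrite("Output utf-8 accented char \u00e9\n... and the rest is shown\n");
}
```

Because WriteConsoleW only accepts a real console handle, a program whose output may be redirected would still need a fallback (for example WriteFile or std.stdio) when the call fails; that branch is left out of this sketch.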

burner commented 17 years ago

bugzilla (@WalterBright) commented on 2007-09-28T22:15:07Z

Fixed in DMD 1.021 and 2.004

burner commented 17 years ago

mk commented on 2007-10-29T11:02:51Z

The problem was NOT fixed for stderr (DMD 1.022)

burner commented 17 years ago

mk commented on 2007-10-29T11:04:25Z

Bug 1608 has been marked as a duplicate of this bug.

burner commented 16 years ago

mk commented on 2008-09-03T10:57:24Z

I hope this gets fixed one day. Here is an updated example that still doesn't work as of DMD 1.035 (for stderr; stdout is OK):

import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    // Codepoint 00e9 is "Latin small letter e with acute"
    fputs("Output utf-8 accented char \u00e9\n... and the rest is OK\n", stdout);
    fputs("Output utf-8 accented char \u00e9\n... and the rest is cut off!\n", stderr);
    fputs("STDOUT.\n", stdout);
    fputs("STDERR.\n", stderr);
}

burner commented 12 years ago

kevin commented on 2012-02-07T22:48:48Z

Sort of works for me.

The text doesn't get cut off, but the Unicode characters written to stderr aren't displayed correctly.

C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>ConsoleApp1.exe
Output utf-8 accented char é
... and the rest is OK
Output utf-8 accented char ��
... and the rest is cut off!
STDOUT.
STDERR.

C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>

burner commented 11 years ago

mk commented on 2013-03-19T18:21:18Z

Status update as of DMD 2.062 (Win XP 32 bit)

Still the same error for the above-mentioned example. However, when it is modified to use write instead of fputs:

import std.stdio;
import std.c.windows.windows;

extern(Windows) BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    stderr.write("STDERR:Output utf-8 accented char \u00e9\n... and the rest is cut off!\n");
    stderr.write("end_STDERR.\n");
}

I get this error:

STDERR:Output utf-8 accented char é
... and the rest is cut off!
std.exception.ErrnoException@D:\PROGRAMS\DMD2\WINDOWS\BIN....\src\phobos\std\stdio.d(1264): (No error)

0x0040D874
0x0040D6FF
0x00402218
0x00402189
0x00402121
0x00402030
0x0040354E
0x00403151
0x00402388
0x7C81776F in RegisterWaitForInputIdle

So if anybody has a clue what's going on there...

burner commented 11 years ago

ben commented on 2013-08-07T00:55:43Z

I can confirm this issue. When enumerating a directory (via dirEntries()) containing a file with a character in the CP850/CP1252 space (e.g. "säb"), depending on the codepage settings, the output is as follows:

chcp 1252  => output is "säb" (Unicode encoding for "ä")
chcp 65001 => output is "säbstd.exception.ErrnoException@D:\tools\d\bin..\src\phobos\std\stdio.d(1352): (No error)"

In both cases, cmd's dir (for example) shows the correct results. The correct results are also shown when using C with printf(), although that is not really comparable.

I tried this in cmd, Console2, and ConEmu; all show the same results.

It'd really be nice if this bug would get fixed...

burner commented 11 years ago

ben commented on 2013-08-07T00:58:06Z

Addendum: Windows 7 64-bit, dmd v2.063.2.

Sorry.

burner commented 10 years ago

mk commented on 2014-02-24T17:18:25Z

Hallelujah, this (comment 8) seems fixed, finally. Can anybody confirm? It works for me on Windows XP 32 bit, dmd 2.065.0.

Beware: fputs still doesn't work. I think it's a C library problem.

burner commented 10 years ago

sum.proxy commented on 2014-10-25T09:26:49Z

The issue still exists in DMD32 D Compiler v2.065, Windows 7

==============

Code:

import std.stdio;
import std.c.windows.windows;

extern(Windows) BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    stderr.write("STDERR:Output utf-8 accented char \u00e9\n... and the rest is cut off!\n");
    stderr.write("end_STDERR.\n");
}

Output:

STDERR:Output utf-8 accented char é
... and the rest is cut off!

==============

"end_STDERR.\n" is not written.

burner commented 8 years ago

mk commented on 2016-02-09T21:07:53Z

Final note, as this is unlikely to be fixed: use -m32mscoff and the Microsoft Visual Studio linker.

burner commented 7 years ago

mk commented on 2016-11-30T11:14:40Z

Partial fix or workaround in druntime for unhandled exceptions: https://github.com/dlang/druntime/pull/1687

burner commented 5 years ago

kinke commented on 2019-06-13T18:33:49Z

Still an issue, but apparently restricted to stderr (and independent from DigitalMars/MS runtime):

import core.stdc.stdio;
import core.sys.windows.wincon, core.sys.windows.winnls;

void main()
{
    // SetConsoleOutputCP returns a BOOL, so query the old code page first
    const oldCP = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8);
    scope(exit) SetConsoleOutputCP(oldCP);

    fprintf(stdout, "HellöѬ LDC\n");
    fflush(stdout);

    fprintf(stderr, "HellöѬ LDC\n");
    fflush(stderr);
}

=>

HellöѬ LDC
Hell

Tested with DMD 2.086.0 (-m32, -m32mscoff, -m64) and LDC on Win10.

burner commented 5 years ago

kinke commented on 2019-06-15T09:47:31Z

Update: it's working with Win10 v1903 (with the exact same binary that didn't work with v1803). According to Rainer Schütze, it's working since v1809. See https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/.

burner commented 5 years ago

razvan.nitu1305 commented on 2019-08-12T12:05:57Z

(In reply to kinke from comment #16)

Update: it's working with Win10 v1903 (with the exact same binary that didn't work with v1803). According to Rainer Schütze, it's working since v1809. See https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/.

So is this issue fixed? I don't have a windows machine to test it. Should we close this?

burner commented 5 years ago

kinke commented on 2019-08-12T13:03:28Z

This isn't solved, but would now be solvable with recent Windows versions.

There are 2 things about this:

burner commented 5 years ago

razvan.nitu1305 commented on 2019-10-24T09:32:25Z

(In reply to kinke from comment #18)

This isn't solved, but would now be solvable with recent Windows versions.

There are 2 things about this:

  • DMD outputs a mix of UTF-8 and strings in the current codepage, AFAIK without setting any console codepage, so DMD output on Windows can be garbage. LDC v1.17 fixes this for LDC.

How does LDC solve the problem?

  • User programs writing UTF-8 strings to the console suffer from the same issue. This could be worked around by setting the console codepage in druntime's _d_run_main and resetting it to the original one before termination.
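Until something like that lands in druntime, the second suggestion can be approximated in user code: a module constructor switches the console to CP_UTF8 and a module destructor restores the previous code page. This is a sketch under assumptions, not the druntime change itself. The helper names enableUtf8Console and restoreConsoleCP are hypothetical, and per the comments above, stderr output is only reported to display correctly on Windows 10 v1809 or later even with the code page set.

```d
import std.stdio : stderr, stdout;

version (Windows)
{
    import core.sys.windows.wincon : GetConsoleOutputCP, SetConsoleOutputCP;
    import core.sys.windows.winnls : CP_UTF8;
}

/// Hypothetical helper: switch the console output code page to UTF-8.
/// Returns the previous code page, or 0 if nothing was changed.
uint enableUtf8Console()
{
    version (Windows)
    {
        const old = GetConsoleOutputCP(); // 0 on failure, e.g. no console attached
        if (old == 0 || !SetConsoleOutputCP(CP_UTF8))
            return 0;
        return old;
    }
    else
        return 0; // no console code page to change on other platforms
}

/// Hypothetical helper: restore a code page saved by enableUtf8Console.
void restoreConsoleCP(uint old)
{
    version (Windows)
    {
        if (old != 0)
            SetConsoleOutputCP(old);
    }
}

private __gshared uint savedCP;

// Module constructor/destructor: run by druntime before and after main.
shared static this() { savedCP = enableUtf8Console(); }
shared static ~this() { restoreConsoleCP(savedCP); }

void main()
{
    stdout.writeln("Hellö stdout");
    stderr.writeln("Hellö stderr");
}
```

The code page belongs to the console, not the process, so if the program is terminated abruptly and the destructor never runs, the console keeps the UTF-8 code page afterwards.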