adoxa / ansicon

Process ANSI escape sequences for Windows console programs.
http://ansicon.adoxa.vze.com/

Pseudographic symbols broken when echoed from PHP #91

Closed Rarst closed 7 years ago

Rarst commented 8 years ago

I'm having an issue with some symbols being garbled when PHP outputs them to the console.

Works:

C:\server\composer>echo └├──│
└├──│

But:

<?php

echo '└├──│';

Doesn't:

C:\server\composer>php tree.php
тФФтФЬтФАтФАтФВ

See https://github.com/composer/composer/issues/4803

adoxa commented 8 years ago

Looks like UTF-8, so chcp 65001 should work, but it's not ideal.

Rarst commented 8 years ago

Lost me after UTF-8. :) Would this be something to resolve on the ansicon side, or something to configure in the system?

rquadling commented 8 years ago

Save this as UnicodeCHCP.reg and then double click it.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\SOFTWARE\Microsoft\Command Processor]
"Autorun"="@CHCP 65001>NUL"

This will allow your command prompt to display those characters. Maybe. It will depend upon the font you have chosen to use for your command prompt.

Rarst commented 8 years ago

Nope, that messed up tree output differently (and broke cyrillics while at it).

rquadling commented 8 years ago

More than likely you are using a font that isn't Unicode for your command prompt. There aren't many monospaced ones.

The other option is to convert the output in PHP from UTF-8 to whatever code page you want to use.
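
A minimal sketch of that conversion step, written in Python purely to illustrate the byte-level transformation (in PHP, `iconv()` is the usual tool for this kind of transcoding). The function name `to_console_bytes` and the choice of CP866 as the target are assumptions for illustration, not something taken from the thread.

```python
def to_console_bytes(utf8_bytes: bytes, console_codepage: str = "cp866") -> bytes:
    """Re-encode UTF-8 output into a legacy console code page.

    Characters the code page cannot represent are replaced with '?'.
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode(console_codepage, errors="replace")


tree = "└├──│".encode("utf-8")       # three bytes per character in UTF-8
converted = to_console_bytes(tree)    # one byte per character in CP866
print(len(tree), len(converted))
```

This only helps for characters the target code page actually contains; the box-drawing characters here exist in CP866, but anything outside it degrades to `?`.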

Rarst commented 8 years ago

I use the Consolas font, same as in my IDE. I haven't had any issues with it; since it ships with Windows, it's probably as compatible as it gets.

rquadling commented 8 years ago

Give Lucida Console a try. I doubt either of them will cover all the Unicode code points, so you are going to have issues either way you go.

adoxa commented 8 years ago

UTF-8 is a multibyte character encoding used to store the multitude of characters available in Unicode; 65001 is its Windows code page number.

T:\>chcp 437
Active code page: 437

T:\>type tree.txt
ΓööΓö£ΓöÇΓöÇΓöé

T:\>chcp 866
Active code page: 866

T:\>type tree.txt
тФФтФЬтФАтФАтФВ

T:\>chcp 65001
Active code page: 65001

T:\>type tree.txt
└├──│
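
The three outputs above can be reproduced outside the console. The following Python sketch (an illustration, not part of ANSICON or PHP) takes the UTF-8 bytes of the tree characters and decodes them the way a console set to each code page would interpret them.

```python
tree_utf8 = "└├──│".encode("utf-8")

for codepage in ("cp437", "cp866", "utf-8"):
    # Decode the same bytes the way a console set to this code page would.
    print(codepage, tree_utf8.decode(codepage))
```

The bytes never change; only the code page used to interpret them does, which is why the same file shows up as three different strings.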

The situation seems to be that your console is using CP866, PHP is outputting UTF-8 and ANSICON is converting what it expects to be CP866 to UTF-16 (that's what Windows uses). UTF-8 has a particular format, so I can reliably determine if it's not UTF-8, but distinguishing between CP866 (or whatever) and UTF-8 may be prone to error (especially with the probably small sample being output). I could add something like set ANSICON_CP=php:65001 to indicate a program's output is a specific code page, not the current.
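
A rough Python sketch of why that detection is one-sided. It assumes the check is simply "does the data decode as strict UTF-8"; the helper name `looks_like_utf8` is hypothetical and this is not how ANSICON itself is implemented.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decoding succeeds)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


cyrillic_cp866 = "Привет".encode("cp866")   # ordinary CP866 text
tree_utf8 = "└".encode("utf-8")             # bytes E2 94 94

print(looks_like_utf8(cyrillic_cp866))  # rejected: cannot be UTF-8
print(looks_like_utf8(tree_utf8))       # accepted as UTF-8 ...
print(tree_utf8.decode("cp866"))        # ... but the same bytes are also valid CP866 text
```

A failed check proves the data is not UTF-8, but a passed check does not prove that it is, since every byte value also means something in CP866.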

rquadling commented 8 years ago

Jason, I would STRONGLY recommend against doing any sort of code page conversion automatically. It is simply NOT ANSICON's job. For systems/users with a mismatched output and terminal, they have to sort that out. PHP is doing its job correctly, ANSICON is doing its job correctly, terminal is doing its job as assigned.

Rarst commented 8 years ago

"Give Lucida Console a try."

No luck.

"The situation seems to be that your console is using CP866"

Ok, seems so, but I have trouble replicating your 65001 example. The current code page is 866, and it demonstrates the issue.

Microsoft Windows [Version 6.1.7601]
(c) Корпорация Майкрософт (Microsoft Corp.), 2009. Все права защищены.

C:\server>chcp
Текущая кодовая страница: 866

C:\server>php tree.php
тФФтФЬтФАтФАтФВ

C:\server>

But 65001 outputs nothing at all from PHP and additionally breaks cyrillics:

Microsoft Windows [Version 6.1.7601]
(c) Љ®аЇ®а жЁп Њ ©Єа®б®дв (Microsoft Corp.), 2009. ‚ᥠЇа ў  § йЁйҐ­л.

C:\server>chcp 65001
Active code page: 65001

C:\server>php tree.php

C:\server>

rquadling commented 8 years ago

If you don't load ANSICON, what output do you get?

Rarst commented 8 years ago

Ok, with ansicon uninstalled I still get the tree symbols borked, but cyrillics no longer break with the 65001 thing (which still doesn't fix the tree stuff).

I assumed ansicon was the issue, since someone posted in the Composer issue that it works fine without it, but I might have been wrong, sorry. :(

Encodings! :(

rquadling commented 8 years ago

Encodings suck.

adoxa commented 8 years ago

Which version of PHP are you using?

Rarst commented 8 years ago

C:\server>php --version
PHP 5.6.0 (cli) (built: Aug 27 2014 11:54:39)

Though I suppose this can be closed as an unsalvageable mess, since there are also issues without ansicon. :)

rquadling commented 8 years ago

Oh yeah!!!

And it seems Windows 10 console now supports ANSI!!!

Got a new prompt with no ansicon loaded and my prompt is in colour!!!

Running composer --ansi and lo and behold ... in colour!!!

Need to reboot to check for sure. I had renamed the ANSICON directory and opened a new prompt and got the error about the "missing" ansicon.exe, which is as expected, and then composer (normally detects ANSICON and shows colour) now shows black/white. Add the --ansi option and colour!

rquadling commented 8 years ago

None of which makes the problem of Unicode output buffering go away.

adoxa commented 8 years ago

I found a post on the JPSoft (TCC) forums that suggests the November Windows 10 update added ANSI support, but couldn't find anything else. Unfortunately, I'm not in a position to get 10 to see for myself.

adoxa commented 8 years ago

Okay, I downloaded 5.6.17 (php-5.6.17-nts-Win32-VC11-x86.zip to be precise) and here's what I get.

C:\Language\php>chcp 866
Active code page: 866

C:\Language\php>type tree.php
<?php

echo "тФФтФЬтФАтФАтФВ\n";
?>

C:\Language\php>php --version
PHP 5.6.17 (cli) (built: Jan  6 2016 13:28:25)
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies

C:\Language\php>php tree.php
тФФтФЬтФАтФАтФВ

C:\Language\php>ansicon --version
ANSICON (64-bit) version 1.66 (21 September, 2013).

C:\Language\php>ansicon php tree.php
тФФтФЬтФАтФАтФВ

C:\Language\php>\Projects\ansicon\x86\ansicon --version
ANSICON (32-bit) version 1.72 (24 December, 2015).

C:\Language\php>\Projects\ansicon\x86\ansicon php tree.php
тФФтФЬтФАтФАтФВ

C:\Language\php>chcp 65001
Active code page: 65001

C:\Language\php>type tree.php
<?php

echo "└├──│\n";
?>

C:\Language\php>php tree.php
���├──│
�─│
���

C:\Language\php>ansicon php tree.php
���├──│
�─│
���

C:\Language\php>\Projects\ansicon\x86\ansicon -l5 php tree.php
└├──│

C:\Language\php>set ANSICON_API=php

C:\Language\php>ansicon php tree.php
���├──│

C:\Language\php>chcp 866
Active code page: 866

C:\Language\php>type %temp%\ansicon.log
ANSICON (32-bit) v1.72 log (5) started 2016-01-23 11:06:34

ansicon (536): php (2556)
ansicon (536):   32-bit console (base = 010E0000)

php (2556): hDllInstance = 00AC0000
php (2556): WriteFile: 1 "т"
php (2556):   lead byte, removing
php (2556): WriteFile: 15 "ФФтФЬтФАтФАтФВ\n"
php (2556):   starts with 2 trail bytes, removing & writing "тФФ"
php (2556): Terminating

So, I'm getting undefined characters, rather than nothing at all; that could possibly be the difference in PHP versions. Setting ANSICON_API still doesn't help with the first character, but does the rest; however, since you had no output at all, I'm not sure it will work for you. The latest ANSICON recognises the initial split character and has no need of ANSICON_API (I always do it now).
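
The log shows the 16 bytes arriving as two writes, 1 byte and then 15, so the first character is split across them. A small Python sketch of the general technique for stitching such writes back together, using the standard library's incremental decoder; this illustrates the idea only and is not ANSICON's actual code.

```python
import codecs

payload = "└├──│\n".encode("utf-8")
chunks = [payload[:1], payload[1:]]   # 1 byte, then 15 bytes, as in the log

# Decoding each write on its own loses the split character.
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# An incremental decoder holds the incomplete lead byte until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
stitched = "".join(decoder.decode(chunk) for chunk in chunks)
stitched += decoder.decode(b"", final=True)

print(repr(naive))
print(repr(stitched))
```

Carrying the dangling lead byte over to the next write is what lets the first character survive; decoding each write in isolation turns it into replacement characters.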

Rarst commented 7 years ago

This doesn’t sound like anything obviously actionable and it's getting old, so closing. I've moved on to PHP 7.1 by now, which seems to swallow characters altogether or whatever. Just staying away from trees for now. :)

adoxa commented 7 years ago

Installed 7.1.5 and it seems to work fine with Win7.

C:\Language\php7>php --version
PHP 7.1.5 (cli) (built: May  9 2017 19:52:14) ( NTS MSVC14 (Visual C++ 2015) x86 )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2017 Zend Technologies

C:\Language\php7>ansicon --version
ANSICON (64-bit) version 1.66 (21 September, 2013).

C:\Language\php7>chcp
Active code page: 437

C:\Language\php7>type tree.php
<?php

echo "\e[4CΓööΓö£ΓöÇΓöÇΓöé\n";
?>

C:\Language\php7>php tree.php
 [4C└├──│

C:\Language\php7>ansicon php tree.php
    └├──│
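
The `\e[4C` in that script is the ANSI "cursor forward" sequence (CSI n C), which is why the ANSICON run is indented by four columns while the plain run prints the sequence literally. A rough Python approximation of that one sequence follows; the helper name `expand_cursor_forward` is hypothetical, and treating the move as spaces is only accurate on an otherwise empty line.

```python
import re

CURSOR_FORWARD = re.compile(r"\x1b\[(\d*)C")   # CSI n C; n defaults to 1


def expand_cursor_forward(line: str) -> str:
    """Approximate CSI n C on an otherwise empty line by emitting n spaces."""
    return CURSOR_FORWARD.sub(lambda m: " " * int(m.group(1) or "1"), line)


print(repr(expand_cursor_forward("\x1b[4C└├──│")))
```

A real terminal moves the cursor without writing anything, so existing text under the skipped columns would be left in place rather than overwritten with spaces.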