Closed wahaha88 closed 7 years ago
\u001b[?1049h is not a common ANSI code. Neither is \u001b=. Where are they coming from?
In fact, according to ANSI CSI specs \u001b= isn't even supposed to be considered a CSI escape. It'd either be \u001b[ or \u009b.
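The distinction drawn here can be sketched in a few lines of JavaScript. The helper name startsWithCsi is hypothetical, made up for illustration; the only assumption is that a CSI sequence opens with either ESC followed by "[" or the single C1 character U+009B.

```javascript
// Hypothetical helper (not part of ansi-regex): reports whether a string
// opens with a CSI introducer, i.e. ESC + "[" or the single C1 char U+009B.
function startsWithCsi(text) {
  return text.startsWith("\u001b[") || text.startsWith("\u009b");
}

console.log(startsWithCsi("\u001b[?1049h")); // true: ESC [ introduces CSI
console.log(startsWithCsi("\u001b="));       // false: ESC = is a two-character escape, not CSI
console.log(startsWithCsi("\u009b1m"));      // true: 8-bit CSI introducer
```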
I was wrong :) \u001b[?1049h is the CSR/CUP escape code.
On Linux, run the "man traceroute" command. I get the following output. The string in question is at the start of the first line. The other ANSI codes are matched by ansi-regex.
\u001b[?1049h\u001b[?1h\u001b= \r\nTRACEROUTE(8) BSD System Manager's Manual TRACEROUTE(8)\r\n\r\n\u001b[1mNAME\u001b[0m\r\n \u001b[1mtraceroute\u001b[0m -- print the route packets take to network host\r\n\r\n\u001b[1mSYNOPSIS\u001b[0m\r\n \u001b[1mtraceroute\u001b[0m [\u001b[1m-adeFISdNnrvx\u001b[0m] [\u001b[1m-A\u001b[0m \u001b[4mas_server\u001b[24m] [\u001b[1m-f\u001b[0m \u001b[4mfirst_ttl\u001b[24m] [\u001b[1m-g\u001b[0m \u001b[4mgateway\u001b[24m]\r\n [\u001b[1m-i\u001b[0m \u001b[4miface\u001b[24m] [\u001b[1m-M\u001b[0m \u001b[4mfirst_ttl\u001b[24m] [\u001b[1m-m\u001b[0m \u001b[4mmax_ttl\u001b[24m] [\u001b[1m-P\u001b[0m \u001b[4mproto\u001b[24m] [\u001b[1m-p\u001b[0m \u001b[4mport\u001b[24m]\r\n [\u001b[1m-q\u001b[0m \u001b[4mnqueries\u001b[24m] [\u001b[1m-s\u001b[0m \u001b[4msrc_addr\u001b[24m] [\u001b[1m-t\u001b[0m \u001b[4mtos\u001b[24m] [\u001b[1m-w\u001b[0m \u001b[4mwaittime\u001b[24m]\r\n [\u001b[1m-z\u001b[0m \u001b[4mpausemsecs\u001b[24m] \u001b[4mhost\u001b[24m [\u001b[4mpacketsize\u001b[24m]\r\n\r\n\u001b[1mDESCRIPTION\u001b[0m\r\n The Internet is a large and complex aggregation of network hardware, con-\r\n nected together by gateways. Tracking the route one's packets follow (or\r\n finding the miscreant gateway that's discarding your packets) can be dif-\r\n ficult. \u001b[1mtraceroute\u001b[0m utilizes ...
Can't reproduce; the following all show me expected results
$ man traceroute
$ man traceroute > /tmp/mtr && vim /tmp/mtr
$ man traceroute | cat
$ man traceroute | strip-ansi
$ man traceroute | strip-ansi > /tmp/mtr && vim /tmp/mtr
What system are you on? What version of man do you have? Do you have troff or groff installed? What versions are they?
Aha, I think I know what the deal is. \u001b[?1049h seems to be a terminal clear buffer code, though I'm not sure which one. Probably something specific to your termcap entry - also probably why it's one of the first in your man implementation.
Very interesting, never seen four-digit codes, thanks for the issue. I'll submit some code for it here in a bit.
Sorry, I made a mistake about the operating system. I use OS X 10.10.5. $ man -v man, version 1.6c
I'm writing a C program that captures the output of OS X/Linux commands and puts it into JSON. The JSON string is then parsed by JavaScript. I found this issue while testing the "man" command. It shows up every time I run "man traceroute". I think the first line of the "man traceroute" output contains invisible characters.
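To illustrate the "invisible characters": once the JSON string is parsed, each \u001b escape becomes a single ESC control character (code point 27), which terminals and most editors do not display. This is a minimal sketch, assuming the C program emits the start of the output as a JSON string value; the variable names are made up for illustration.

```javascript
// Assumed shape of the JSON produced by the C program: a JSON string whose
// escape codes are written out as \u001b (six visible characters each).
const json = '"\\u001b[?1049h\\u001b[?1h\\u001b= \\r\\nTRACEROUTE(8)"';

// After parsing, every \u001b is one invisible ESC character (code point 27).
const output = JSON.parse(json);
const prefix = output.slice(0, output.indexOf(" "));

console.log(prefix.length);          // 15 characters: 8 + 5 + 2
console.log(prefix.charCodeAt(0));   // 27, the ESC control character
console.log(JSON.stringify(prefix)); // re-serializing makes the escapes visible again
```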
$ groff -v
GNU groff version 1.19.2
Copyright (C) 2004 Free Software Foundation, Inc.
GNU groff comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of groff and its subprograms
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
called subprograms:
GNU grops (groff) version 1.19.2
GNU troff (groff) version 1.19.2
$ troff -v
GNU troff (groff) version 1.19.2a
I have also tested "man ping". The "\u001b[?1049h\u001b[?1h\u001b=" prefix is still there.
Interesting, they appear to be standard. We'll definitely add this in here in a bit :)
:+1: thanks for the report :D
@Qix- Might want to check if it's an issue in Node.js core too and open an issue about it there if so.
@sindresorhus would it be at all beneficial to beef up our own regex and then somehow PR it back into theirs? The one over at node is, well... a mess.
@Qix- Sounds like a good idea, but seeing how few tests they have for it in Node.js core I think they're going to be resistant to major changes. Still worth a try though.
Well tests are easy to write, and we have the mother of all ANSI tests in this (though I found another resource that has quite a few more codes to it).
Just an FYI: The \u001b[?1049h is a CSI sequence supported by (at least) xterm, rxvt-unicode, st, and even the Windows console when ANSI escape codes are enabled. This sequence activates the alternate screen buffer. On those and similar terminals (or when faking those terminals), getting the smcup capability from terminfo will give you that sequence. (\u001b[?1049l is the corresponding code for deactivating the alt screen buffer.) There are several other terminals which also support the alternate screen buffer, but which use different codes to activate/deactivate it.
They are a specific instance of the ANSI-standard "Set Mode" and "Reset Mode" sequences detailed in the VT102 User Guide. As mentioned there, the ? is used to denote "ANSI private parameters".
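As a rough sketch of how these "Set Mode" and "Reset Mode" sequences are put together: the CSI introducer is followed by an optional ?, one or more numeric parameters separated by ";", and a final "h" (set) or "l" (reset). The function name parseModeSequence and the shape of the returned object are hypothetical, chosen only for illustration.

```javascript
// Matches a CSI introducer (ESC [ or U+009B), an optional "?" private marker,
// numeric parameters separated by ";", and a final "h" (set) or "l" (reset).
const MODE_SEQUENCE = /^(?:\u001b\[|\u009b)(\?)?(\d+(?:;\d+)*)([hl])$/;

// Hypothetical helper: returns null when the input is not a mode sequence.
function parseModeSequence(sequence) {
  const match = MODE_SEQUENCE.exec(sequence);
  if (match === null) {
    return null;
  }
  return {
    isPrivate: match[1] === "?",
    params: match[2].split(";").map(Number),
    action: match[3] === "h" ? "set" : "reset",
  };
}

console.log(parseModeSequence("\u001b[?1049h")); // private mode 1049, action "set"
console.log(parseModeSequence("\u001b[?1049l")); // private mode 1049, action "reset"
console.log(parseModeSequence("\u001b[1m"));     // null: an SGR sequence, not a mode sequence
```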
The reason @wahaha88 is seeing these and @Qix- is not is probably because of the terminal being used (or, more directly, the value of the TERM variable). In the future, it might be worth having a look at the ncurses terminfo database for other possible escape codes that may have been missed.
In the future, it might be worth having a look at the ncurses terminfo database for other possible escape codes that may have been missed.
Great idea; I had that idea today too and figured it wasn't the first time someone proposed that.
You're absolutely right. I'll add this to my list to get to in the near future. Thank you both for the research involved!
Closing this; I believe this is fixed. Let me know if that isn't the case.
Thanks for everyone's input!
The "man" command on Linux produces many escape codes. ansi-regex matches all of them except the leading string "\u001b[?1049h\u001b[?1h\u001b=".
When I use strip-ansi to strip "\u001b[?1049h\u001b[?1h\u001b=", I get "[?1049h[?1h=" instead of "".
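For illustration, a pattern that removes the whole prefix has to accept the ? after the CSI introducer as well as the two-character ESC = sequence. The sketch below is a simplified stand-in, not the actual ansi-regex pattern; SIMPLE_ANSI and stripSimple are hypothetical names, and the pattern only covers the sequence shapes that appear in this thread.

```javascript
// Simplified stand-in for illustration only (NOT the real ansi-regex pattern).
// Alternative 1: CSI introducer, optional "?", numeric parameters, one final letter.
// Alternative 2: ESC followed by "=" or ">" (the keypad mode escapes).
const SIMPLE_ANSI = /(?:\u001b\[|\u009b)\??[\d;]*[A-Za-z]|\u001b[=>]/g;

// Hypothetical helper: removes every match of the simplified pattern.
function stripSimple(text) {
  return text.replace(SIMPLE_ANSI, "");
}

const manPrefix = "\u001b[?1049h\u001b[?1h\u001b=";
console.log(JSON.stringify(stripSimple(manPrefix)));   // ""
console.log(stripSimple("\u001b[1mNAME\u001b[0m"));    // NAME
console.log(stripSimple("\u001b[4mhost\u001b[24m"));   // host
```

A real implementation has to handle many more sequence types than this, so the sketch should only be read as an explanation of why the leftover "[?1049h[?1h=" appears when ? and ESC = are not covered.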