chalk / ansi-regex

Regular expression for matching ANSI escape codes
MIT License
184 stars, 79 forks

Can't match this string "\u001b[?1049h\u001b[?1h\u001b=" #6

Closed wahaha88 closed 7 years ago

wahaha88 commented 9 years ago

The "man" command on Linux produces many escape codes. ansi-regex matches all of them except the first string, "\u001b[?1049h\u001b[?1h\u001b=".

When I run strip-ansi on "\u001b[?1049h\u001b[?1h\u001b=", I get "[?1049h[?1h=" instead of "".

Qix- commented 9 years ago

\u001b[?1049h is not a common ANSI code. Neither is \u001b=. Where are they coming from?

In fact, according to ANSI CSI specs \u001b= isn't even supposed to be considered a CSI escape. It'd either be \u001b[ or \u009b.

I was wrong :) \u001b[?1049h is the CSR/CUP escape code.
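
To illustrate the distinction drawn above between CSI sequences (introduced by \u001b[ or \u009b) and other escapes such as \u001b=, here is a minimal sketch. The helper name classifyEscape is hypothetical and is not part of ansi-regex; it only inspects the introducer and does not validate the rest of the sequence.

```javascript
// Sketch only: classifyEscape is a hypothetical helper, not an ansi-regex API.
const ESC = '\u001b';      // 7-bit escape character
const CSI_8BIT = '\u009b'; // 8-bit single-character CSI introducer

// Returns 'csi' for ESC [ or 0x9B introducers, 'escape' for any other
// ESC-prefixed sequence, and 'none' when the text does not start with either.
function classifyEscape(sequence) {
  if (sequence.startsWith(ESC + '[') || sequence.startsWith(CSI_8BIT)) {
    return 'csi';
  }
  if (sequence.startsWith(ESC)) {
    return 'escape';
  }
  return 'none';
}

console.log(classifyEscape('\u001b[?1049h')); // csi
console.log(classifyEscape('\u001b='));       // escape
console.log(classifyEscape('[?1049h'));       // none
```

Under this classification, the reported string contains two CSI sequences followed by one plain two-character escape, so a pattern that only handles CSI would still leave \u001b= behind.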

wahaha88 commented 9 years ago

On Linux, run "man traceroute". I get the following output; the string is on the first line. The other ANSI codes are matched by ansi-regex.

\u001b[?1049h\u001b[?1h\u001b= \r\nTRACEROUTE(8) BSD System Manager's Manual TRACEROUTE(8)\r\n\r\n\u001b[1mNAME\u001b[0m\r\n \u001b[1mtraceroute\u001b[0m -- print the route packets take to network host\r\n\r\n\u001b[1mSYNOPSIS\u001b[0m\r\n \u001b[1mtraceroute\u001b[0m [\u001b[1m-adeFISdNnrvx\u001b[0m] [\u001b[1m-A\u001b[0m \u001b[4mas_server\u001b[24m] [\u001b[1m-f\u001b[0m \u001b[4mfirst_ttl\u001b[24m] [\u001b[1m-g\u001b[0m \u001b[4mgateway\u001b[24m]\r\n [\u001b[1m-i\u001b[0m \u001b[4miface\u001b[24m] [\u001b[1m-M\u001b[0m \u001b[4mfirst_ttl\u001b[24m] [\u001b[1m-m\u001b[0m \u001b[4mmax_ttl\u001b[24m] [\u001b[1m-P\u001b[0m \u001b[4mproto\u001b[24m] [\u001b[1m-p\u001b[0m \u001b[4mport\u001b[24m]\r\n [\u001b[1m-q\u001b[0m \u001b[4mnqueries\u001b[24m] [\u001b[1m-s\u001b[0m \u001b[4msrc_addr\u001b[24m] [\u001b[1m-t\u001b[0m \u001b[4mtos\u001b[24m] [\u001b[1m-w\u001b[0m \u001b[4mwaittime\u001b[24m]\r\n [\u001b[1m-z\u001b[0m \u001b[4mpausemsecs\u001b[24m] \u001b[4mhost\u001b[24m [\u001b[4mpacketsize\u001b[24m]\r\n\r\n\u001b[1mDESCRIPTION\u001b[0m\r\n The Internet is a large and complex aggregation of network hardware, con-\r\n nected together by gateways. Tracking the route one's packets follow (or\r\n finding the miscreant gateway that's discarding your packets) can be dif-\r\n ficult. \u001b[1mtraceroute\u001b[0m utilizes ...

Qix- commented 9 years ago

Can't reproduce; the following all show me the expected results:

$ man traceroute
$ man traceroute > /tmp/mtr && vim /tmp/mtr
$ man traceroute | cat
$ man traceroute | strip-ansi
$ man traceroute | strip-ansi > /tmp/mtr && vim /tmp/mtr

What system are you on? What version of man do you have? Do you have troff or groff installed? What versions are they?

Qix- commented 9 years ago

Aha, I think I know what the deal is. \u001b[?1049h seems to be a terminal clear buffer code, though I'm not sure which one. Probably something specific to your termcap entry - also probably why it's one of the first in your man implementation.

Very interesting, never seen four-digit codes, thanks for the issue. I'll submit some code for it here in a bit.

wahaha88 commented 9 years ago

Sorry, I made a mistake about the OS. I'm on OS X 10.10.5.

$ man -v
man, version 1.6c

I'm writing a C program that captures OS X/Linux command output and puts it into JSON. The JSON string is then parsed by JavaScript. I found this issue while testing the "man" command, and it shows up every time I run "man traceroute". I think the first line of the "man traceroute" output contains invisible characters.
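
One way to make such invisible characters visible on the JavaScript side is to re-serialize the parsed string with JSON.stringify, which escapes control characters. The sketch below assumes a payload shaped like the JSON described above; the "output" key and the helper name reveal are hypothetical.

```javascript
// Hypothetical payload: the "output" key is an assumption for illustration.
const payload = '{"output":"\\u001b[?1049h\\u001b[?1h\\u001b=\\r\\nTRACEROUTE(8)"}';
const { output } = JSON.parse(payload);

// JSON.stringify escapes control characters (ESC becomes \u001b, CR becomes \r),
// so bytes that a terminal would swallow show up as readable text.
function reveal(text) {
  return JSON.stringify(text);
}

// Count how many raw ESC characters the parsed string really contains.
function countEscapes(text) {
  return [...text].filter((ch) => ch === '\u001b').length;
}

console.log(reveal(output.slice(0, 15))); // "\u001b[?1049h\u001b[?1h\u001b="
console.log(countEscapes(output));        // 3
```

Printing the revealed form before and after stripping makes it easy to see exactly which sequences survive.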

wahaha88 commented 9 years ago

$ groff -v
GNU groff version 1.19.2
Copyright (C) 2004 Free Software Foundation, Inc.
GNU groff comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of groff and its subprograms under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

called subprograms:

GNU grops (groff) version 1.19.2
GNU troff (groff) version 1.19.2

$ troff -v
GNU troff (groff) version 1.19.2a

wahaha88 commented 9 years ago

I have tested "man ping" as well. "\u001b[?1049h\u001b[?1h\u001b=" is still there.

Qix- commented 9 years ago

Interesting, they appear to be standard. We'll definitely add this in here in a bit :)

:+1: thanks for the report :D

sindresorhus commented 9 years ago

@Qix- Might want to check if it's an issue in Node.js core too and open an issue about it there if so.

Qix- commented 9 years ago

@sindresorhus would it be at all beneficial to beef up our own regex and then somehow PR it back into theirs? The one over at node is, well... a mess.

sindresorhus commented 9 years ago

@Qix- Sounds like a good idea, but seeing how few tests they have for it in Node.js core, I think they're going to be resistant to major changes. Still worth a try though.

Qix- commented 9 years ago

Well tests are easy to write, and we have the mother of all ANSI tests in this (though I found another resource that has quite a few more codes to it).

whitelynx commented 7 years ago

Just an FYI: The \u001b[?1049h is a CSI sequence supported by (at least) xterm, rxvt-unicode, st, and even the Windows console when ANSI escape codes are enabled. This sequence activates the alternate screen buffer. On those and similar terminals (or when faking those terminals), getting the smcup capability from terminfo will give you that sequence. (\u001b[?1049l is the corresponding code for deactivating the alt screen buffer) There are several other terminals which also support the alternate screen buffer, but which use different codes to activate/deactivate it.

They are a specific instance of the ANSI-standard "Set Mode" and "Reset Mode" sequences detailed in the VT102 User Guide. As mentioned there, the ? is used to denote "ANSI private parameters".


The reason @wahaha88 is seeing these and @Qix- is not is probably because of the terminal being used (or, more directly, the value of the TERM variable). In the future, it might be worth having a look at the ncurses terminfo database for other possible escape codes that may have been missed.
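
As a rough sketch of what the explanation above implies for a matcher, the pattern below follows the general ECMA-48 shape of a CSI sequence: an introducer, optional parameter bytes in 0x30-0x3F (which includes "?"), optional intermediate bytes in 0x20-0x2F, and one final byte in 0x40-0x7E. It also accepts two-character escapes such as \u001b=. This is not the pattern ansi-regex ships; the names ESCAPE_PATTERN and stripEscapes are hypothetical, and longer string-type sequences (for example OSC) are deliberately not handled.

```javascript
// Sketch only: NOT the pattern used by ansi-regex.
// First alternative: CSI introducer (ESC [ or 0x9B), parameter bytes 0x30-0x3F,
// intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E.
// Second alternative: ESC followed by a single byte in 0x30-0x7E other than "[",
// which covers two-character escapes such as ESC = .
const ESCAPE_PATTERN =
  /(?:\u001b\[|\u009b)[\u0030-\u003f]*[\u0020-\u002f]*[\u0040-\u007e]|\u001b[\u0030-\u005a\u005c-\u007e]/g;

// Hypothetical helper: removes every match in a single pass.
function stripEscapes(text) {
  return text.replace(ESCAPE_PATTERN, '');
}

console.log(JSON.stringify(stripEscapes('\u001b[?1049h\u001b[?1h\u001b='))); // ""
console.log(stripEscapes('\u001b[1mNAME\u001b[0m'));                         // NAME
console.log(stripEscapes('\u001b[4mhost\u001b[24m [x]'));                    // host [x]
```

Because "?" falls inside the parameter-byte range, private-mode sequences such as \u001b[?1049h and \u001b[?1049l match without being special-cased, while literal brackets that are not preceded by an escape character are left alone.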

Qix- commented 7 years ago

In the future, it might be worth having a look at the ncurses terminfo database for other possible escape codes that may have been missed.

Great idea; I had that idea today too and figured it wasn't the first time someone proposed that.

You're absolutely right. I'll add this to my list to get to in the near future. Thank you both for the research involved!

Qix- commented 7 years ago

Closing this; I believe it is fixed. Let me know if that isn't the case.

Thanks for everyone's input!