dajva / rg.el

Emacs search tool based on ripgrep
https://rgel.readthedocs.io
GNU General Public License v3.0
480 stars 41 forks source link

IPv6 addresses often mistaken as line/column numbers #146

Closed mbunkus closed 1 year ago

mbunkus commented 1 year ago

When I search for something & one of the result lines contains an IPv6 address, rg.el mistakes parts of them as the line & column number patters. For example, if I have the following whole line visible in *rg*:

terraform/lkbs/infrastructure.tf:226:    src_cidr = "2001:1640:141::/48"

and I press enter on it, Emacs asks me for the path to the file to visit with a nonsensical default:

Find this error in (default terraform/lkbs/infrastructure.tf:226:    src_cidr = "2001): …

The problem seems to be that the regular expressions used for building compilation-error-regexp-alist is too greedy.

I'm well aware that there's no 100% correct way of splitting rg's output (or maybe there is using the -0 arg…). Personally I'd very much like to be able to make the file name part non-greedy as I 100% never have file names with colons in them followed directly by numbers.

dajva commented 1 year ago

Thanks for the report. Yes I think the -0 would probably work fine here. I guess we could use that for the --no-heading setup that you are using.

dajva commented 1 year ago

After some testing I think -0 would be problematic mainly since it would be cumbersome to get rid of those ugly null chars in the output. TBH, I have not really crafted these regexps myself and never put much thought into it. I think it actually makes more sense to be restrictive with the kind of stuff that is allowed in file names, iow doing something like [^:]+ for matching the filename part. This seems to be what grep.el is doing anyway so should be fine to be in line with that.

mbunkus commented 1 year ago

Thank you for looking into it. I really appreciate it.

Yeah I figured having \0 in the output being bad, or at least requiring somewhat of a big-ish rewrite.

Like I said above I'd be perfectly fine with not allowing : as part of the file name, whatever regex you do it with. [^:]+ looks absolutely reasonable. I do know that : are allowed as part of file names on non-Windows systems, but then again I don't think this matters much wrt. to rg. We (as in: the collective users of rg) usually use it with code, with system configuration, maybe text archives. In each of those cases I've never seen : to be part of the file name. They're usually only used with system files, e.g. stuff in /tmp maybe.

The one exception that comes to mind is maildir directories.

Soooo… maybe making it configurable would be best? Or even toggle-able from the *rg* buffers?

dajva commented 1 year ago

Pushed a fix now without configuration possibilities. Let's add that if people request it.