dajva / rg.el

Emacs search tool based on ripgrep
https://rgel.readthedocs.io
GNU General Public License v3.0

Cannot search for non-ASCII characters (e.g. å) #101

Closed. Whil- closed this issue 3 years ago.

Whil- commented 3 years ago

Hi!

I'm not entirely sure whether the error is in rg.el, but I thought I'd report it anyway to see if it's reproducible.

On a Windows 10 PC, searching with rg.el for a string that contains a non-ASCII character returns no results. The exact same search in a terminal works fine.

Example search query:

C:\Some\Path> "rg.exe" --color=always --colors=match:fg:red --colors=path:fg:magenta --colors=line:fg:green --colors=column:none -n --column --type-add="gyp:*.gyp" --type-add="gyp:*.gypi" --heading --no-config --fixed-strings --type=all -e "String with å in it" .
dajva commented 3 years ago

Å, ä and ö work fine on Linux, although the characters get escaped. Did you try the same search with Emacs grep? On Linux I get the same behavior there: escaped characters and search hits. I don't officially support Windows since I can't test it, but I am happy to accept patches if you find bugs.

Whil- commented 3 years ago

Hmm, I would like to do some debugging but don't know where to start. Any tips for getting started? For example, where does this escaping take place?

dajva commented 3 years ago

The search string is escaped by grep-expand-template, which uses shell-quote-argument. FWIW, in bash it works without the escaping. I don't know how it works on Windows, but shell-quote-argument at least has separate code paths for different platforms. So I would start by checking whether any escaping is needed on Windows and, if so, whether this package does it correctly. Since this package reuses code from the built-in grep.el, you can also try a grep search to see what happens.
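A minimal debugging sketch along those lines, evaluable in *scratch*: it shows what shell-quote-argument does to a non-ASCII search string, and which bytes that string becomes under a given coding system. my-rg-query-bytes is a hypothetical helper, not part of rg.el or grep.el, and the byte view is only an approximation that assumes the encoding half (the cdr) of default-process-coding-system is what Emacs applies to subprocess arguments.

```elisp
;; Debugging sketch (not part of rg.el): inspect a search string before
;; it reaches rg.  `my-rg-query-bytes' is a hypothetical helper name.

(defun my-rg-query-bytes (query coding)
  "Return QUERY as a list of bytes after encoding it with CODING.
Assumption: when CODING is the cdr of `default-process-coding-system',
this approximates what Emacs passes to the subprocess."
  (append (encode-coding-string query coding) nil))

;; 1. Escaping: `shell-quote-argument' has per-platform code paths.
;;    On GNU/Linux every character outside [-0-9a-zA-Z_./] (including
;;    spaces and å) gets a backslash prefix.
(shell-quote-argument "String with å in it")

;; 2. Encoding: the same character becomes different bytes depending on
;;    the coding system.
(my-rg-query-bytes "å" 'utf-8)    ; two bytes: 195 165
(my-rg-query-bytes "å" 'latin-1)  ; one byte: 229

;; 3. Your current setup (cdr = coding system used for sending to processes).
(my-rg-query-bytes "å" (cdr default-process-coding-system))
```

If step 3 yields UTF-8 bytes while the Windows command line expects the ANSI code page, that mismatch would be consistent with the empty search results reported above.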

Whil- commented 3 years ago

I investigated a bit more and got it to work by setting default-process-coding-system to '(latin-1-dos . latin-1-unix).

I tested with a global config:

(setq default-process-coding-system '(latin-1-dos . latin-1-unix))

That made it work for me personally, but I'm not sure that's the path to take for this package when running on Windows. Maybe a slippery slope...!?
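The global setting above affects every subprocess Emacs starts, not just rg. A minimal sketch of a narrower variant, assuming (not verified against rg.el's source) that rg.el launches its search through compilation-start with rg-mode as the MODE argument; my-rg-process-coding-system and my-rg-with-coding-system are hypothetical names. If the assumption is wrong, the advice simply does nothing.

```elisp
;; Sketch (assumption): apply the coding-system workaround only to rg.el
;; searches.  Assumes rg.el calls `compilation-start' with `rg-mode' as
;; its second (MODE) argument.  All `my-' names are hypothetical.

(defvar my-rg-process-coding-system '(latin-1-dos . latin-1-unix)
  "Coding systems (DECODING . ENCODING) to use for rg processes only.")

(defun my-rg-with-coding-system (orig-fun &rest args)
  "Call ORIG-FUN with ARGS.
When the MODE argument (second element of ARGS) is `rg-mode', dynamically
bind `default-process-coding-system' so the process created inside
ORIG-FUN picks up `my-rg-process-coding-system'."
  (if (eq (nth 1 args) 'rg-mode)
      (let ((default-process-coding-system my-rg-process-coding-system))
        (apply orig-fun args))
    (apply orig-fun args)))

;; Only install the advice on MS-Windows, where the problem was reported.
(when (eq system-type 'windows-nt)
  (advice-add 'compilation-start :around #'my-rg-with-coding-system))
```

The let-binding works because a process's coding systems are chosen when it is created, which happens inside compilation-start; other compilation and grep buffers keep the global defaults.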

svraka commented 1 year ago

I know it's an old thread, but someone might still find this useful. The suggestion above was not sufficient for me (I'm on Windows 10 with a Hungarian locale). Although rg can search for non-ASCII characters after setting default-process-coding-system, the output is garbled. This, however, works:

(prefer-coding-system 'utf-8)
(setq default-process-coding-system '(undecided-dos . windows-1250))

Replace windows-1250 with the coding system that matches your system's locale. undecided-dos is the default on Windows for both reading and writing.
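Instead of hard-coding the code page, the encoding half could be taken from the locale. A minimal sketch, assuming locale-coding-system holds your Windows ANSI code page (for example cp1250, an alias of windows-1250, on a Hungarian system); my-rg-locale-process-coding-system is a hypothetical helper name.

```elisp
;; Sketch (assumption): derive the encoding half from the locale instead
;; of hard-coding windows-1250.  Assumes `locale-coding-system' holds the
;; Windows ANSI code page.  `my-rg-locale-process-coding-system' is a
;; hypothetical helper name.

(defun my-rg-locale-process-coding-system (&optional locale-coding)
  "Return (DECODING . ENCODING) for subprocess I/O on MS-Windows.
Decode with `undecided-dos'; encode with LOCALE-CODING, which defaults
to `locale-coding-system', falling back to `undecided-dos' if neither
is set."
  (cons 'undecided-dos
        (or locale-coding locale-coding-system 'undecided-dos)))

(when (eq system-type 'windows-nt)
  ;; Same order as the snippet above: `prefer-coding-system' also touches
  ;; the default subprocess coding systems, so set ours afterwards.
  (prefer-coding-system 'utf-8)
  (setq default-process-coding-system
        (my-rg-locale-process-coding-system)))
```

Passing an explicit coding system, e.g. (my-rg-locale-process-coding-system 'windows-1250), reproduces the hard-coded setting above.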