flexibeast / ebuku

Emacs interface to the buku Web bookmark manager.

Error handling encoding related issues on Windows #27

Closed CosmosAtlas closed 1 year ago

CosmosAtlas commented 1 year ago

I tried on Linux and macOS; only Windows has the following issue.

The error I'm getting from the debug log:

ebuku--search-helper: Args out of range: #("212. Tutorial - Write a Shell in C 鈥\242 Stephen Brennan" 35 53 (charset chinese-gbk)), 46810, 46828

The bookmark in question is linked.

The related line of ebuku is line 741

Some details about my encoding related configuration

If I delete the special character ("•", the symbol used between the work and the name), ebuku works fine. ebuku also works fine if I enable the experimental UTF-8 setting on Windows. However, that is not ideal, as it could break other software.

Just off the top of my head, I'm wondering if any of the following is possible.

flexibeast commented 1 year ago

Sorry to have taken so long to respond - thank you for the high-quality bug report!

i'm trying to work out how to best handle this. As a starting point, it seems that the GBK encoding can't encode the character. In a shell on my Gentoo system, which uses a UTF-8 encoding for the locale:

$ echo '•' | iconv -f UTF-8 -t GBK
iconv: illegal input sequence at position 0
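The same limitation can be checked from Python, as a rough sketch. The helper names `can_encode` and `misdecode_utf8_as` are made up for illustration, and this assumes Python's built-in `gbk` codec behaves like the GBK encoding that iconv and Emacs use here. The second helper shows how the UTF-8 bytes of "•", when read back as GBK, begin with the same "鈥" that appears in the reported error string.

```python
BULLET = "\u2022"  # the "•" character from the bookmark title


def can_encode(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with `codec` without errors."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def misdecode_utf8_as(text: str, codec: str) -> str:
    """Encode `text` as UTF-8, then decode the bytes with a different codec.

    Bytes that are invalid in `codec` are replaced rather than raising,
    so the result shows the kind of garbling a mismatched decoder produces.
    """
    return text.encode("utf-8").decode(codec, errors="replace")


if __name__ == "__main__":
    print("gbk can encode bullet:  ", can_encode(BULLET, "gbk"))
    print("utf-8 can encode bullet:", can_encode(BULLET, "utf-8"))
    print("UTF-8 bytes read as gbk:", repr(misdecode_utf8_as("C " + BULLET + " S", "gbk")))
```

If the first line prints False, that matches the iconv result above: the character has no GBK representation, so a GBK-configured pipeline has to either fail or garble it.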

In Emacs, if i:

  1. Open a new file, gbk.txt.
  2. Ensure the new buffer's coding system is gbk-dos, setting it with M-x set-buffer-file-coding-system.
  3. M-x insert-char BULLET RET
  4. Try saving the file.

that results in a *Warning* buffer:

These default coding systems were tried to encode the following
problematic characters in the buffer ‘gbk.txt’:
  Coding System           Pos  Codepoint  Char
  gbk-dos                   1  #x2022     •

However, each of them encountered characters it couldn’t encode:
  gbk-dos cannot encode these: •

Does this happen in your Emacs as well?

Next, are you able to add that bookmark, with that title, by using buku directly from the Windows terminal / command prompt? E.g.:

buku --add https://brennan.io/2015/01/16/write-a-shell-in-c/

If that succeeds, does the character correctly appear in the output of a search? E.g.:

buku --sany brennan

Either way, can you please copy-and-paste the search output here?

CosmosAtlas commented 1 year ago

Thanks for the detailed follow-up! It inspired a new line of research, and I have solved the issue on my side, albeit from a different angle (details at the end of this post).

Yes, I experience the same when I try to save a file with BULLET in GBK format.

When I run buku in cmd or pwsh, it works perfectly. Checking with python -c "import sys;print(sys.stdout.encoding)", it seems that within Emacs (e.g., through shell) it returns "gbk", but in either shell run directly on Windows it returns "utf-8".

At this point I realized that if I could explicitly ask Python to use UTF-8 within Emacs, the issue would be resolved for me. After some searching, I discovered the environment variable PYTHONIOENCODING. By setting this variable in Emacs (or globally), I was able to make the aforementioned command return "utf-8" within Emacs (which also removes the need to set the cmdproxy encoding).
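As a hedged illustration of that check, the sketch below runs the same one-liner in a child Python process, once with the inherited environment and once with PYTHONIOENCODING set. The helper name `child_stdout_encoding` is hypothetical, and the first result depends on the platform and locale, so no particular value is claimed for it.

```python
import os
import subprocess
import sys

# Same one-liner as in the comment above.
CHECK = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(io_encoding=None):
    """Report sys.stdout.encoding as seen by a child Python process.

    If `io_encoding` is given, it is passed to the child through the
    PYTHONIOENCODING environment variable; otherwise that variable is
    removed so the child falls back to its default.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default:         ", child_stdout_encoding())
    print("PYTHONIOENCODING:", child_stdout_encoding("utf-8"))
```

The variable only affects Python's standard streams, which is why it is a narrower change than switching the whole system to the experimental UTF-8 setting.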

Now ebuku runs perfectly for me using UTF-8 encoding in Emacs on Windows.


I don't know why the shell behaves differently inside Emacs than in the system environment, and there is not much information available online either. I guess I get a pass this time, but it will probably come back to haunt me in some other issue.

Again thanks for the insightful reply!