Open 1925381584 opened 2 months ago
This reminds me of #31, which the user fixed by ensuring that the value of the LC_ALL environment variable was changed from C to the appropriate locale (in that case, zh_CN.UTF-8, as described here).
What's the value of LC_ALL in Emacs' environment on your system? You can find that information via e.g. M-x list-environment.
There is no command called "list-environment" in my Emacs, and I could not find the variable LC_ALL either.
My apologies. Can you instead please evaluate:
(seq-filter #'(lambda (v) (numberp (string-match "^LC" v))) process-environment)
and report the output?
the output is that
Okay, so LC_ALL is set appropriately.
Could you please do M-x toggle-debug-on-error, and then try to start Ebuku? It should result in a buffer showing what commands/functions got called; could you please share the contents of that buffer?
here is
This is my configuration.
Thank you - i'll investigate this and get back to you.
In the second line of the backtrace - the one that starts with #<subr string-collate-lessp ... - there are two bookmark tags being compared in order to sort them correctly. However, it appears that the tags have been saved in the buku database with different encodings; presumably the second is UTF-8, as it's rendering correctly, but the first one is showing the raw bytes (in octal), and i'm not sure what encoding it might be.
Can you please copy-and-paste the two tags into two new and separate files, each tag in its own file, and then open up each of those files in Emacs, calling C-h v buffer-file-coding-system in each buffer, and sharing the results?
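For readers unfamiliar with what "raw bytes (in octal)" looks like, here is a minimal sketch of the difference between an undecoded byte string and a decoded one. It assumes, purely for illustration, that the bytes are UTF-8 and uses the placeholder text "中文"; the actual tags from the backtrace are not reproduced here, and their real encoding is exactly what is in question.

```elisp
;; Illustrative only: "中文" stands in for a tag; the real tags from the
;; backtrace are not reproduced here, and UTF-8 is an assumption.
(let* ((raw "\344\270\255\346\226\207")              ; undecoded bytes, shown in octal
       (decoded (decode-coding-string raw 'utf-8)))  ; => "中文"
  (list (multibyte-string-p raw)       ; nil: a unibyte string of raw bytes
        (multibyte-string-p decoded)   ; t: a string of characters
        (length raw)                   ; 6 bytes
        (length decoded)))             ; 2 characters
```

Evaluating this (e.g. with M-: or in the *scratch* buffer) shows that the same text is a different string before and after decoding, which is why mixing the two forms can cause trouble when strings are compared.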
Now I only import one bookmark, but I am still getting this error.
But you're importing one bookmark into a pre-existing buku database, correct? If so, then there's still the issue of comparing pre-existing tags with the tag(s) of the bookmark being imported. So, please follow the instructions i provided in my previous comment, and share the results.
I'm sorry, I don't know how to copy-and-paste the two tags into two new and separate files. When I imported the bookmark, I started from a clean set of bookmarks. After that I made the changes in the image below and it reads successfully. It looks like there is a problem parsing the Chinese language.
It's clearly not a problem with handling Chinese per se, for two reasons:
It's okay if you don't understand how to do something i ask of you, but in that case, please ask for further instructions. As the developer of this software, i can't help you if you don't provide me with the information i need.
To copy and paste text:
Move to the start of the text and press C-SPC.
Move to the end of the text and press M-w.
That will copy the text to the 'kill-ring' / 'clipboard'.
To paste text:
Press C-y.
Thank you for your answer. I did as you asked and found something new. First I created two new files with Notepad++, put the respective text in them, and saved them. Then I opened them in Emacs. Their encodings are different, as shown below.
But I'm not quite sure whether this difference means the encoding in the buku database is different, because I inspected the database with an SQLite tool and found that the Chinese text is all displayed properly and is all UTF-8 encoded.
So I guess there are two possible reasons. The first could be that the encoding in buku is different, but the tool doesn't show it. The second possibility is that there is a problem with parsing Chinese in Ebuku.
The issue seems to be that Emacs is sometimes incorrectly guessing the encoding as undecided-dos, as in your first screenshot, rather than UTF-8. Ebuku uses Emacs' built-in call-process to retrieve data from the buku database - refer to this part of the Ebuku code, where it calls buku and inserts the resulting output in a temporary buffer. It's Emacs, not Ebuku, that guesses the encoding of the buffer.
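As a rough sketch of the mechanism, and not Ebuku's actual code, Emacs' guess can be overridden for a single call by binding coding-system-for-read around call-process. The helper name my/call-process-utf-8 is hypothetical, and whether forcing UTF-8 is appropriate on a given system is an assumption that would need checking. It only affects how the output is decoded, not how command-line arguments are encoded.

```elisp
;; Hypothetical helper for illustration; not part of Ebuku.
(defun my/call-process-utf-8 (program &rest args)
  "Run PROGRAM with ARGS and return its output decoded as UTF-8."
  (with-temp-buffer
    ;; Binding `coding-system-for-read' overrides Emacs' automatic
    ;; detection for the output that `call-process' inserts here.
    (let ((coding-system-for-read 'utf-8))
      ;; INFILE is nil, DESTINATION t means the current (temporary)
      ;; buffer, DISPLAY is nil; ARGS are passed through unchanged.
      (apply #'call-process program nil t nil args))
    (buffer-string)))
```

Comparing the string returned by such a helper with what the default detection produces for the same command could help confirm whether the decoding step is where the text goes wrong.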
Please read through this discussion on #32, in which, as i noted above, the user wasn't having problems with Chinese in Ebuku in general, but only when also using certain emoji. Emacs maintainer Eli Zaretskii is part of that discussion, and he noted that using UTF-8 on Windows machines is problematic:
[T]he user sets a UTF-8 locale, which as I wrote up-thread is not a good idea on MS-Windows. It could well cause failures in invoking external programs from Emacs, if the arguments to those programs include non-ASCII characters. In general, on MS-Windows Emacs can only safely invoke programs with non-ASCII characters in the command-line arguments if those characters can be encoded by the system codepage, in this case codepage-936 AFAIU. ... Emacs on MS-Windows cannot use UTF-8 when encoding command-line arguments for sub-programs, it can only use the system codepage. Using set-language-environment as above will force Emacs to encode command-line arguments in UTF-8, which could very well be the reason for some of these problems. ... [Setting the language environment to "UTF-8" is] NOT RECOMMENDED!
Unfortunately, that discussion wasn't resolved because the user has never responded to Eli's most recent comment. However, in this case, you've reported that the value of buffer-file-coding-system is undecided-dos when it comes to some of the Chinese text in your buku database, and this was some of the information Eli was seeking from the other user. So i'm going to cc him on this discussion, as he might be able to assist further.
@Eli-Zaretskii
Hi, I could not get any bookmarks in Ebuku, but buku has imported some bookmarks. When I run the command "ebuku", the error message is "Invalid string for collation: Invalid argument". Here is my configuration.
OS: Windows 10.0.19045
Emacs version: 30.0.50
buku: 4.9
ebuku: 2024.09.05