flexibeast / ebuku

Emacs interface to the buku Web bookmark manager.

could not get any bookmark #35

Open 1925381584 opened 2 months ago

1925381584 commented 2 months ago

Hi, I can't get any bookmarks in ebuku, but buku has imported some bookmarks. When I run the command `ebuku`, the error message is "Invalid string for collation: Invalid argument". Here is my configuration.

OS: Windows 10.0.19045
Emacs version: 30.0.50
buku: 4.9
ebuku: 2024.09.05

flexibeast commented 2 months ago

This reminds me of #31, which the user fixed by ensuring that the value of the LC_ALL environment variable was changed from C to the appropriate locale (in that case, zh_CN.UTF-8, as described here).

What's the value of LC_ALL in Emacs' environment on your system? You can find that information via e.g. M-x list-environment.

1925381584 commented 2 months ago

There is no command called `list-environment` in my Emacs, and I could not find the variable LC_ALL either.

flexibeast commented 2 months ago

My apologies. Can you instead please evaluate:

;; Collect the entries in `process-environment' whose names start with "LC".
(seq-filter #'(lambda (v) (numberp (string-match "^LC" v))) process-environment)

and report the output?

1925381584 commented 2 months ago

The output is this:

[screenshot]

flexibeast commented 2 months ago

Okay, so LC_ALL is set appropriately.

Could you please do M-x toggle-debug-on-error, and then try to start Ebuku? It should result in a buffer showing what commands/functions got called; could you please share the contents of that buffer?
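(For reference, `toggle-debug-on-error` just flips the built-in variable `debug-on-error`, so the equivalent can also be set in an init file; a minimal sketch:)

```elisp
;; Make Emacs pop up a *Backtrace* buffer whenever an error is
;; signalled, instead of only echoing the error message.
(setq debug-on-error t)
```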

1925381584 commented 2 months ago

Here it is:

[screenshot]

1925381584 commented 2 months ago

This is my configuration:

[screenshot]

flexibeast commented 2 months ago

Thank you - i'll investigate this and get back to you.

flexibeast commented 2 months ago

In the second line of the backtrace - the one that starts with #<subr string-collate-lessp ... - there are two bookmark tags being compared in order to sort them correctly. However, it appears that the tags have been saved in the buku database with different encodings; presumably the second is UTF-8, as it's rendering correctly, but the first one is showing the raw bytes (in octal), and i'm not sure what encoding it might be.
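The comparison can be reproduced in isolation: `string-collate-lessp` hands both strings to the OS collation routines, which signal "Invalid string for collation" when a string can't be decoded in the collation locale. A sketch, using placeholder tags rather than the actual ones from the backtrace:

```elisp
;; Locale-aware string comparison, as used when sorting tags.
;; With an explicit locale, both strings must be valid in that locale,
;; otherwise the collation call signals an error.
(string-collate-lessp "书签" "工具" "zh_CN.UTF-8")
```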

Can you please copy-and-paste the two tags into two new and separate files, each tag in its own file, then open each of those files in Emacs, call C-h v buffer-file-coding-system in each buffer, and share the results?
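Alternatively, Emacs can guess the encoding of text directly, without going through files: `detect-coding-string` returns the candidate coding systems for a string. An untested sketch, with a placeholder tag:

```elisp
;; Returns a list of coding systems that could decode the given text,
;; most plausible first.
(detect-coding-string "某个标签")
```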

1925381584 commented 2 months ago

Now I import only one bookmark, but I still get this error.

[screenshots]

flexibeast commented 2 months ago

But you're importing one bookmark into a pre-existing buku database, correct? If so, then there's still the issue of comparing pre-existing tags with the tag(s) of the bookmark being imported. So, please follow the instructions i provided in my previous comment, and share the results.

1925381584 commented 2 months ago

I'm sorry, I don't know how to copy-and-paste the two tags into two new and separate files. When I imported the bookmark, I had cleared my bookmarks first. After that I made the change shown in the image below, and it reads successfully. It looks like there is a problem with parsing Chinese.

[screenshots]

flexibeast commented 2 months ago

It's clearly not a problem with handling Chinese per se, for two reasons:

It's okay if you don't understand how to do something i ask of you, but in that case, please ask for further instructions. As the developer of this software, i can't help you if you don't provide me with the information i need.

To copy text:

1. Move point to the start of the text.
2. Press C-SPC to set the mark.
3. Move point to the end of the text.
4. Press M-w.

That will copy the text to the 'kill-ring' / 'clipboard'.

To paste text, move point to where the text should go and press C-y.

1925381584 commented 2 months ago

Thank you for your answer. I did as you asked and did find something new. First I created two new files with Notepad++, put the respective tags in them, and saved them. Then I opened them in Emacs. Their encodings are different, as shown below.

[screenshots]

But I'm not quite sure whether this difference means the data in the buku db differs, because I looked at the database through an SQLite tool and found that the Chinese text all displays properly and is all UTF-8 encoded.

[screenshots]

So I guess there are two possibilities: the first is that the encodings in the buku database really do differ, but the tool doesn't show it; the second is that there is a problem with parsing Chinese in ebuku.

flexibeast commented 2 months ago

The issue seems to be that Emacs is sometimes incorrectly guessing the encoding as undecided-dos, as in your first screenshot, rather than UTF-8. Ebuku uses Emacs' built-in call-process to retrieve data from the buku database - refer to this part of the Ebuku code, where it calls buku and inserts the resulting output in a temporary buffer. It's Emacs, not Ebuku, that guesses the encoding of the buffer.
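One thing worth trying (a sketch, not part of Ebuku; the buku flags here are assumptions and may differ from what Ebuku actually passes): let-bind `coding-system-for-read` around the `call-process` invocation, so Emacs decodes buku's output as UTF-8 instead of guessing:

```elisp
;; Force UTF-8 decoding of the subprocess output, rather than letting
;; Emacs guess the coding system of the temporary buffer.
(let ((coding-system-for-read 'utf-8))
  (with-temp-buffer
    (call-process "buku" nil t nil "--nostdin" "-p")
    (buffer-string)))
```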

Please read through this discussion on #32, in which, as i noted above, the user wasn't having problems with Chinese in Ebuku in general, but only when also using certain emoji. Emacs maintainer Eli Zaretskii is part of that discussion, and he noted that using UTF-8 on Windows machines is problematic:

[T]he user sets a UTF-8 locale, which as I wrote up-thread is not a good idea on MS-Windows. It could well cause failures in invoking external programs from Emacs, if the arguments to those programs include non-ASCII characters. In general, on MS-Windows Emacs can only safely invoke programs with non-ASCII characters in the command-line arguments if those characters can be encoded by the system codepage, in this case codepage-936 AFAIU. ... Emacs on MS-Windows cannot use UTF-8 when encoding command-line arguments for sub-programs, it can only use the system codepage. Using set-language-environment as above will force Emacs to encode command-line arguments in UTF-8, which could very well be the reason for some of these problems. ... [Setting the language environment to "UTF-8" is] NOT RECOMMENDED!
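Eli's point can be illustrated directly: the same Chinese string becomes a different byte sequence depending on whether it's encoded with UTF-8 or with a system codepage such as cp936/GBK (a sketch):

```elisp
;; The byte sequence passed to a subprocess depends on the coding
;; system used to encode its command-line arguments.
(encode-coding-string "中文" 'utf-8)  ; => "\344\270\255\346\226\207"
(encode-coding-string "中文" 'gbk)    ; => "\326\320\316\304"
```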

Unfortunately, that discussion wasn't resolved because the user has never responded to Eli's most recent comment. However, in this case, you've reported that the value of buffer-file-coding-system is undecided-dos when it comes to some of the Chinese text in your buku database, and this was some of the information Eli was seeking from the other user. So i'm going to cc him on this discussion, as he might be able to assist further.

@Eli-Zaretskii