flexibeast / ebuku

Emacs interface to the buku Web bookmark manager.

Emoji in link title results in `args-out-of-range` error #32

Open flexibeast opened 6 months ago

flexibeast commented 6 months ago

As reported by @edzhangsy in #31:

Debugger entered--Lisp error: (args-out-of-range "1884. Welcome to Comprehensive Rust 🦀 - Comprehens..." 15862 15893)
  match-string(1 "1884. Welcome to Comprehensive Rust 🦀 - Comprehensive Rust 🦀")
  ebuku--search-helper("--print" "[all]" "-1000" "")
  ebuku-show-all()
  ebuku()
  funcall-interactively(ebuku)
  command-execute(ebuku record)
  execute-extended-command(nil "ebuku" "ebuku")
  funcall-interactively(execute-extended-command nil "ebuku" "ebuku")
  command-execute(execute-extended-command)

flexibeast commented 6 months ago

i just tried adding the "Comprehensive Rust" link ("https://google.github.io/comprehensive-rust/") to my own buku database, and then searching for it - both by searching for database entries for the "rust" tag, and by passing "rust" as an 'any' or 'all' argument. i got no errors, and the crab emoji is displayed.

@edzhangsy: What is the value of the current-language-environment variable in your Emacs? (E.g. in mine it's set to "English").

edzhangsy commented 6 months ago

My value of this variable is "UTF-8". I set it with (set-language-environment "UTF-8"). I remember I set this value because I don't want my files containing Chinese to be encoded in GBK.

flexibeast commented 6 months ago

Okay, thanks.

Could you please:

  1. Make sure all your Ebuku buffers are closed.
  2. Edit line 688 of ebuku.el so that instead of (concat title-line-re "\n"), it says (concat title-line-re "$").
  3. Without leaving the ebuku.el buffer, do M-x eval-buffer.
  4. Start Ebuku and check for any different behaviour. If you still get an error, please enable debug-on-error and share the backtrace.

edzhangsy commented 6 months ago

I followed your instructions and got this trace.

Debugger entered--Lisp error: (args-out-of-range "2027. Taking org-roam everywhere with logseq • Core Dumped" 32318 32355)
  match-string(1 "2027. Taking org-roam everywhere with logseq • Cor...")
  (setq tags (match-string 1 line))
  (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags (match-string 1 line)))
  (if (string-match "^\\s-+[+]" line) (let ((start (line-beginning-position))) (progn (re-search-forward (concat "\\(" "^\\s-+[#]" "\\|" title-line-re "\\|" "\\'" "\\)")) (beginning-of-line) (let* ((end (point)) (comment-string (buffer-substring start (- end 2)))) (setq comment (progn (string-match "^\\s-+[+] " comment-string) (substring comment-string (match-end 0) nil)))) (let ((line (buffer-substring (line-beginning-position) (line-end-position)))) (cond ((string-match "^\\s-+[#] \\(.*\\)$" line) (setq tags (match-string 1 line))) ((string-match title-line-re line) (forward-line -1)))))) (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags (match-string 1 line))))
  (if (not (string= "" line)) (if (string-match "^\\s-+[+]" line) (let ((start (line-beginning-position))) (progn (re-search-forward (concat "\\(" "^\\s-+[#]" "\\|" title-line-re "\\|" "\\'" "\\)")) (beginning-of-line) (let* ((end (point)) (comment-string (buffer-substring start ...))) (setq comment (progn (string-match "^\\s-+[+] " comment-string) (substring comment-string ... nil)))) (let ((line (buffer-substring ... ...))) (cond ((string-match "^\\s-+[#] \\(.*\\)$" line) (setq tags ...)) ((string-match title-line-re line) (forward-line -1)))))) (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags (match-string 1 line)))))
  (let ((line (buffer-substring (line-beginning-position) (line-end-position)))) (if (not (string= "" line)) (if (string-match "^\\s-+[+]" line) (let ((start (line-beginning-position))) (progn (re-search-forward (concat "\\(" "^\\s-+[#]" "\\|" title-line-re "\\|" "\\'" "\\)")) (beginning-of-line) (let* ((end ...) (comment-string ...)) (setq comment (progn ... ...))) (let ((line ...)) (cond (... ...) (... ...))))) (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags (match-string 1 line))))))
  (while (re-search-forward (concat title-line-re "$") nil t) (if (string= "--print" type) (progn (setq index (match-string 1)) (setq title (match-string 2))) (progn (setq title (match-string 2)) (setq index (match-string 3)))) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url (match-string 1)) (forward-line) (let ((line (buffer-substring (line-beginning-position) (line-end-position)))) (if (not (string= "" line)) (if (string-match "^\\s-+[+]" line) (let ((start (line-beginning-position))) (progn (re-search-forward (concat "\\(" "^\\s-+[#]" "\\|" title-line-re "\\|" "\\'" "\\)")) (beginning-of-line) (let* (... ...) (setq comment ...)) (let (...) (cond ... ...)))) (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags (match-string 1 line)))))) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (insert (propertize "  --  " 'buku-index index 'help-echo index) (propertize title 'buku-index index 'data title 'face 'ebuku-title-face) (propertize "\n" 'buku-index index) (propertize "      " 'buku-index index) (propertize url 'buku-index index 'data url 'face 'ebuku-url-face 'mouse-face 'ebuku-url-highlight-face 'help-echo "mouse-1: open link in browser") (propertize "\n" 'buku-index index)) (if (string= "" comment) nil (setq comment (replace-regexp-in-string "\n" "\n      " comment)) (insert (propertize "      " 'buku-index index) (propertize comment 'buku-index index 'data comment 'face 'ebuku-comment-face) (propertize "\n" 'buku-index index))) (if (string= "" tags) nil (insert (propertize "      " 'buku-index index) (propertize tags 'buku-index index 'data tags 'face 'ebuku-tags-face) (propertize "\n" 'buku-index index))) (insert "\n"))) (progn (setq comment "") (setq tags "")))
  (if (string= "0" count) nil (while (re-search-forward (concat title-line-re "$") nil t) (if (string= "--print" type) (progn (setq index (match-string 1)) (setq title (match-string 2))) (progn (setq title (match-string 2)) (setq index (match-string 3)))) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url (match-string 1)) (forward-line) (let ((line (buffer-substring (line-beginning-position) (line-end-position)))) (if (not (string= "" line)) (if (string-match "^\\s-+[+]" line) (let ((start ...)) (progn (re-search-forward ...) (beginning-of-line) (let* ... ...) (let ... ...))) (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags (match-string 1 line)))))) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (insert (propertize "  --  " 'buku-index index 'help-echo index) (propertize title 'buku-index index 'data title 'face 'ebuku-title-face) (propertize "\n" 'buku-index index) (propertize "      " 'buku-index index) (propertize url 'buku-index index 'data url 'face 'ebuku-url-face 'mouse-face 'ebuku-url-highlight-face 'help-echo "mouse-1: open link in browser") (propertize "\n" 'buku-index index)) (if (string= "" comment) nil (setq comment (replace-regexp-in-string "\n" "\n      " comment)) (insert (propertize "      " 'buku-index index) (propertize comment 'buku-index index 'data comment 'face 'ebuku-comment-face) (propertize "\n" 'buku-index index))) (if (string= "" tags) nil (insert (propertize "      " 'buku-index index) (propertize tags 'buku-index index 'data tags 'face 'ebuku-tags-face) (propertize "\n" 'buku-index index))) (insert "\n"))) (progn (setq comment "") (setq tags ""))) (save-current-buffer (set-buffer "*Ebuku*") (progn (if current-index (progn (goto-char (point-min)) (let ((prop-match ...)) (if prop-match (goto-char ...) (cond ... ...)))) (ebuku--goto-line first-result-line)) (if (eq (window-buffer) (current-buffer)) (recenter)))))
  (progn (if (string= "" exclude) (ebuku--call-buku (list type term)) (ebuku--call-buku (list type term "--exclude" exclude))) (setq ebuku--last-search (list type prompt term exclude)) (if (string= "--print" type) (cond ((string= "[index]" prompt) (setq count "1")) ((string= "[recent]" prompt) (setq count (number-to-string ebuku-recent-count))) ((string= "[all]" prompt) (setq count (ebuku--get-bookmark-count)))) (if (re-search-backward "^\\([[:digit:]]+\\)\\." nil t) (setq count (match-string 1)) (setq count "0"))) (goto-char (point-min)) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (ebuku--goto-line ebuku--results-start) (beginning-of-line) (delete-region (point) (point-max)) (cond ((string= "0" count) (insert (concat "  No results found for '" search "'.\n\n"))) ((string= "1" count) (insert (concat "  Found 1 result for '" search "'.\n\n"))) (t (progn (if (or ... ...) (insert ...) (if ... ...))))))) (if (string= "0" count) nil (while (re-search-forward (concat title-line-re "$") nil t) (if (string= "--print" type) (progn (setq index (match-string 1)) (setq title (match-string 2))) (progn (setq title (match-string 2)) (setq index (match-string 3)))) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url (match-string 1)) (forward-line) (let ((line (buffer-substring (line-beginning-position) (line-end-position)))) (if (not (string= "" line)) (if (string-match "^\\s-+[+]" line) (let (...) (progn ... ... ... 
...)) (progn (string-match "^\\s-*[#] \\(.*\\)$" line) (setq tags ...))))) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (insert (propertize "  --  " 'buku-index index 'help-echo index) (propertize title 'buku-index index 'data title 'face 'ebuku-title-face) (propertize "\n" 'buku-index index) (propertize "      " 'buku-index index) (propertize url 'buku-index index 'data url 'face 'ebuku-url-face 'mouse-face 'ebuku-url-highlight-face 'help-echo "mouse-1: open link in browser") (propertize "\n" 'buku-index index)) (if (string= "" comment) nil (setq comment (replace-regexp-in-string "\n" "\n      " comment)) (insert (propertize "      " ... index) (propertize comment ... index ... comment ... ...) (propertize "\n" ... index))) (if (string= "" tags) nil (insert (propertize "      " ... index) (propertize tags ... index ... tags ... ...) (propertize "\n" ... index))) (insert "\n"))) (progn (setq comment "") (setq tags ""))) (save-current-buffer (set-buffer "*Ebuku*") (progn (if current-index (progn (goto-char (point-min)) (let (...) (if prop-match ... ...))) (ebuku--goto-line first-result-line)) (if (eq (window-buffer) (current-buffer)) (recenter))))))
  (unwind-protect (progn (if (string= "" exclude) (ebuku--call-buku (list type term)) (ebuku--call-buku (list type term "--exclude" exclude))) (setq ebuku--last-search (list type prompt term exclude)) (if (string= "--print" type) (cond ((string= "[index]" prompt) (setq count "1")) ((string= "[recent]" prompt) (setq count (number-to-string ebuku-recent-count))) ((string= "[all]" prompt) (setq count (ebuku--get-bookmark-count)))) (if (re-search-backward "^\\([[:digit:]]+\\)\\." nil t) (setq count (match-string 1)) (setq count "0"))) (goto-char (point-min)) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (ebuku--goto-line ebuku--results-start) (beginning-of-line) (delete-region (point) (point-max)) (cond ((string= "0" count) (insert (concat "  No results found for '" search "'.\n\n"))) ((string= "1" count) (insert (concat "  Found 1 result for '" search "'.\n\n"))) (t (progn (if ... ... ...)))))) (if (string= "0" count) nil (while (re-search-forward (concat title-line-re "$") nil t) (if (string= "--print" type) (progn (setq index (match-string 1)) (setq title (match-string 2))) (progn (setq title (match-string 2)) (setq index (match-string 3)))) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url (match-string 1)) (forward-line) (let ((line (buffer-substring ... ...))) (if (not (string= "" line)) (if (string-match "^\\s-+[+]" line) (let ... ...) (progn ... ...)))) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (insert (propertize "  --  " ... index ... index) (propertize title ... index ... title ... ...) (propertize "\n" ... index) (propertize "      " ... index) (propertize url ... index ... url ... ... ... ... ... "mouse-1: open link in browser") (propertize "\n" ... index)) (if (string= "" comment) nil (setq comment ...) (insert ... ... ...)) (if (string= "" tags) nil (insert ... ... 
...)) (insert "\n"))) (progn (setq comment "") (setq tags ""))) (save-current-buffer (set-buffer "*Ebuku*") (progn (if current-index (progn (goto-char ...) (let ... ...)) (ebuku--goto-line first-result-line)) (if (eq (window-buffer) (current-buffer)) (recenter)))))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))
  (save-current-buffer (set-buffer temp-buffer) (unwind-protect (progn (if (string= "" exclude) (ebuku--call-buku (list type term)) (ebuku--call-buku (list type term "--exclude" exclude))) (setq ebuku--last-search (list type prompt term exclude)) (if (string= "--print" type) (cond ((string= "[index]" prompt) (setq count "1")) ((string= "[recent]" prompt) (setq count (number-to-string ebuku-recent-count))) ((string= "[all]" prompt) (setq count (ebuku--get-bookmark-count)))) (if (re-search-backward "^\\([[:digit:]]+\\)\\." nil t) (setq count (match-string 1)) (setq count "0"))) (goto-char (point-min)) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (ebuku--goto-line ebuku--results-start) (beginning-of-line) (delete-region (point) (point-max)) (cond ((string= "0" count) (insert ...)) ((string= "1" count) (insert ...)) (t (progn ...))))) (if (string= "0" count) nil (while (re-search-forward (concat title-line-re "$") nil t) (if (string= "--print" type) (progn (setq index ...) (setq title ...)) (progn (setq title ...) (setq index ...))) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url (match-string 1)) (forward-line) (let ((line ...)) (if (not ...) (if ... ... ...))) (save-current-buffer (set-buffer "*Ebuku*") (let (...) (insert ... ... ... ... ... ...) (if ... nil ... ...) (if ... nil ...) (insert "\n"))) (progn (setq comment "") (setq tags ""))) (save-current-buffer (set-buffer "*Ebuku*") (progn (if current-index (progn ... ...) (ebuku--goto-line first-result-line)) (if (eq ... ...) (recenter)))))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer))))
  (let ((temp-buffer (generate-new-buffer " *temp*" t))) (save-current-buffer (set-buffer temp-buffer) (unwind-protect (progn (if (string= "" exclude) (ebuku--call-buku (list type term)) (ebuku--call-buku (list type term "--exclude" exclude))) (setq ebuku--last-search (list type prompt term exclude)) (if (string= "--print" type) (cond ((string= "[index]" prompt) (setq count "1")) ((string= "[recent]" prompt) (setq count ...)) ((string= "[all]" prompt) (setq count ...))) (if (re-search-backward "^\\([[:digit:]]+\\)\\." nil t) (setq count (match-string 1)) (setq count "0"))) (goto-char (point-min)) (save-current-buffer (set-buffer "*Ebuku*") (let ((inhibit-read-only t)) (ebuku--goto-line ebuku--results-start) (beginning-of-line) (delete-region (point) (point-max)) (cond (... ...) (... ...) (t ...)))) (if (string= "0" count) nil (while (re-search-forward (concat title-line-re "$") nil t) (if (string= "--print" type) (progn ... ...) (progn ... ...)) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url (match-string 1)) (forward-line) (let (...) (if ... ...)) (save-current-buffer (set-buffer "*Ebuku*") (let ... ... ... ... ...)) (progn (setq comment "") (setq tags ""))) (save-current-buffer (set-buffer "*Ebuku*") (progn (if current-index ... ...) (if ... ...))))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer)))))
  (let* ((count "0") (term (if term term (read-from-minibuffer prompt))) (exclude (if exclude exclude (if (string= "[index]" prompt) "" (read-from-minibuffer "Exclude keywords? ")))) (search (concat type " " term (if (not (string= "" exclude)) (concat " --exclude " exclude)))) (title-line-re (concat "^\\([[:digit:]]+\\)\\. " "\\(.+?\\)" "\\(?: \\[\\([[:digit:]]+\\)\\]\\)?")) (title "") (index "") (url "") (comment "") (tags "") (current-index (or ebuku--new-index (ebuku--get-index-at-point))) (previous-index (let ((pos (previous-single-property-change (point) 'buku-index))) (if pos (get-char-property (- pos 2) 'buku-index) nil))) (next-index (let ((pos (next-single-property-change (point) 'buku-index))) (if pos (get-char-property (1+ pos) 'buku-index) nil))) (first-result-line (+ ebuku--results-start 2))) (setq ebuku--new-index nil) (let ((temp-buffer (generate-new-buffer " *temp*" t))) (save-current-buffer (set-buffer temp-buffer) (unwind-protect (progn (if (string= "" exclude) (ebuku--call-buku (list type term)) (ebuku--call-buku (list type term "--exclude" exclude))) (setq ebuku--last-search (list type prompt term exclude)) (if (string= "--print" type) (cond (... ...) (... ...) (... ...)) (if (re-search-backward "^\\([[:digit:]]+\\)\\." nil t) (setq count ...) (setq count "0"))) (goto-char (point-min)) (save-current-buffer (set-buffer "*Ebuku*") (let (...) (ebuku--goto-line ebuku--results-start) (beginning-of-line) (delete-region ... ...) (cond ... ... ...))) (if (string= "0" count) nil (while (re-search-forward ... nil t) (if ... ... ...) (re-search-forward "^\\s-+> \\([^\n]+\\)") (setq url ...) (forward-line) (let ... ...) (save-current-buffer ... ...) (progn ... ...)) (save-current-buffer (set-buffer "*Ebuku*") (progn ... ...)))) (and (buffer-name temp-buffer) (kill-buffer temp-buffer))))))
  ebuku--search-helper("--print" "[all]" "-1000" "")
  ebuku-show-all()
  (cond ((eq 'all ebuku-display-on-startup) (ebuku-show-all)) ((eq 'recent ebuku-display-on-startup) (ebuku-search-on-recent)) ((eq nil ebuku-display-on-startup) (insert "  [ Please specify a search, or press 'r' for rece...")))
  (save-current-buffer (set-buffer (generate-new-buffer "*Ebuku*")) (goto-char (point-min)) (insert "\n") (insert (propertize " Ebuku\n" 'face 'ebuku-heading-face)) (insert (propertize "  ----------\n\n" 'face 'ebuku-separator-face)) (setq ebuku--results-start (line-number-at-pos)) (cond ((eq 'all ebuku-display-on-startup) (ebuku-show-all)) ((eq 'recent ebuku-display-on-startup) (ebuku-search-on-recent)) ((eq nil ebuku-display-on-startup) (insert "  [ Please specify a search, or press 'r' for rece..."))) (ebuku--goto-line ebuku--results-start) (add-text-properties (point-min) (point) '(read-only t intangible t)) (forward-line 2) (ebuku--create-mode-menu) (setq header-line-format nil) (ebuku-mode))
  (progn (ebuku-update-tags-cache) (setq ebuku--last-search nil) (save-current-buffer (set-buffer (generate-new-buffer "*Ebuku*")) (goto-char (point-min)) (insert "\n") (insert (propertize " Ebuku\n" 'face 'ebuku-heading-face)) (insert (propertize "  ----------\n\n" 'face 'ebuku-separator-face)) (setq ebuku--results-start (line-number-at-pos)) (cond ((eq 'all ebuku-display-on-startup) (ebuku-show-all)) ((eq 'recent ebuku-display-on-startup) (ebuku-search-on-recent)) ((eq nil ebuku-display-on-startup) (insert "  [ Please specify a search, or press 'r' for rece..."))) (ebuku--goto-line ebuku--results-start) (add-text-properties (point-min) (point) '(read-only t intangible t)) (forward-line 2) (ebuku--create-mode-menu) (setq header-line-format nil) (ebuku-mode)) (switch-to-buffer "*Ebuku*"))
  (if (get-buffer "*Ebuku*") (switch-to-buffer "*Ebuku*") (progn (ebuku-update-tags-cache) (setq ebuku--last-search nil) (save-current-buffer (set-buffer (generate-new-buffer "*Ebuku*")) (goto-char (point-min)) (insert "\n") (insert (propertize " Ebuku\n" 'face 'ebuku-heading-face)) (insert (propertize "  ----------\n\n" 'face 'ebuku-separator-face)) (setq ebuku--results-start (line-number-at-pos)) (cond ((eq 'all ebuku-display-on-startup) (ebuku-show-all)) ((eq 'recent ebuku-display-on-startup) (ebuku-search-on-recent)) ((eq nil ebuku-display-on-startup) (insert "  [ Please specify a search, or press 'r' for rece..."))) (ebuku--goto-line ebuku--results-start) (add-text-properties (point-min) (point) '(read-only t intangible t)) (forward-line 2) (ebuku--create-mode-menu) (setq header-line-format nil) (ebuku-mode)) (switch-to-buffer "*Ebuku*")))
  ebuku()
  funcall-interactively(ebuku)
  command-execute(ebuku record)
  execute-extended-command(nil "ebuku" "ebuku")
  funcall-interactively(execute-extended-command nil "ebuku" "ebuku")
  command-execute(execute-extended-command)

The link causing trouble is this one https://coredumped.dev/2021/05/26/taking-org-roam-everywhere-with-logseq/

flexibeast commented 6 months ago

"Core dumped" is an unfortunate blog name in the context of a backtrace. :-) At first i thought Ebuku was somehow causing Emacs to dump core ....

The code change in my previous comment was to take into consideration that the line ending on Windows is CR+LF / \r\n, rather than just LF / \n. Which is what i should have had anyway.

However, adding and searching for the "Core Dumped" bookmark resulted in no errors for me. But the gap between the two values in the "args-out-of-range" error is not the same as for the "Comprehensive Rust" link: in that case, the difference was 31; in this case it's 37. And the Unicode BULLET grapheme requires 3 bytes in UTF-8, whereas the CRAB grapheme requires 4 bytes, so i would have expected the difference between the values in the "args-out-of-range" error to be bigger in the CRAB case than in the BULLET case.
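
The arithmetic in the comment above can be checked directly in Emacs. This is only a minimal sketch: the helper `my/utf-8-byte-length` and the two constants are hypothetical names for illustration and are not part of Ebuku.

```elisp
;; Hypothetical helper, not part of Ebuku: the number of bytes STRING
;; occupies once it has been encoded as UTF-8.
(defun my/utf-8-byte-length (string)
  "Return the number of bytes in STRING when encoded as UTF-8."
  (length (encode-coding-string string 'utf-8)))

;; Widths of the ranges reported in the two `args-out-of-range' errors.
(defconst my/rust-range-width (- 15893 15862))        ; 31
(defconst my/core-dumped-range-width (- 32355 32318)) ; 37

;; BULLET is U+2022 and CRAB is U+1F980.
(my/utf-8-byte-length (string #x2022))   ; 3
(my/utf-8-byte-length (string #x1F980))  ; 4
```

The CRAB title has the wider character but the narrower range, which is the mismatch the comment points out.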

All that said, it occurs to me that, if i remember correctly, the default encoding used by Windows is UTF-16, not UTF-8. So i'm wondering if that's somehow being used to transfer data from the buku process to the Emacs process, regardless of the value of LANG and LC_ALL, and regardless of the encoding of the buku database itself? On my machine, file(1) reports the database as UTF-8:

$ file ~/.local/share/buku/bookmarks.db
~/.local/share/buku/bookmarks.db: SQLite 3.x database, last written using SQLite version 3045001, file counter 21809, database pages 3426, cookie 0x1, schema 4, UTF-8, version-valid-for 21809

Could you please share the value of the locale-coding-system and default-process-coding-system variables?

edzhangsy commented 6 months ago

My locale-coding-system is cp936. My default-process-coding-system is (utf-8-dos . utf-8-unix).

The name of the bookmark is frightening indeed. Sorry for that :-) I also checked the db.

.\bookmarks.db: SQLite 3.x database, last written using SQLite version 3043001, file counter 32, database pages 129, cookie 0x1, schema 4, UTF-8, version-valid-for 32

I think PowerShell uses UTF-16 rather than UTF-8 for encoding.

flexibeast commented 6 months ago

*nod* Well, at this point, as a non-Windows user who doesn't have access to a Windows machine, i don't know what things i can get you to try/test in order to diagnose the problem. So i'm going to send an email about this issue to the emacs-devel list. For that email, can you please tell me:

edzhangsy commented 6 months ago

Thank you very much. I am running Emacs 29.2 on Windows 11 (10.0.22631), and I don't use an Emacs distro. I installed Emacs using the scoop package manager. Maybe I should switch to the WSL version of Emacs.

flexibeast commented 6 months ago

Okay, so, i've got some helpful responses from Eli Zaretskii. If you're not already aware, Eli is not only an Emacs maintainer, but is also very familiar with text encoding in general and Unicode in particular. Additionally, if i remember correctly, he's a Windows user himself.

The discussion so far is available online here. (Although please note that for some reason my initial email got its formatting mangled upon sending; a less-mangled version is available here.)

The first thing to note is that Eli wrote:

[T]he user sets a UTF-8 locale, which as I wrote up-thread is not a good idea on MS-Windows. It could well cause failures in invoking external programs from Emacs, if the arguments to those programs include non-ASCII characters. In general, on MS-Windows Emacs can only safely invoke programs with non-ASCII characters in the command-line arguments if those characters can be encoded by the system codepage, in this case codepage-936 AFAIU. ... Emacs on MS-Windows cannot use UTF-8 when encoding command-line arguments for sub-programs, it can only use the system codepage. Using set-language-environment as above will force Emacs to encode command-line arguments in UTF-8, which could very well be the reason for some of these problems. ... [Setting the language environment to "UTF-8" is] NOT RECOMMENDED!

So: please remove (set-language-environment "UTF-8") from your setup as we try to resolve this issue.

Secondly, Eli wrote:

The Windows 'setlocale' supports only LC_ categories in direct calls to the function, and doesn't consider the corresponding environment variables. The Emacs source code doesn't reference LC_ environment variables on MS-Windows, either. So how did the user set LC_ALL, and why did it have any effect whatsoever on the issue?

Could you please describe how you set LC_ALL?

Thirdly, Eli wrote:

[T]he issues with Windows-style file names with drive letters and with file names that begin with "~" lead me to believe that perhaps the underlying program 'buku' is not a native Windows program, but a Cygwin or MSYS program, in which case there could be incompatibilities both regarding file names and regarding handling of non-ASCII characters (Cygwin and MSYS use UTF-8 by default, whereas the native Windows build of Emacs does not).

i've mentioned that buku is a Python program, but we now need to check what buku itself does, without any interaction with Ebuku:

  1. Can you successfully add the "Comprehensive Rust" bookmark to buku directly?
  2. Can you successfully search for that bookmark with buku?
  3. If the answer to 2 is "yes": Does PowerShell correctly display the CRAB emoji in buku's output?
  4. If the answer to 3 is "yes": If you copy and paste that output into Emacs, is the CRAB emoji correctly displayed in Emacs?

edzhangsy commented 6 months ago

Great help from you and Eli, thank you. I have now removed the (set-language-environment "UTF-8"). But now my language environment is "Chinese-GBK". My files will be encoded as GBK, which is not what I want. I think MS uses GBK encoding for Chinese for backward-compatibility reasons. Can I somehow set Emacs to use UTF-8 for new file encodings?

I set the LC_ALL and LANG variables by editing Windows's environment variables. Because Emacs doesn't read these, I have removed the variables.

  1. I can add the "comprehensive rust" bookmark to buku directly.
  2. I can search the bookmark with buku.
[screenshot: PixPin_2024-04-18_10-06-12]
  3. PowerShell displays the CRAB emoji fine. (I use Windows Terminal.)
  4. I copied the output into the scratch buffer; the emoji is not displayed.

[screenshot]

edzhangsy commented 6 months ago

Hi, I found the encoding setting from Emacs China website. I added these lines in the early-init.el file

(set-charset-priority 'unicode)
(prefer-coding-system 'utf-8)
(setq system-time-locale "C")

Right now, new files are saved as UTF-8. But the args-out-of-range problem still persists.

flexibeast commented 6 months ago

the args-out-of-range problem still persists.

i wouldn't expect those settings to fix it, as setting those variables won't influence the encoding of the data that Ebuku has to process.

This is a very complex issue, so we need to control the various factors involved. This is why i wrote:

please remove (set-language-environment "UTF-8") from your setup as we try to resolve this issue.

By that, i meant: When you're testing out things as we work on this problem, please don't have the Emacs language environment set to UTF-8, as it will complicate interactions with Windows in general and buku in particular. i understand that you don't want to use the GBK environment in general.

Unfortunately, making the configuration changes you described in your previous comment here only adds more factors to consider, and makes it more difficult to understand what's happening on your system.

So, when you test out things in Emacs, i'd like you to do so by starting Emacs with the -Q option, and loading Ebuku manually, without any manually-specified Emacs configuration, so we can try to work out what's happening when Emacs interacts with Windows and buku. In other words, once you've started Emacs with the -Q option, in Emacs' *scratch* buffer, you'd evaluate:

(load-file "/path/to/ebuku.el")
(require 'ebuku)

then do any necessary Ebuku-related setup (e.g. setting the path to the buku database), and then try things with Ebuku.

i think i'm going to have to ask you to interact directly with Eli on the mailing list about this, as i'm finding it difficult to be the messenger going back and forth, and it will be much quicker if Eli can ask you questions directly, which you can respond to directly. Hopefully that process will make it clear what would need to be done by Ebuku in order to fix the problem, in a non-GBK environment.

Could you please let me know an appropriate email address i can use to add you to the discussion on emacs-devel?

Eli-Zaretskii commented 6 months ago

Great help from you and Eli, thank you. I have now removed the (set-language-environment "UTF-8"). But now my language environment is "Chinese-GBK". My files will be encoded as GBK, which is not what I want. I think MS uses GBK encoding for Chinese for backward-compatibility reasons. Can I somehow set Emacs to use UTF-8 for new file encodings?

I set the LC_ALL and LANG variables by editing Windows's environment variables. Because Emacs doesn't read these, I have removed the variables.

1. I can add the "comprehensive rust" bookmark to buku directly.

2. I can search the bookmark with buku.
[screenshot: PixPin_2024-04-18_10-06-12]
3. PowerShell displays the CRAB emoji fine. (I use Windows Terminal.)

4. I copied the output into the scratch buffer; the emoji is not displayed.

[screenshot]

Does this mean that without setting your language-environment to UTF-8, all the problems with ebuku are resolved?

If the only problem left is how to make your files encoded in UTF-8, that can be solved in other ways, so I suggest focusing on the ebuku problem for now, and revisiting the other issues with GBK later.

Eli-Zaretskii commented 6 months ago

There's no need to do the interaction through the mailing list, we can do it here. It will be easier and faster.

edzhangsy commented 6 months ago

The problem with ebuku is not solved yet. Let's focus on this problem first. I removed the utf-8 settings. Right now my language environment is "Chinese-GBK". Ebuku interacts with the buku program outside Emacs. The buku program can handle the crab emoji (both adding a bookmark with the crab emoji and displaying it in the terminal). But when Ebuku interacts with buku to display the bookmarks, there is an args-out-of-range error. Also, I copied the crab emoji from the terminal and pasted it into an Emacs buffer. Emacs can't display the crab emoji properly: as shown in the screenshot, it's displayed as a box.

Eli-Zaretskii commented 6 months ago

OK. Please try this: modify ebuku.el such that each time it calls call-process, Emacs binds coding-system-for-read to utf-8. Like this:

                 (let ((coding-system-for-read 'utf-8))
                    (call-process ...

Then use the modified ebuku.el (byte-compile it before use), and see if the problem is resolved or not.

edzhangsy commented 6 months ago

I am new to Emacs, so I modified the code like this:

(defun ebuku--call-buku (args)
  "Internal function for calling 'buku' with list ARGS."
  (unless ebuku-buku-path
    (error "Couldn't find buku: check 'ebuku-buku-path'"))
  (let ((coding-system-for-read 'utf-8))
    (apply #'call-process
           `(,ebuku-buku-path nil t nil
                              "--np" "--nc"
                              "--db" ,ebuku-database-path
                              ,@args))))

Is it correct? I then followed Alexis' suggestion and opened emacs -Q, whose language environment is also "Chinese-GBK", with a buffer containing:

(load-file "C:/Users/uname/.emacs.d/straight/build/ebuku/ebuku.el")
(require 'ebuku)
(setq ebuku-buku-path "C:/Users/uname/.local/bin/buku.exe")
(setq ebuku-database-path "C:/Users/uname/.local/share/buku/bookmarks.db")

I evaluated the buffer with eval-buffer and called ebuku. The problem persists.

Eli-Zaretskii commented 6 months ago

But that is not the only call to call-process in the package, there are at least 2 or 3 more. You need to make such a change in all of them, not just in one.
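
One way to apply the same binding at every call site is to route each call through a small wrapper. This is a sketch only: `my/call-process-utf-8` is a hypothetical name, not a function in Ebuku, and the package's actual call sites may need different arguments.

```elisp
;; Hypothetical wrapper, not part of Ebuku.  It binds
;; `coding-system-for-read' only for the duration of the call, so the
;; output of PROGRAM is decoded as UTF-8 when it is inserted into the
;; current buffer.
(defun my/call-process-utf-8 (program &rest args)
  "Call PROGRAM with ARGS, decoding its output as UTF-8.
Output is inserted into the current buffer.  Return the value of
`call-process'."
  (let ((coding-system-for-read 'utf-8))
    (apply #'call-process program nil t nil args)))
```

Each direct `call-process` invocation would then become a call to the wrapper, with the program path and its arguments passed through unchanged.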

edzhangsy commented 6 months ago

I edited the other two like this

    (let ((inhibit-message t)) ((coding-system-for-read 'utf-8))
      (with-temp-buffer
        ;; Use 'US', the Unit Separator character, to separate bookmark fields.
        (call-process ebuku-sqlite-path
                      nil t nil
                      "-separator" "\037"
                      ebuku-database-path
                      "select * from bookmarks;")

And the error persists.
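
In Emacs Lisp, every variable bound by a single `let` goes inside one binding list. A second parenthesised list placed after the first is read as the first form of the body, not as further bindings. A minimal sketch with both variables bound together; `my/both-bindings-demo` is a hypothetical function whose body only returns the two values to show they are in effect:

```elisp
;; Hypothetical demo, not part of Ebuku: both variables are bound in
;; the same binding list of a single `let'.
(defun my/both-bindings-demo ()
  "Return the values of the two variables bound by one `let'."
  (let ((inhibit-message t)
        (coding-system-for-read 'utf-8))
    (list inhibit-message coding-system-for-read)))
```

In the snippet above, the `with-temp-buffer` form would take the place of the `list` call.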

Eli-Zaretskii commented 6 months ago

OK. So now I need you to step with Edebug through ebuku--search-helper (which as far as I understand is where the error happens) after you invoke the command that causes the problem, and tell:

The above assumes that the error you see after all those changes is still the same, i.e.:

Debugger entered--Lisp error: (args-out-of-range "1884. Welcome to Comprehensive Rust 🦀 - Comprehens..." 15862 15893)
  match-string(1 "1884. Welcome to Comprehensive Rust 🦀 - Comprehensive Rust 🦀")
  ebuku--search-helper("--print" "[all]" "-1000" "")
  ebuku-show-all()
  ebuku()
  funcall-interactively(ebuku)
  command-execute(ebuku record)
  execute-extended-command(nil "ebuku" "ebuku")
  funcall-interactively(execute-extended-command nil "ebuku" "ebuku")
  command-execute(execute-extended-command)
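
One detail in this backtrace is that match-string is given a fairly short string, yet the reported positions (15862 and 15893) are far larger than that string's length. A possible explanation, which is an assumption and is not confirmed anywhere in this thread, is that the preceding string-match found nothing, so match-string read match data left over from an earlier search in the buffer. A minimal sketch of checking the result of string-match before reading the match data follows; `my/tags-from-line` is a hypothetical helper rather than Ebuku's code, and the regexp is copied from the longer backtrace earlier in the thread.

```elisp
;; Hypothetical helper, not part of Ebuku.  `match-string' is only
;; consulted when `string-match' has actually matched LINE, so stale
;; match data from an earlier search is never used.
(defun my/tags-from-line (line)
  "Return the tags text in LINE, or \"\" if LINE is not a tags line."
  (if (string-match "^\\s-*[#] \\(.*\\)$" line)
      (match-string 1 line)
    ""))
```

With this guard, a title line such as the one in the error yields an empty tags string instead of signalling an error.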

edzhangsy commented 6 months ago

I tried to execute C-u C-M-x on ebuku--search-helper. I pressed p to view the temp buffer, but it soon disappears. How can I keep the temp buffer open in a new window? I accidentally got a trace when debugging ebuku--search-helper and saved it: ebuku-trace.txt. Hope this is helpful.

Eli-Zaretskii commented 6 months ago

Unfortunately, you need to manually switch to the temp buffer each time after any Edebug command. The trace is helpful, but we still need the information I asked for. In particular, we need to see the complete text of the buffer, to understand what the string-match calls find there and why match-string fails. The value of buffer-file-coding-system in the temporary buffer is also important.
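
One way to collect both pieces of information without repeatedly switching buffers is to copy the temporary buffer into a named buffer that survives after the function returns. This is a sketch under assumptions: `my/snapshot-current-buffer` is a hypothetical helper, not part of Ebuku or Edebug, and it assumes the temporary buffer is the current buffer at the point where the helper is evaluated (for example via Edebug's `e` command while stepping).

```elisp
;; Hypothetical helper, not part of Ebuku or Edebug.  It copies the
;; text of the current buffer into a separate buffer, preceded by a
;; comment line recording the current `buffer-file-coding-system'.
(defun my/snapshot-current-buffer (name)
  "Copy the current buffer's text into a buffer called NAME.
Return the snapshot buffer."
  (let ((text (buffer-string))
        (coding buffer-file-coding-system))
    (with-current-buffer (get-buffer-create name)
      (erase-buffer)
      (insert (format ";; buffer-file-coding-system: %S\n" coding))
      (insert text)
      (current-buffer))))
```

Evaluating `(my/snapshot-current-buffer "*ebuku-snapshot*")` at that point would leave a buffer named `*ebuku-snapshot*` that can be inspected or saved afterwards.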