estin / simple-completion-language-server

Language server to enable word completion and snippets for Helix editor
MIT License

Maybe add completion of citation keys for `.bib` file in Markdown #78

Closed lukeflo closed 2 months ago

lukeflo commented 3 months ago

Not sure how much effort that would be. But at the moment, neither the Markdown LSPs nor Helix itself seem to support autocompletion of citation keys when writing papers/notes in Markdown.

Thus, this would be a great feature.

E.g. if the YAML header of a md/qmd file contains the line bibliography: /path/to/bibfile.bib, it would be great to trigger autocompletion of the included bib entries when typing @..., with a preview of the related entry from the bib file.

estin commented 3 months ago

Hi! It's not difficult to solve.

lukeflo commented 3 months ago

Hi,

thanks for the response. I'll provide the files tomorrow if possible.

Best

lukeflo commented 3 months ago

A typical Markdown file could look like this (it doesn't matter whether the file extension is md or qmd). The file path to the .bib file can be absolute or relative:

---
author: lukeflo
date: 2024-08-13
keywords: [tag1]
title: A simple note
pdf-engine: lualatex
cite-method: biblatex
bibliography: "~/Documents/notes-db/test.bib" # could also be surrounded by brackets instead of quotation marks
---

# Heading

incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

::: {#refs}
:::

The corresponding .bib file can contain multiple entries in the BibTeX format. For example, a test.bib file:

@online{irfanullah_open_acces_global_south_2021,
    author = {Irfanullah, Haseeb},
    title = {{Open Access and Global South}},
    subtitle = {It is More Than a Matter of Inclusion},
    date = {2021-02-08},
    urldate = {2024-08-04},
    language = {english},
    url = {https://web.archive.org/web/20240303223926/https://scholarlykitchen.sspnet.org/2021/01/28/open-access-and-global-south-it-is-more-than-a-matter-of-inclusion/},
}

@article{brainard_pay-to-publ_model_open_acces_2024,
    author = {Brainard, Jeffrey},
    title = {{Is the pay-to-publish model for open access pricing scientists
             out?}},
    journal = {American Association for the Advancement of Science},
    volume = {385},
    issue = {6708},
    date = {2024-08-01},
    urldate = {2024-08-04},
    doi = {10.1126/science.zp80ua9},
}

@article{brembs_replacing_academic_journals_2023,
    author = {Brembs, Björn and Huneman, Philippe and Schönbrodt, Felix and
              Nilsonne, Gustav and Susi, Toma and Siems, Renke and Perakakis,
              Pandelis and Trachana, Varvara and Ma, Lai and Rodriguez-Cuadrado,
              Sara},
    title = {Replacing academic journals},
    year = {2023},
    month = may,
    doi = {10.5281/zenodo.7974116},
}

Generally, every Markdown file has its own header with its own bibliography: keyword. The given path can be surrounded by quotes ", brackets [...] or just be plain (more info).
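A minimal sketch of how the three notations could be handled, in Python purely for illustration (scls itself is written in Rust, and this is not its implementation). The function name `bibliography_path` and both regular expressions are hypothetical. The sketch assumes the header is delimited by `---` lines and that the path itself contains no whitespace followed by `#`.

```python
import re

# Hypothetical helper, not the scls implementation.
# The front matter is assumed to be the block between the first two "---" lines.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)
# A top-level "bibliography:" key, i.e. one that starts in column 0.
BIB_LINE = re.compile(r"^bibliography:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def bibliography_path(document):
    """Return the bibliography path from the YAML header, or None."""
    header = FRONT_MATTER.match(document)
    if header is None:
        return None
    line = BIB_LINE.search(header.group(1))
    if line is None:
        return None
    # Drop a trailing YAML comment such as ' # could also be ...'.
    value = re.sub(r"\s+#.*$", "", line.group(1)).strip()
    # Plain, [bracketed] and "quoted" values are all accepted.
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1].strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return value or None
```

Because the key is matched only at the start of a line, indented lines inside a multi-line value (for example a header-includes block) are not mistaken for it.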

How it should work:

When writing in the body of a Markdown file like the one above, typing @ at the beginning of a new word (this means after a space character) should trigger autocompletion of the citekeys from the given .bib file:

For an example of how it should behave, take a look at the GIF from PandocCiter for VSCode.
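The trigger rule described above (an @ at the start of a new word) could be sketched like this. It is a Python illustration under assumptions, not the scls code: the name `citation_prefix` is hypothetical, an opening bracket is also treated as a word start so that `[@key]` works, and the set of allowed key characters is a simplification.

```python
import re

# Hypothetical rule: "@" starts a citation at the beginning of the line,
# after whitespace, or after "[". Key characters are simplified to
# word characters plus ":", "." and "-".
CITATION_PREFIX = re.compile(r"(?:^|[\s\[])@([\w:.-]*)$")


def citation_prefix(line_before_cursor):
    """Return the partial citekey typed before the cursor, or None."""
    match = CITATION_PREFIX.search(line_before_cursor)
    return match.group(1) if match else None
```

With this rule, `Test @bra` and `too:[@bra` both yield `bra`, while an @ in the middle of a word (as in an e-mail address) yields nothing.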

estin commented 3 months ago

@lukeflo please try the first attempt at this feature

branch https://github.com/estin/simple-completion-language-server/tree/citation-keys

$ cargo install --branch citation-keys --git https://github.com/estin/simple-completion-language-server.git

and enable this feature in languages.toml

[language-server.scls]
command = "simple-completion-language-server"

[language-server.scls.config]
max_completion_items = 20
snippets_first = true
feature_words = true
feature_snippets = true
feature_unicode_input = true
feature_citations = true # enable it <--

Use case video: https://github.com/user-attachments/assets/d11109b5-e2da-4671-aacb-c250c4a1517c

How it works

Currently it works, and I think it is useful as is.

Sorry for my poor English

lukeflo commented 3 months ago

Great. I'll try it ASAP and come back to you with some feedback. Don't know if I've time today, but will try.

PS: Your English is fine. I'm also not a native speaker... 😉

lukeflo commented 3 months ago

Hey, I was just able to test it. Some good news and some bad (unfortunately, the bad news is much more relevant):

First the good thing:

If I test it with a very small example file, as you did in the video, it works great. E.g. with a file:

---
bibliography: "/home/lukeflo/Documents/notes-db/literatur-lukeflo.bib"
---

Test @bra...

@bra triggers the autocompletion!

But:

If I try it in a larger file, containing a relevant YAML header and some paragraphs etc., it doesn't work. E.g. a file like the following won't work, and `[@bra...]` won't trigger autocompletion:

---
date: 2023-11-11
title: FAIR Principles
bibliography: "/home/lukeflo/Documents/notes-db/literatur-lukeflo.bib"
---

# FAIR data

FAIR Guiding Principles should be applied to the workflows too:[@bra...]

> "Importantly, it is our intent that the principles apply not only to
> 'data' in the conventional sense, but also to the algorithms, tools,
> and workflows that led to that data."

I tried out different dirs, relative to my bib file or elsewhere, but the only relevant factor seems to be the content of the Markdown file itself.

estin commented 3 months ago

@lukeflo please try to debug it. I can't reproduce the bug.

1) Install new version of scls from related branch (added more logs)

$ cargo install --branch citation-keys --git https://github.com/estin/simple-completion-language-server.git

2) Ensure scls configured for logging to file /tmp/completion.log

[language-server.scls]
command = "simple-completion-language-server"

[language-server.scls.config]
max_completion_items = 20
snippets_first = true
feature_words = true
feature_snippets = true
feature_unicode_input = true
feature_citations = true # enable it

# write logs to /tmp/completion.log
[language-server.scls.environment]
RUST_BACKTRACE = "1"
RUST_LOG = "debug,simple-completion-language-server=trace"
LOG_FILE = "/tmp/completion.log"

3) Run helix with hx -vvv /tmp/doc.md and check log files for error entries

In /tmp/completion.log there should be something like

2024-08-16T07:32:33.841972Z DEBUG simple_completion_language_server: Citation word_prefix: bra, chars_prefix: too:[@bra
2024-08-16T07:32:33.842012Z DEBUG simple_completion_language_server: Citation try to read: /tmp/literatur-lukeflo.bib
2024-08-16T07:32:33.842088Z DEBUG simple_completion_language_server: Citation from file: /tmp/literatur-lukeflo.bib prefix: bra key: irfanullah_open_acces_global_south_2021 - match: false
2024-08-16T07:32:33.842093Z DEBUG simple_completion_language_server: Citation from file: /tmp/literatur-lukeflo.bib prefix: bra key: brainard_pay-to-publ_model_open_acces_2024 - match: true
2024-08-16T07:32:33.842107Z DEBUG simple_completion_language_server: Citation from file: /tmp/literatur-lukeflo.bib prefix: bra key: brembs_replacing_academic_journals_2023 - match: false
2024-08-16T07:32:33.842214Z DEBUG simple_completion_language_server: completion request took 0ms with 1 result items

4) Try saving the doc file or reopening it to reset the internal state of scls. There may be a bug in how scls processes incremental document changes.
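The matching shown in the log lines of step 3 can be approximated with the sketch below. It is a Python illustration, not the scls implementation: the names `citekeys` and `matching_keys` are hypothetical, and a case-sensitive prefix match is an assumption that merely agrees with the true/false results in the log.

```python
import re

# Hypothetical pattern for the start of a BibTeX entry: "@type{key,".
ENTRY_KEY = re.compile(
    r"^[ \t]*@(\w+)[ \t]*\{[ \t]*([^,\s{}]+)[ \t]*,", re.MULTILINE
)
# Entry types that do not define a citable key.
SKIPPED_TYPES = {"comment", "string", "preamble"}


def citekeys(bib_text):
    """Return all citekeys in the order they appear in the .bib text."""
    return [
        match.group(2)
        for match in ENTRY_KEY.finditer(bib_text)
        if match.group(1).lower() not in SKIPPED_TYPES
    ]


def matching_keys(bib_text, prefix):
    """Return the citekeys that start with the typed prefix (case-sensitive)."""
    return [key for key in citekeys(bib_text) if key.startswith(prefix)]
```

For the three entries of the test.bib example, the prefix `bra` would leave only the brainard key, which is consistent with the "1 result items" line in the log.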

lukeflo commented 3 months ago

Hey, thanks for the fast response.

My logging was already set up. I just ran it with the -vvv flag and a clean completion.log (I removed the older file before opening Helix). An error occurs when trying to insert a citation key, right after the first line from your example log:

2024-08-16T10:17:58.529684Z DEBUG simple_completion_language_server: Citation word_prefix: mbem, chars_prefix: @mbem
2024-08-16T10:17:58.529695Z  WARN simple_completion_language_server: Failed to repr slice as str
2024-08-16T10:17:58.529823Z DEBUG simple_completion_language_server: completion request took 0ms with 0 result items

The full log (tried two citations @mbem... and @bra...): completion.log

Inside the (much longer) helix.log I can't find an error message related to this use case. But I might have overlooked something: since I have no Rust knowledge, I don't know which kind of message to look for.

Here is the full log: helix.log

I've created an even simpler file:

---
title: A great test file
author: lukeflo
bibliography: "/home/lukeflo/Documents/notes-db/literatur-lukeflo.bib"
---

# Heading

Lorem ipsum odor amet, consectetuer adipiscing elit. Tristique hendrerit
faucibus elementum sapien euismod gravida hendrerit orci. Litora litora
sociosqu torquent dignissim tortor a. Curae porttitor penatibus lorem odio
nisi. Sapien aliquam varius curabitur imperdiet in tincidunt. Ac bibendum
aenean dis vivamus sem purus cras eget. Tortor fermentum quam sodales sit ut in
neque. Duis mauris varius habitant mollis sollicitudin gravida ullamcorper. Est
potenti nec facilisi posuere arcu velit dictum lobortis. Tortor efficitur morbi
vitae in orci nibh ullamcorper habitant ex. Porta penatibus morbi odio magnis
volutpat felis felis tristique. @mbembe

Nisl nibh amet nam nascetur auctor. Euismod blandit ultrices litora conubia hac
habitant egestas. Tortor ut pretium cubilia litora parturient hendrerit nibh
posuere. Vel nam sed mollis sit molestie congue magnis lorem. Ipsum elementum
eget efficitur accumsan dis scelerisque. Donec velit volutpat ultrices purus
condimentum suscipit. Morbi elementum est bibendum; aliquam phasellus netus
diam in. Tempus et scelerisque dignissim lacinia pulvinar nunc. Curabitur magna
curae arcu; donec nullam tempus. Placerat habitant commodo finibus vel ex.
Cubilia metus eget primis venenatis metus ante. Tincidunt rutrum ante; class
montes aliquet odio consequat vivamus. Fames condimentum vivamus conubia nisi
diam porta hendrerit. Lectus neque felis rhoncus commodo quis cursus phasellus
pharetra. Purus finibus duis fringilla faucibus quam phasellus curabitur.
[@bra]

It still only works with the short example from my post above. There can't be a typo or the like, since I copied the working short example and just extended it with the "Lorem ipsum" text and some additional YAML fields.

estin commented 3 months ago

@lukeflo you're right! The file size is the cause, through the Rope logic (the internal scls text buffer). Reproduced and fixed.

lukeflo commented 3 months ago

@estin thanks for yet another fast reply. I just built the branch and now it works better, but unfortunately there are still some drawbacks.

First the good news: The longer "Lorem ipsum" example from my last comment now works... most of the time. But there seems to be a problem if the entered characters can match both a citekey and a simple text completion from one of the words already typed in the buffer. For example, my bib file contains the key grandsire_the_metafonttutorial_2004, but the lorem ipsum text also contains the word gravida. Thus, when I type @gra, it only suggests the in-buffer word gravida, but not the key.

As long as I only type @gr, it matches:

2024-08-16T18:58:28.253348Z DEBUG simple_completion_language_server: Citation from file: test.bib prefix: gr key: grandsire_the_metafonttutorial_2004 - match: true

But when I add the a, only the in-buffer completion is shown as a candidate. And the log shows no entry for @gra or prefix: gra, as it did for @gr: completion-lorem.log

Now, when I try it with an even bigger file, it's still not working at all. At the moment, for example, I'm writing a scientific paper on my current research. The text already runs to multiple A4 pages. If I open the respective Markdown file, which also contains a YAML header with more than 20 lines, and try to trigger the citekey completion somewhere in a paragraph, nothing happens.

The log does not even show true/false matches as in the case of the lorem example:

2024-08-16T19:02:54.841446Z DEBUG simple_completion_language_server: Citation word_prefix: gr, chars_prefix: @gr
2024-08-16T19:02:54.842162Z DEBUG simple_completion_language_server: Citation try to read: papersiz
2024-08-16T19:02:54.842182Z ERROR simple_completion_language_server: Failed to read file papersiz: No such file or directory (os error 2)
2024-08-16T19:02:54.843571Z DEBUG simple_completion_language_server: completion request took 2ms with 2 result items

Full file here: completion-paper.log

Sorry for bothering you with this stuff. It would be totally OK if you have other things to do 😉

estin commented 3 months ago

@lukeflo please try the new update: citation completion will no longer be mixed with word completion.
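One way to keep the two kinds of completion apart is to decide the context first and then query only one source. The sketch below is a Python illustration of that idea under assumptions, not the actual scls change: `complete`, its parameters, and both patterns are hypothetical.

```python
import re

# Hypothetical patterns: a citation context ends in "@partialkey",
# a word context ends in a run of word characters.
CITATION = re.compile(r"(?:^|[\s\[])@([\w:.-]*)$")
WORD = re.compile(r"(\w+)$")


def complete(line_before_cursor, citekeys, buffer_words):
    """Offer citekeys in a citation context, buffer words otherwise."""
    cite = CITATION.search(line_before_cursor)
    if cite is not None:
        # Citation context: buffer words are never mixed in.
        return sorted(k for k in citekeys if k.startswith(cite.group(1)))
    word = WORD.search(line_before_cursor)
    if word is None:
        return []
    prefix = word.group(1)
    return sorted(w for w in buffer_words if w.startswith(prefix) and w != prefix)
```

With this split, typing `@gra` would offer grandsire_the_metafonttutorial_2004 even when the buffer contains gravida, while a plain `gra` would still offer gravida.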

For your bigger file, the logs show that a file named papersiz was not found:

Failed to read file papersiz: No such file or directory (os error 2)
completion request took 2ms with 2 result items

Please send the value of the bibliography: line in your bigger file. Maybe the regex that extracts the file path is invalid.

lukeflo commented 3 months ago

I'll have a look asap. But probably not before tomorrow...

lukeflo commented 3 months ago

Hey, finally had time to test it. Sorry for the waiting time.

The first issue is solved. As you say, citation completion is not interacting with word completion anymore. Great!

The papersize part from the log file regarding my article corresponds to the bigger yaml header of the file. The lines surrounding the bibliography: keyword are the following:

---
# some more lines
header-includes: |
  \setlist{nosep}
  \usepackage{blindtext}
  \DeclareFieldFormat[online]{shorthand}{\texttt{#1}}
  \newcommand{\origunderscore}{}
  \let\origunderscore\_
  \renewcommand{\_}{\allowbreak\origunderscore}
  \usepackage[htt]{hyphenat}\usepackage{emptypage}
  \setcounter{secnumdepth}{0}
bibliography: "/home/lukeflo/Documents/notes-db/literature-lukeflo.bib"
papersize: a4
urlcolor: articlecolour
---

The syntax is fine, as is the file path of the bibliography. That is confirmed because I can process the document using pandoc without getting any error messages.

estin commented 2 months ago

And again, please try the new update.

The bug was in extracting the path from the captured span.

lukeflo commented 2 months ago

It works! As far as I can see, it now works in all circumstances I've tested. GREAT, thank you!

estin commented 2 months ago

Nice! Next week I will merge this feature into the master branch and make some changes to the tests.

How do you deal with spelling in Helix? Which tools do you use?

lukeflo commented 2 months ago

One minor thing which could be improved is the preview of the selected entry. Right now it is not very consistent. Sometimes the highlighting changes and sometimes it does not show the whole entry, especially with longer entries:

[Screenshot: swappy-20240823-125452]

[Screenshot: swappy-20240823-125646]

It's not a big thing, as I personally know most of my bibliographic entries well enough to identify them by citekey alone. But someone who uses another key scheme or is not as familiar with the database might have problems recognizing which entry is selected.

The best use case would be to extract the value of the author/editor, the title, and the year/date field and only preview those values, for example, with the following format:

An author, Some kind of title, 2020

But that's fully optional, since it already works very well!
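A rough sketch of such a preview, in Python for illustration only (not the scls implementation). The names `read_value`, `parse_fields` and `preview` are hypothetical, and the parser is deliberately small: it handles braced, quoted and bare values, but not string concatenation with `#` or escaped quotes.

```python
import re

FIELD_START = re.compile(r"(\w+)\s*=\s*")
BARE_VALUE = re.compile(r"[^,}\s]*")


def read_value(text, pos):
    """Return (value, next_pos) for the field value that starts at pos."""
    if pos >= len(text):
        return "", pos
    if text[pos] == "{":
        depth = 0
        for i in range(pos, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    return text[pos + 1:i], i + 1
        return text[pos + 1:], len(text)  # unbalanced braces: take the rest
    if text[pos] == '"':
        end = text.find('"', pos + 1)
        end = len(text) if end == -1 else end
        return text[pos + 1:end], end + 1
    match = BARE_VALUE.match(text, pos)  # e.g. "month = may"
    return match.group(0), match.end()


def parse_fields(entry):
    """Map lower-cased field names to cleaned values for one BibTeX entry."""
    fields = {}
    pos = entry.find("{") + 1  # skip "@type{"
    while (match := FIELD_START.search(entry, pos)) is not None:
        value, pos = read_value(entry, match.end())
        # Remove protective braces and collapse line breaks and indentation.
        cleaned = " ".join(value.replace("{", "").replace("}", "").split())
        fields[match.group(1).lower()] = cleaned
    return fields


def preview(entry):
    """Build 'author/editor, title, year' from whichever fields exist."""
    fields = parse_fields(entry)
    who = fields.get("author") or fields.get("editor")
    when = (fields.get("year") or fields.get("date", ""))[:4]
    return ", ".join(part for part in (who, fields.get("title"), when) if part)
```

Falling back from author to editor and from year to the first four characters of date mirrors the author/editor and year/date alternatives named above; missing fields are simply left out of the preview.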

lukeflo commented 2 months ago

> How do you deal with spelling in Helix? Which tools do you use?

What do you mean exactly? Spell checking grammar or code?

estin commented 2 months ago

> The best use case would be to extract the value of the author/editor, the title, and the year/date field and only preview those values, for example, with the following format:

That's easy to do.

> What do you mean exactly? Spell checking grammar or code?

Grammar in text documents, and grammar in code docs such as string literals and comments.

I currently use typos for coding, but want a robust spell checker for note-taking.

lukeflo commented 2 months ago

Nothing specific right now, since I'm still setting up Helix to work properly regarding my needs. I just switched recently from Emacs.

For prose there is vale, but I haven't tested it so far. typos also looks good. Have to try both

lukeflo commented 2 months ago

Just tried ltex-ls. It works really well and detects many typing and spelling errors in prose text; at least in German.

Not much is needed, only a simple setup in languages.toml:

# ltex-ls
[language-server.ltex]
command = "/home/lukeflo/Documents/packages/ltex-ls-16.0.0/bin/ltex-ls"

[[language]]
name = "markdown"
roots = [".marksman.toml"]
language-servers = [ "marksman", "markdown-oxide", "scls", "ltex" ]

After setting it up, you can correct wrong spelling under the cursor with space a.

estin commented 2 months ago

@lukeflo

$ cargo install --features citation --git https://github.com/estin/simple-completion-language-server.git

lukeflo commented 2 months ago

@estin Sorry for the delay, I was busy the last days.

Just built the updated main branch with your --features flag. Works great! And looks great! I'm already using your LSP on a daily basis and think others will appreciate this feature too, since no other Helix plugin/extension handles Markdown citations yet.

lukeflo commented 2 months ago

I guess this is done and can be closed!