emacs-citar / citar

Emacs package to quickly find and act on bibliographic references, and edit org, markdown, and latex academic documents.
GNU General Public License v3.0

How to manage bib files for performance #653

Open bdarcus opened 2 years ago

bdarcus commented 2 years ago

With the new cache, seems like a script that optimizes performance would be a good addition to the wiki (feel free to edit it)?

The simplest version would maybe be the following, though I'm sure I'm missing something (like how it's run?):

#!/bin/bash

BIG=library.bib
SMALL=new.bib

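# size threshold in bytes: the script does nothing until $SMALL exceeds it
# what's a good default here?
MAXSIZE=5000

# get file size in bytes (GNU stat; on BSD/macOS use: stat -f%z "$SMALL")
FILESIZE=$(stat -c%s "$SMALL")

if ((FILESIZE > MAXSIZE)); then
    # when $SMALL exceeds $MAXSIZE, move its content to $BIG
    cat "$SMALL" >> "$BIG"
    # truncate $SMALL to zero bytes (echo "" would leave a stray newline)
    : > "$SMALL"
fi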

Note: this approach wouldn't work for CSL-JSON, since appending one JSON array to another does not produce valid JSON.

Or perhaps we could modify citar-bibliography, or add a new defcustom, to accommodate this idea directly, since the cache already includes filesize. Something like:

(setq citar-bibliography '(:big "library.bib" :small ("new.bib") :maxsize 5000))

... or maybe there's a way to keep it forward-compatible, at least for a bit?

Could also be the defcustom remains as a list, and a function just figures out what's big and small?
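For that last idea, a helper that sorts files into big and small could look roughly like the shell sketch below. It is only an illustration: `classify_bibs` and `file_size` are hypothetical names, and the 5000-byte threshold in the usage line simply reuses MAXSIZE from the script above. In citar itself this would be Emacs Lisp working from the cached file size.

```shell
#!/bin/sh
# Hypothetical sketch: label each bibliography file "big" or "small" by byte size.
# Function names are illustrative assumptions, not part of citar.

# Print the size of a file in bytes (wc -c works on both GNU and BSD systems).
file_size() {
    wc -c < "$1" | tr -d '[:space:]'
}

# Usage: classify_bibs MAXSIZE FILE...
# Prints one "big FILE" or "small FILE" line per existing file; missing files are skipped.
classify_bibs() {
    maxsize=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        if [ "$(file_size "$f")" -gt "$maxsize" ]; then
            printf 'big %s\n' "$f"
        else
            printf 'small %s\n' "$f"
        fi
    done
}
```

Calling `classify_bibs 5000 library.bib new.bib` would then print one labelled line per file, which a wrapper could use to decide where new entries go.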

Originally posted by @bdarcus in https://github.com/emacs-citar/citar/pull/651#issuecomment-1179423503

mclearc commented 2 years ago

On the latest commit (really I think since the change to the new cache system) I've noticed that citar takes 25-30 seconds to load on first being called (and same for any subsequent change in the bibfile). I have a somewhat large .bib file (~7200) but it did not take this long to load it previously. Is there a way to avoid this without splitting my bib file into multiple smaller ones (something I am not intending to do)? Thanks!

bdarcus commented 2 years ago

Roughly how long did it take before the merge?

cc @roshanshariff

mclearc commented 2 years ago

I would say about 8-10 seconds. So it seems like it is currently at least roughly twice as long as previously.

bdarcus commented 2 years ago

I also notice first load seems slower with my much smaller file, but I'm not sure why that should be the case (I don't think it should be).

Seems like there may be something that needs optimizing.

mclearc commented 2 years ago

I might also add, though this may well be a separate issue, that when I was using citar-capf, citar would reload the bib file on basically every word I typed. This of course made typing very laggy, so much so that I switched to using capf-bibtex, which doesn't rely on citar. If you like I can look into this a bit more and open a separate issue.

bdarcus commented 2 years ago

I created a new issue for this with your first post, since this is really about optimizing a specific scenario, while your report points to a general issue.

bdarcus commented 2 years ago

> If you like I can look into this a bit more and open a separate issue.

Yes please.

bdarcus commented 2 years ago

I think this:

https://github.com/emacs-citar/citar/blob/c05dce095f340b1ab91434e17eac06b40911fd2a/citar-capf.el#L30

Should be:

(defvar citar-capf--candidates
  (or (citar--format-candidates)
      (user-error "No bibliography set"))
  "Completion candidates for `citar-capf'.")

... but I still can't test it myself.

bdarcus commented 2 years ago

On the capf, can you try this @mclearc?

https://github.com/emacs-citar/citar/commit/c5ed78e69ab1ddbf19f6b5793c8356678466caf8

roshanshariff commented 2 years ago

I think citar-capf is going to need some more extensive changes. Currently it just saves the active bibliographies whenever citar-capf is loaded and then never updates them. But we also can't update it every time capf is called. We'll need to do something a little more involved so it's reasonably fast but also updates when the bibliographies change (and respects the buffer's local bibliographies).
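The general pattern described here (reuse cached candidates, rebuild only when a bibliography changes) can be sketched outside Emacs in shell, to match the script earlier in the thread. This is a rough illustration under stated assumptions, not citar's implementation: `bib_signature`, `extract_keys`, and `cached_keys` are hypothetical names, the cache file layout is invented, and the key-matching pattern only handles entries whose key sits on the `@type{key,` line.

```shell
#!/bin/sh
# Hypothetical sketch: cache citation keys and rebuild only when the .bib file changes.
# Names, cache layout, and the key pattern are illustrative assumptions, not citar code.

# Print a cheap content signature (CRC and byte count) for a file.
bib_signature() {
    cksum < "$1"
}

# Print citation keys from lines such as "@article{doe2020,".
extract_keys() {
    sed -n 's/^@[A-Za-z]*{[[:space:]]*\([^,[:space:]]*\),.*$/\1/p' "$1"
}

# Usage: cached_keys BIBFILE CACHEFILE
# Prints the cached keys, re-parsing BIBFILE only if its signature
# differs from the one stored in CACHEFILE.sig.
cached_keys() {
    bib=$1
    cache=$2
    sig=$(bib_signature "$bib")
    if [ ! -f "$cache" ] || [ ! -f "$cache.sig" ] || [ "$(cat "$cache.sig")" != "$sig" ]; then
        extract_keys "$bib" > "$cache"
        printf '%s\n' "$sig" > "$cache.sig"
    fi
    cat "$cache"
}
```

The point of the design is that the per-call cost is one checksum rather than a full parse. Handling buffer-local bibliographies would additionally need one cache per file set, which this sketch leaves out.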

bdarcus commented 2 years ago

I added an issue for that, @roshanshariff.

bdarcus commented 1 year ago

I think this should have been resolved a while ago.