apcshields / autocomplete-bibtex

Adds Pandoc-style BibTeX citation key autocompletion to autocomplete+ for Atom.
MIT License
44 stars 17 forks source link

Very high startup time on large bibtex files! #85

Closed mbroedl closed 6 years ago

mbroedl commented 7 years ago

I have a considerably large bibtex-file which I use for my work. As I always have my markdown documents open when start atom, the package is both, loaded and activated, and the activation takes extremely long.

bibtex: 800kB
loading: 181ms
activating: 1115ms

when I make my bibtex significantly smaller (by deleting most of my references, I already optimised notes), activation time is virtually nonexistent:

bibtex: 100kB
loading: 185ms
activating: < 15ms

Is there any way to increase the startup time? Maybe to load the file asynchronously or with a delay? Or to read only those lines that are used by the package?

Many thanks!

mangecoeur commented 7 years ago

In theory, the bibtex file is already cached to reduce start-up time, but that only works when the bibtex does not change. The problem is that parsing bibtex is simply quite slow. It wouldn't be a bad idea to try to do it async, not sure how easy this is to do with the current bibtex parser module.

mbroedl commented 7 years ago

Surprisingly, in my case the loading of a changed bibtex file seems to be a fraction faster than loading the cached version.

Possible fixes:

  1. add a pause/wait before loading the file
  2. loading when starting to type in an editor that is in one of the defined scopes (as opposed to only opening it)
  3. loading when the @-key is hit for the first time

I'd probably go for version 2 unless it blocks all other operations. Maybe it could be an optional configuration?

mangecoeur commented 7 years ago

From memory, i think the activation was not delayed because normally the way to delay activation is to add the supported scopes to the package.json. But this makes it impossible to activate on custom scopes defined in the options :/ Fixing this means adding more machinery to listen for file open events. Probably better to first try to make parsing faster. One thing is certain, it should be faster to open the cached data than to re-parse it, that's a bug.

mbroedl commented 7 years ago

Fair point.

Just had a look at the code and did not realise that the provider needs to be set up in advance with a wordlist. Only other idea would then be to cache the wordlist separately, and only build the full index from the file/cache when first needed?

Regarding the parser, I was wondering whether it may be because the zotero-parser may parse everything because it may need everything? Whereas this package only needs authors, editors, title, year, bibtex-key, and maybe the journal/book? Not sure if it's possible ignoring these lines in the parser, but maybe these fields are stored away as well and could be removed?

mangecoeur commented 6 years ago

I've implemented you suggestion in the master branch version. I skip adding some bibtex elements we don't need like comments. The main thing that seemed to be slowing things down was converting latex accents/non ascii chars into unicode - now I only do that for authors and title and it seems much faster (I think if your bibtex included abstracts it was making it particularly slow).

If you can test this out that would be fab!

mangecoeur commented 6 years ago

@mbroedl I made a few more changes to avoid doing a lot of unnecessary work (looping through the whole collections several times and storing several copies of it). Please test the latest development version and let me know if it seems noticbly faster.

mbroedl commented 6 years ago

Oooh, very nice, @mangecoeur , thanks a lot for your efforts! My bibtex included abstracts, indeed.

Loading seems now down to less than a quarter of the original time, so the same file as above (800kb) is now down to ~250ms loading time, and including abstracts or even lengthy notes into the file does not change loading time (i.e. they seem efficiently skipped). Loading time does not differ much between

Thanks so much, it's looking great. :)

mangecoeur commented 6 years ago

great, i'll mark this solved for now then.

werunom commented 6 years ago

Thanks for solving this. I had uninstalled this package for the same reason; now I can start using it again :)

A general clarification about installing the master-branch: when I install the package from atom using the install option in settings, I get the stable-version (which is quite old) or would it install the latest master version where all these fixes are present? If this doesnt install, then how to install the master?

mbroedl commented 6 years ago

@werunom To install the latest master branch, you need to load the development package and then start atom in developer mode; looking at my startup time, the recent changes are not yet in the official release.

apm develop autocomplete-bibtex atom -d

werunom commented 6 years ago

@mbroedl - thanks for pointing out. Let me try it out...

mangecoeur commented 6 years ago

Please let me know how it goes. I will will make a new stable release soon but because I changed so much stuff i want to wait till a few people have used it to make sure it doesn't explode

On 17 Nov 2017 2:05 pm, "werunom" notifications@github.com wrote:

@mbroedl https://github.com/mbroedl - thanks for pointing out. Let me try it out...

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/apcshields/autocomplete-bibtex/issues/85#issuecomment-345238735, or mute the thread https://github.com/notifications/unsubscribe-auth/AAtYVHjfwLEoqQbL91h_GpYh2PhKllIBks5s3YR_gaJpZM4QCnLI .

mangecoeur commented 6 years ago

@werunom p.s. note that if you install using mbroedl's instructions, the development version will only be used when you open atom in dev more (atom -d) - this is usually a good thing, so that the normal stable version keep getting updated in the normal way in non-dev mode atom