apcshields / autocomplete-bibtex

Adds Pandoc-style BibTeX citation key autocompletion to autocomplete+ for Atom.
MIT License
44 stars 17 forks source link

Atom uses a lot of processor load (and is unusable) when (large) bibtex-file is reloaded #96

Closed mbroedl closed 6 years ago

mbroedl commented 6 years ago

I use a large (1.6MB) file exported from zotero with autocomplete-bibtex. Runs great, especially since the performance improvements. :)

At the moment, I use the zotero-better-bibtex extension (since I use autocomplete-bibtex luckily not anymore to manage citation keys) to auto-export the file. It seems like it overwrites the .bib file in place. While it's great that autocomplete-bibtex notices change, electron takes up a whole processor's capacity whenever the file changes, and does not stop anymore even after minutes, whereas a startup with a new file is not a problem at all.

Further details:

Concluding, I assume the problem is that the file is written as an output stream. I could post a bug report on better-bibtex as well, ask them to not write as a stream in-place.

Possible solutions:

Probably there is more, not sure.

mbroedl commented 6 years ago

Just spoke to my friend who uses autocomplete-bibtex with Mendeley auto-export, and she experiences the same problem.

Thought about the problem more: probably since the file is likely to be written as an output stream, the file-change event listener fires many times (because the file changes continuously) instead of just once when the file is fully written (and is the case when saving it from a text editor).

Suggested solution: Maybe one could buffer the file-change events and when the last file-change event is maybe two seconds ago (could handle that via a promise design, maybe?), only then reload the bibtex file?

mangecoeur commented 6 years ago

@mbroedl what OS are you on? This was built and test on mac where it seemed to work fine but I noticed issues on windows.

mbroedl commented 6 years ago

Myself I'm on Linux, my friend is on Windows.

mangecoeur commented 6 years ago

@mbroedl I've updated the file loading the be asynchronous, which should solve this. Would be great if you could test the development version before I push this out to everyone! Note that if you have a really huge bibtex file, it will take some processing to parse it, but this should now at least not lock up the rest of the editor

mbroedl commented 6 years ago

Mhmmm... so when I make one change only, it's alright with this version. It still seems to block atom, but at least it goes back to normal (with my bibtex file) after a few seconds.

However, sometimes it doesn't and I tried to investigate why, so I added to provider.js -> update()

console.log(+ Date.now(), 'reading changed bibtex')

Outcome: In my case, it calls the function update twice on startup, and twice for every change made in zotero (which is probably an issue with the zotero exporter as well?). However, when I continue doing that, there is more and more re-reads of the bibtex file than changes I have made, and some still occur ages later.

So over the course of a minute or two of working in zotero (moving things, retagging, importing new files, etc), the update-function is called quite a lot of times (would assume like 100+ times) which even asynchronously blocks atom. Plus: this error came a few times in between, probably when the async reading is so delayed that the file is written again?

lite-bibtex-parse.js [sm]:298 Uncaught (in promise) TypeError: Unterminated value: value_braces at BibtexParser.value_braces (lite-bibtex-parse.js [sm]:278) at BibtexParser.single_value (lite-bibtex-parse.js [sm]:315) at BibtexParser.value (lite-bibtex-parse.js [sm]:331) at lite-bibtex-parse.js [sm]:371 at BibtexParser.key_equals_value (lite-bibtex-parse.js [sm]:385) at BibtexParser.key_value_list (lite-bibtex-parse.js [sm]:400) at BibtexParser.entry_body (lite-bibtex-parse.js [sm]:408) at BibtexParser.entry (lite-bibtex-parse.js [sm]:429) at BibtexParser.parse (lite-bibtex-parse.js [sm]:470) at Object.toJSON (lite-bibtex-parse.js [sm]:478) at BibtexReader. (bibtex-reader.js [sm]:5) at Generator.next () at step (bibtex-reader.js [sm]:1) value_braces @ lite-bibtex-parse.js [sm]:298 single_value @ lite-bibtex-parse.js [sm]:334 value @ lite-bibtex-parse.js [sm]:348 (anonymous) @ lite-bibtex-parse.js [sm]:387 key_equals_value @ lite-bibtex-parse.js [sm]:404 key_value_list @ lite-bibtex-parse.js [sm]:421 entry_body @ lite-bibtex-parse.js [sm]:435 entry @ lite-bibtex-parse.js [sm]:465 parse @ lite-bibtex-parse.js [sm]:511 toJSON @ lite-bibtex-parse.js [sm]:522 (anonymous) @ bibtex-reader.js [sm]:15 step @ bibtex-reader.js [sm]:1

Is there a way to trigger reloads only when the window is in focus? Because I won't need an updated file in atom, when still working in zotero, for example. And if something in zotero happens asynchronously (e.g. adding a paper) while the focus is on atom, it should not involve too many changes to the database and thus updates.

Sorry for this... :/

mbroedl commented 6 years ago

I will leave a bug report with the zotero extension to build in a buffer, as well!

mbroedl commented 6 years ago

The autoexport extension for zotero has a setting to only export when idle. That should fix my problem for now!

mangecoeur commented 6 years ago

@mbroedl thanks so much for digging into this! For sure the error you are seeing is due to the system trying to parse a partially written file. I think you are right that it's caused by reading being delayed so much that it starts when the file is getting written again. What is ideally needed is a way to interrupt parsing if a new change occurs and start again, I hope to find a way for this. On the other hand I'm not sure if there is a way to delay until the window regains focus. Glad you have a workaround for now.

mbroedl commented 6 years ago

I've just experimented a bit. Since the events seemed to be bundled in the course of 1 or 2 seconds, you could run timeouts to reread the reference files (or maybe even to call update).

One could do the following to re-read 1000 ms after the last time update() was called:

clearTimeout(timeout);
timeout = setTimeout(() => {
  // update, readReference, or whatever
  }, 1000);

Otherwise, maybe in lite-bibtex-parse.js one could add a cancellation variable, so that while (this.matchAt()) { in the parse() function becomes

while (this.matchAt() && timestampWhenParserCalled == lastCalledTimestamp) {

where lastCalledTimestamp could be a global(?) variable?

Still, it might be good to connect this to the first option (maybe with 200ms or so), to avoid double-reading the file (which is done by a file-handler so probably a bit more difficult to stop).

Otherwise, re-reading on focus would be something like (just making this up now, none of the functions is correct):

event.onFileChange => {
   if window.hasFocus() { update(); }
   else {
      if noEventOnRefocus {
          register EventOnRefocus => {
              destroy EventOnRefocus;
              update();
           }
      }
   }
}
mangecoeur commented 6 years ago

I would really like to be able to interrupt the parsing, and probably a global variable is the way to go, although I'm also considering how i could offload the work to a child process or Web Worker (which would be neat in that it would use parallel processing) - especially since I think the parsing is what is taking time and locking up the system, even a 'big' bibtex file is only a few MB which should be very fast to read.

mangecoeur commented 6 years ago

@mbroedl I have tried offloading the bibtex processing to a web-worked, would be great if you could test it! It should now not lock up the editor when processing. However, i didn't yet implement interrupting the updates when new changes happen because i need to decide what the best way to structure everything is.

mbroedl commented 6 years ago

Good news: it did not block atom (although it was slow). Atom's electron process took up more than one processor now and all seemed super busy. (EDIT: it was slow when doing a lot of stuff in zotero and then changing to atom; when doing one change, I couldn't find a performance change, and it did not block at all.)

But it threw a lot of error messages when just performing a lot of changes:

many times, once already on startup:

/usr/lib/atom/src/state-store.js:49 Uncaught (in promise) DOMException: Failed to execute 'put' on 'IDBObjectStore': #<Promise> could not be cloned.
    at dbPromise.then (/usr/lib/atom/src/state-store.js:49:12)
    at <anonymous>

many times:

bibtex-reader.js? [sm]:37 Uncaught TypeError: Cannot read property 'terminate' of null
    at Worker._this.bibReaderWorker.onmessage (bibtex-reader.js? [sm]:23)
_this.bibReaderWorker.onmessage @ bibtex-reader.js? [sm]:37

once, probably because too many updates on the file happened:

(node:17586) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 21 ipc-helpers-window-method-response listeners added. Use emitter.setMaxListeners() to increase limit
process.on @ internal/process/warning.js:21
emitOne @ events.js:96
emit @ events.js:191
process.nextTick @ internal/process/warning.js:47
_combinedTickCallback @ internal/process/next_tick.js:73
_tickCallback @ internal/process/next_tick.js:104
/usr/lib/atom/src/state-store.js:49 Uncaught (in promise) DOMException: Failed to execute 'put' on 'IDBObjectStore': #<Promise> could not be cloned.
    at dbPromise.then (/usr/lib/atom/src/state-store.js:49:12)
    at <anonymous>
dbPromise.then @ /usr/lib/atom/src/state-store.js:49

Without doing any changes, error one comes maybe every 2 minutes. When only doing one change, so possibly 2 calls to update as before? There were maybe 5-10 more of these. The update of the database worked nonetheless.

Atom 1.23.3 Electron 1.7.11 Node 9.4.0

mangecoeur commented 6 years ago

@mbroedl I've made some more improvements, most importantly to limit the number of times the update function is called when the file changes to no more than once every 2 seconds or so, and to check that the file has finished writing before reading. Let me know if this improves the situation for you.

mbroedl commented 6 years ago

Works wonderfully. Processor load never went up noticeably, and electron did not block!

The only peculiarity (but I don't think it's a reasonable problem) is when the dialog for autocompletion is open, and atom has reloaded the file, the dialog is not reloaded (i.e. the old file is used). The new/changed entries only show up when typing/deleting one character (as in: autocomplete is called again). But that's an unlikely case anyway.

Thanks a lot!

mbroedl commented 6 years ago

One thing: Atom notified me that I needed a module fuse.js when I first started it with that module. But I guess that would be taken care of by apm in the production version?

mangecoeur commented 6 years ago

Great! I think the issue you mentioned are ok. fuse.js should be installed in the production version automatically, will close the issue then.