girzel / ebdb

An EIEIO port of BBDB, Emacs' contact-management package

show progress while reading ebdb sources #53

Closed tromey closed 6 years ago

tromey commented 6 years ago

I have a fairly large ebdb file (15M). Loading it takes a noticeable amount of time.

If it's possible to either speed this up, or to do the reading in the background (process filter or thread), that would be great.

Otherwise, printing completion percentage in the echo area would be helpful.

girzel commented 6 years ago

Good lord that's huge, how many records do you have?

Load speeds are definitely something I've meant to look at eventually, but it was fairly far down the road. But with that many records... I'll try to do some profiling. It's all in-Emacs work, unfortunately, so there's no way to offload it to a thread or process. A counter wouldn't be hard, though it obviously doesn't really fix the problem.

tromey commented 6 years ago
bapiya. grep record-person ~/.emacs.d/ebdb|wc -l
24486

It would be possible to spawn a thread to do the work, maybe. It could thread-yield after every N records or something like that. Of course, this isn't any better if the foreground task is just waiting for the results, but I could just put something like "spawn a thread to load ebdb" in my .emacs.
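A minimal sketch of the "thread-yield after every N records" idea, assuming Emacs 26 or later (where `make-thread` and `thread-yield` exist). The helper `my/process-in-thread` and its arguments are hypothetical, not part of EBDB. Because Emacs threads are cooperative, this only helps when the per-item work runs in a loop that can yield; it cannot break up a single long call such as one `eieio-persistent-read`.

```elisp
;;; -*- lexical-binding: t -*-

(defun my/process-in-thread (items fn n)
  "Call FN on each element of ITEMS in a new thread.
Yield to other threads after every N elements.  Return the thread."
  (make-thread
   (lambda ()
     (let ((count 0))
       (dolist (item items)
         (funcall fn item)
         (setq count (1+ count))
         ;; Emacs threads are cooperative: the main thread only gets to
         ;; run again when this thread yields or blocks.
         (when (zerop (% count n))
           (thread-yield)))))
   "my/process-in-thread"))
```

Calling `thread-join` on the returned thread blocks until every item has been processed.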

girzel commented 6 years ago

Threading's really only going to provide a benefit if it can happen while Emacs is doing other thread-yielding work, though I guess I could just create the thread, call thread-yield periodically, and let the user try to make use of that. I suppose it would be worth looking at Gnus' startup hooks (you might not use Gnus), and see if there's a logical spot where we could yield to EBDB loading while Gnus is waiting on process output from its servers.

The problem is that the load process is split into two parts: object creation, and initialization. The object creation stage is mostly a black box to me: all the objects are constructed within a single call to eieio-persistent-read (well, one call per database). There might be some tuning that could be done around that, but for the most part I can't touch how it works. Unfortunately, that means I can't call thread-yield after every N records.

Initialization is entirely under EBDB control, and there might be some speedups that can be found there.

I'll do a bit of profiling. Just for my information, can you tell me how long ebdb-load takes in your case, and how much of that is loading vs initialization? Even just ballpark?

girzel commented 6 years ago

And just out of curiosity, how are search times? I'm fairly satisfied with the general performance of search, but my databases are about a tenth the size of yours, and I'd be curious if there are notable lags.

girzel commented 6 years ago

Actually I guess what I said before isn't really true: I could override object-write for EBDB databases, and probably write a database file that was a bit quicker to read.

Otherwise it looks like the only knobs to turn are eieio-backward-compatibility and eieio-skip-typecheck. Experiments show that turning off typechecks does speed up load times; I'll continue fooling with this.

tromey commented 6 years ago

> Just for my information, can you tell me how long ebdb-load takes in your case, and how much of that is loading vs initialization? Even just ballpark?

I timed it today, in a primitive way:

(progn
  (insert (current-time-string) "\n")
  (ebdb-load)
  (insert (current-time-string) "\n"))
Sun Sep 10 12:31:46 2017
Sun Sep 10 12:32:50 2017

So about 1 minute.

tromey commented 6 years ago

> And just out of curiosity, how are search times?

I haven't really used it much that way. Reading email in gnus and having the info pop up seems fast enough, so no complaints at least :)

girzel commented 6 years ago

> Reading email in gnus and having the info pop up seems fast enough

That's a hash table lookup, so I'd expect that to be pretty quick regardless.

> So about 1 minute.

Gad, that's awful. Last question, then I'll leave you alone: can you eyeball how much of that time is loading, and how much is initialization? (The two stages print different messages.) I don't need anything super exact, just to know if more time is used in one stage or the other, so I know where to focus work.

Also, next time I update the package there will be a new option, ebdb-try-speedups, that should help a little bit. It's nil by default; you could set it to t now, and it will take effect when the package gets updated.
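For reference, opting in ahead of the update is a one-line init-file setting using the option name given above. Setting a variable the installed EBDB does not define yet is harmless in Emacs Lisp; it simply has no effect until a version that reads it is installed.

```elisp
;; Opt in to EBDB's load speedups (the option is nil by default).
(setq ebdb-try-speedups t)
```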

girzel commented 6 years ago

I'm going to close this for now -- any further solution is going to be quite a different venture.

tromey commented 6 years ago

Sorry, I never got around to eyeballing the startup. If you'd like, I could try profiling it in Emacs. I don't know whether it's easy to upload profile data, but maybe it would show something interesting.

One thing I have noticed is that gnus stores its registry as an EIEIO file. This is about 1/3 the size of my ebdb -- but it loads much faster, at least subjectively more than 3x as fast. So I wonder if it is doing something different.

bapiya. ls -lh .gnus.registry.eieio .emacs.d/ebdb 
-rw-r--r--. 1 tromey tromey  15M Sep  9 12:09 .emacs.d/ebdb
-rw-r--r--. 1 tromey tromey 5.7M Sep 15 23:13 .gnus.registry.eieio

girzel commented 6 years ago

Thanks, and no worries, this has already been very helpful. I don't think it's worth going to the trouble of profiling -- I know where most of the slow code is, and it's not anywhere I can reach.

It would still be nice to know the general ratio of time spent in loading to initialization, though. You could do something like:

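(let ((eieio-skip-typecheck nil))       ; set to t for the second run
  (ebdb-clear-vars)
  (insert (message "starting load: %s" (current-time-string)) "\n")
  ;; Object creation: construct the database objects from the file.
  (eieio-persistent-read "~/path/to/ebdb.dat" 'ebdb-db t)
  (insert (message "starting init: %s" (current-time-string)) "\n")
  ;; Initialization: the stage that is under EBDB's own control.
  (ebdb-initialize)
  (insert (message "finished init: %s" (current-time-string)) "\n"))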

Do that once, then switch eieio-skip-typecheck to t, and do it again. That would be very useful.
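Since `current-time-string` only has one-second resolution, a variant could report elapsed seconds per stage instead. This is a sketch built on `benchmark-elapse` from Emacs's built-in `benchmark` library; the helper `my/time-stages` is hypothetical, and the commented usage simply reuses the EBDB calls from the snippet above.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun my/time-stages (stages)
  "Run each (LABEL . FUNCTION) in STAGES in order.
Return an alist of (LABEL . SECONDS), one entry per stage."
  (mapcar (lambda (stage)
            (cons (car stage)
                  (benchmark-elapse (funcall (cdr stage)))))
          stages))

;; Example usage (requires EBDB to be loaded):
;; (let ((eieio-skip-typecheck nil))
;;   (ebdb-clear-vars)
;;   (my/time-stages
;;    (list (cons "load" (lambda ()
;;                         (eieio-persistent-read "~/path/to/ebdb.dat"
;;                                                'ebdb-db t)))
;;          (cons "init" #'ebdb-initialize))))
```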

> One thing I have noticed is that gnus stores its registry as an EIEIO file. This is about 1/3 the size of my ebdb -- but it loads much faster, at least subjectively more than 3x as fast. So I wonder if it is doing something different.

The registry is a single object, and 99.9% of the file is data used to construct a single hash table. Your ebdb file holds tens of thousands of objects, and as each one is constructed there's a lengthy process of slot validation and type checking. Setting eieio-skip-typecheck to t will help, but I don't think there are any other knobs that can be used to speed up object construction.
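To get a rough feel for that difference, the following sketch times constructing N small EIEIO objects with type-checked slots against storing the same data in a single hash table. The class `my/demo-record` and the helper `my/compare-construction` are hypothetical stand-ins, not EBDB's real record classes, and timings depend on the machine and Emacs version, so no particular numbers are implied.

```elisp
;;; -*- lexical-binding: t -*-
(require 'eieio)
(require 'benchmark)

;; Hypothetical stand-in for a record class; every slot value is
;; type-checked on construction unless `eieio-skip-typecheck' is non-nil.
(defclass my/demo-record ()
  ((full-name :initarg :full-name :type string)
   (mail :initarg :mail :type list))
  "Demo class with type-checked slots.")

(defun my/compare-construction (n)
  "Build N records two ways and return (OBJECT-SECS . HASH-SECS).
OBJECT-SECS is the time to create N `my/demo-record' instances;
HASH-SECS is the time to store the same data in one hash table."
  (let ((object-secs
         (benchmark-elapse
           (dotimes (i n)
             (make-instance 'my/demo-record
                            :full-name (format "Person %d" i)
                            :mail (list (format "p%d@example.com" i))))))
        (hash-secs
         (benchmark-elapse
           (let ((table (make-hash-table :test #'equal)))
             (dotimes (i n)
               (puthash (format "Person %d" i)
                        (list (format "p%d@example.com" i))
                        table))))))
    (cons object-secs hash-secs)))
```

Wrapping a call in `(let ((eieio-skip-typecheck t)) ...)` shows how much of the per-object cost is type checking on a given setup.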

tromey commented 6 years ago

First time:

starting load: Sat Sep 16 11:24:50 2017
starting init: Sat Sep 16 11:25:59 2017
finished init: Sat Sep 16 11:26:05 2017

With skip-typecheck t:

starting load: Sat Sep 16 11:27:31 2017
starting init: Sat Sep 16 11:28:31 2017
finished init: Sat Sep 16 11:28:33 2017

tromey commented 6 years ago

I don't have the version with skip-typecheck; let me do that again.

tromey commented 6 years ago

starting load: Sat Sep 16 11:36:05 2017
starting init: Sat Sep 16 11:37:10 2017
finished init: Sat Sep 16 11:37:14 2017

girzel commented 6 years ago

Well that didn't do anything at all! That's sort of surprising, and disappointing. Anyway, it's still worth leaving ebdb-try-speedups as t -- if I find any more tricks in the future, I'll guard them behind that option.

tromey commented 6 years ago

Hmmm, I have ebdb-try-speedups set in my .emacs. So maybe the first test was invalid.

girzel commented 6 years ago

No, I don't think I've bumped the version number since that option was added -- it shouldn't actually be in your installed EBDB at all. To be sure, I guess you could add it to the let statement and set it to nil...

tromey commented 6 years ago

Aha, yes -- the first time, I was still using the one from ELPA. Thanks.