abingham / prosjekt

An Emacs extension for working with "projects"

speed up population #18

Open abingham opened 12 years ago

abingham commented 12 years ago

The populate routines are apparently slow, too. See about speeding them up.

abingham commented 12 years ago

The root of this problem seems to be that we do a linear lookup in the project-file list for each match we find. That is, for each file that matches the population pattern, we make sure it's not already in the file list. We need to use a sorted list for this so that we can get the time down to logarithmic.

This may not be the entire problem, but I saw dramatic improvements in speed on a test project when I jimmied together a fix.

abingham commented 12 years ago

@kingcheez I'm just tagging you so that you'll see any progress on this front.

sohailsomani commented 12 years ago

On 25/09/2012 6:06 AM, abingham wrote:

The root of this problem seems to be that we do a linear lookup in the project-file list for each match we find. That is, for each file that matches the population pattern, we make sure it's not already in the file list. We need to use a sorted list for this so that we can get the time down to logarithmic.

This may not be the entire problem, but I saw dramatic improvements in speed on a test project when I jimmied together a fix.

Hey, you might want to look at using a hash table (and thanks for the tag).

http://www.gnu.org/software/emacs/manual/html_node/elisp/Hash-Tables.html#Hash-Tables
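
A minimal sketch of the hash-table idea, assuming file names are stored as strings. The helper names `my-make-file-set` and `my-add-file` are hypothetical and are not prosjekt's actual code. Each membership check is a single `gethash` call, so it no longer scans the whole list:

```elisp
;; Hypothetical helpers for illustration only; not taken from prosjekt.

(defun my-make-file-set ()
  "Return an empty hash table keyed by file-name strings."
  ;; `equal' is needed so that two distinct string objects with the
  ;; same contents count as the same key.
  (make-hash-table :test 'equal))

(defun my-add-file (file-set file)
  "Add FILE to FILE-SET unless it is already present.
Return non-nil if FILE was added, nil if it was a duplicate."
  (unless (gethash file file-set)
    (puthash file t file-set)))

(let ((files (my-make-file-set)))
  (my-add-file files "src/main.c")
  (my-add-file files "src/util.c")
  (my-add-file files "src/main.c")   ; duplicate, ignored
  (hash-table-count files))          ; evaluates to 2
```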

abingham commented 12 years ago

Right, that looks like what I need. Hopefully I'll be able to put something faster in place in a few days. Thanks for the tip.

abingham commented 12 years ago

This should be fixed now. I updated the project format to store files in a hash-table rather than a list. On a synthetic project of 10k files it was much faster. I've tested this a fair amount, and the changes seem good. There's even code to upgrade existing, old-style projects on load.

sohailsomani commented 12 years ago

I just tried it and it doesn't seem any faster to me on the 2K file project. I can see that you are using a hashtable when I run prosjekt-setup so I'm not sure why. Is there a way to profile elisp code?

sohailsomani commented 12 years ago

Here is the output of elp (http://www.python.org/emacs/elp.el):

Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-walk-path                    10770       66.941999999  0.0062155988
prosjekt-repopulate                   1           18.656        18.656
prosjekt-populate                     5           18.064        3.6128
prosjekt-save                         6           3.392         0.5653333333
prosjekt-write-object-to-file         6           3.392         0.5653333333
prosjekt-clear                        1           0.592         0.592
prosjekt-add-if                       58240       0.2690000000  4.61...e-006
prosjekt-insert-file                  1856        0.0340000000  1.83...e-005
prosjekt-redo-tags-on-file-save-hook  6           0.006         0.001
prosjekt-get-project-item             1857        0.003         1.61...e-006
prosjekt-set-project-item             1           0.0           0.0

So the thing that jumps out to me is why prosjekt-save is called so many times when it should really only be called once. That looks as if it would shave off a lot of time from the run. Maybe you can have a dynamic variable that you can set when doing prosjekt-repopulate:

;; *dont-save-project* would need a defvar so that let binds it dynamically
(let ((*dont-save-project* t))
  (prosjekt-repopulate...))
(save-project)

The functions are imaginary, obviously.
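
To make the dynamic-variable idea concrete, here is a self-contained sketch. Every name in it (`my-inhibit-save`, `my-save-project`, `my-populate`, `my-repopulate`) is a hypothetical stand-in rather than a prosjekt function, and the "save" only increments a counter so the effect is visible:

```elisp
(defvar my-inhibit-save nil
  "When non-nil, `my-save-project' does nothing.  Hypothetical variable.")

(defvar my-save-count 0
  "Number of real saves performed; only here to make the effect visible.")

(defun my-save-project ()
  "Stand-in for a save routine; counts calls instead of writing a file."
  (unless my-inhibit-save
    (setq my-save-count (1+ my-save-count))))

(defun my-populate ()
  "Stand-in for a populate step that saves after each pass."
  (my-save-project))

(defun my-repopulate ()
  "Run several populate passes with saving suppressed, then save once."
  ;; Because the variable is declared with `defvar', this `let' binds it
  ;; dynamically, so the nested calls to `my-save-project' see it.
  (let ((my-inhibit-save t))
    (dotimes (_ 5)
      (my-populate)))
  (my-save-project))

(my-repopulate)
my-save-count   ; 1 after a single call, instead of 6
```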

You can get these results by doing load-library elp, M-x elp-set-master prosjekt-repopulate, M-x elp-results, M-x elp-unset-master
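
The same kind of profile can also be collected from Lisp rather than through M-x. This is only a sketch: `my-demo-sum` is a hypothetical function standing in for the code being measured, and instrumenting a whole prefix with `elp-instrument-package` would be the closer match for profiling prosjekt itself:

```elisp
(require 'elp)

;; Hypothetical function to profile.
(defun my-demo-sum (n)
  "Return the sum of the integers from 1 to N."
  (let ((total 0))
    (dotimes (i n)
      (setq total (+ total (1+ i))))
    total))

;; Instrument one function; (elp-instrument-package "prosjekt") would
;; instrument every function whose name starts with that prefix.
(elp-instrument-function 'my-demo-sum)

(my-demo-sum 100000)

(elp-results)       ; report call count, elapsed time, and average time
(elp-restore-all)   ; remove the instrumentation again
```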

abingham commented 12 years ago

You're right that prosjekt-save is called more than needed, I think we can address that fairly straightforwardly. But is it really the issue here? If I'm reading the table correctly, we're spending 67 sec. in prosjekt-walk-path alone, i.e. not counting functions it calls. We're only spending 3.4 sec. in prosjekt-save, so we'd still have pretty awful performance even if prosjekt-save took no time at all.

I'm seeing results similar to yours when I use elp. At least, I'm seeing prosjekt-walk-path dominate the reported run time. Strangely, the data from elp doesn't seem to match reality. The results I see say that 15 sec. is spent in prosjekt-walk-path, yet the entire repopulation only takes on the order of 5 sec. Still, even if I assume that the reported times are correct in a relative sense, I think I need to focus on prosjekt-walk-path.

abingham commented 12 years ago

A bit more data. If I modify prosjekt-populate to not save, then I get the number of prosjekt-save calls down to 1. This doesn't noticeably change the repopulation time for me (or the profile information.)

@kingcheez can you try something? Change the prosjekt-populate definition from "defun-autosave" to just "defun" and run a repopulation. Does that significantly change anything for you?

sohailsomani commented 12 years ago

Nope, no difference.

As for prosjekt-walk-path, yeah, my hash table has 2463 files so I don't know why it is being called so many times. That might be the culprit!

Edit: It's probably called so often because of svn files. So if we aren't ignoring svn files until we get to the filter, that might explain why it would be called 5,000 times but not 10,000 times

sohailsomani commented 12 years ago

Aha, I was wrong about the size of the project:

$ find . -type f | wc -l
9495

So 10,000 is correct because about half of those will be svn directories (some build directories too).

So svn is the problem :-( Any way to prevent descending into svn directories?

abingham commented 12 years ago

I'm going to have to do some tinkering on this. There's nothing in the prosjekt-walk-path code that is obvious for improvement, so I'm open to ideas. That code was actually lifted almost verbatim from the emacs wiki, so I assume it's not wildly incorrect. There may be C-based traversal implementations we can use, but I'd rather keep things in elisp if possible.

abingham commented 12 years ago

As for skipping svn directories, yes, that's already on the radar. See issue #11 (which will include svn when it's done.)

sohailsomani commented 12 years ago

Any chance you could tell me how to hack it in so I could see if it helps? I add files regularly to the project and having prosjekt-repopulate run fast would be SOOO nice :)

abingham commented 12 years ago

This is untested, but I think it'll work. On line 509 try changing:

(cond ((member file '("." "..")))

to:

(cond ((member file '("." ".." ".svn")))

The original code is there to avoid recursing on the current and parent directories. This just adds .svn to the list of "directories to skip."
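
For context, a walker built around that kind of `cond` might look roughly like the sketch below. It is not the actual prosjekt-walk-path code; `my-walk-path` and `my-ignored-directories` are hypothetical names, and symlink loops are not handled:

```elisp
(defvar my-ignored-directories '("." ".." ".svn")
  "Entry names the walker never descends into.  Hypothetical variable.")

(defun my-walk-path (dir action)
  "Call ACTION on every file under DIR, skipping ignored directories."
  (dolist (name (directory-files dir))
    (let ((path (expand-file-name name dir)))
      (cond
       ;; Skipped entries: the clause has no body, so nothing happens.
       ((member name my-ignored-directories))
       ;; Ordinary directories: recurse.
       ((file-directory-p path) (my-walk-path path action))
       ;; Everything else is treated as a file.
       (t (funcall action path))))))

;; Example use (collects every file below the current directory):
;;   (let (found)
;;     (my-walk-path default-directory (lambda (f) (push f found)))
;;     (length found))
```

Skipping happens before `file-directory-p` is called, so nothing inside an ignored directory is ever listed or tested.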

sohailsomani commented 12 years ago

My results after doing the above:

Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-repopulate                   1           8.949         8.949
prosjekt-save                         6           3.239         0.5398333333
prosjekt-write-object-to-file         6           3.239         0.5398333333
prosjekt-clear                        1           0.536         0.536
prosjekt-add-if                       28660       0.1960000000  6.83...e-006
prosjekt-insert-file                  1856        0.0600000000  3.23...e-005
prosjekt-redo-tags-on-file-save-hook  6           0.006         0.001
prosjekt-get-project-item             1857        0.002         1.07...e-006
prosjekt-set-project-item             1           0.0           0.0

Edit: Updated results with correct output. Just noticed that walk-path isn't even there anymore. I think I broke it!

It isn't THAT much faster, btw.

abingham commented 12 years ago

Well, I'm stumped. I've tried a number of ways to speed things up, but I can't make prosjekt-walk-path any faster. It's frustrating because system commands like "find" can do the traversal nearly instantly. I guess it's either a limitation of the Emacs Lisp file API, or I'm just missing something.

I'll keep this in the back of my head, but for now I don't have any great ideas. We could explore things like running system commands and parsing the results, but that quickly leads to platform issues. We might be able to make this appear faster by running it asynchronously, but that's a can of worms I'd sooner avoid.
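
As a rough illustration of the "run a system command and parse the results" option, the sketch below shells out to find. It assumes a POSIX find is on the PATH, which is exactly the platform dependency mentioned above, and `my-find-files` is a hypothetical name:

```elisp
(defun my-find-files (dir)
  "Return a list of regular files under DIR using the external find command.
Assumes a POSIX find is on the PATH.  Directories named .svn are pruned."
  ;; Equivalent shell command:
  ;;   find DIR -name .svn -prune -o -type f -print
  (process-lines "find" (expand-file-name dir)
                 "-name" ".svn" "-prune"
                 "-o" "-type" "f" "-print"))

;; Example use:
;;   (length (my-find-files default-directory))
```

`process-lines` runs synchronously and signals an error if the command exits with a non-zero status, so a caller would probably want to wrap it in `condition-case` on systems where find may be missing.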

Any ideas on how to speed up this function would be greatly appreciated!

sohailsomani commented 12 years ago

no ideas besides forking an emacs process :(

sohailsomani commented 12 years ago

Update: I used elp-instrument-package and got the following results. So the times are definitely better when adding .svn to the directories to ignore. Still takes a long time though.

Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-walk-path                    2170        19.567999999  0.0090175115
prosjekt-repopulate                   1           8.986         8.986
prosjekt-populate                     5           8.346         1.6692
prosjekt-save                         6           3.338         0.5563333333
prosjekt-write-object-to-file         6           3.338         0.5563333333
prosjekt-clear                        1           0.64          0.64
prosjekt-add-if                       28670       0.2490000000  8.68...e-006
prosjekt-insert-file                  1857        0.095         5.11...e-005
prosjekt-get-project-item             1858        0.047         2.52...e-005
prosjekt-redo-tags-on-file-save-hook  6           0.0           0.0
prosjekt-set-project-item             1           0.0           0.0

sohailsomani commented 12 years ago

And more output; this time you can see that file-directory-p accounts for a significant portion of the time. I wonder if there is a way to instrument EVERYTHING. Edit: I did this and Emacs crashed, haha.

Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-walk-path                    28675       28.641999999  0.0009988491
prosjekt-repopulate                   1           10.405        10.405
prosjekt-populate                     5           9.672         1.9344000000
file-directory-p                      28699       4.2980000000  0.0001497613
prosjekt-save                         6           3.4320000000  0.5720000000
prosjekt-write-object-to-file         6           3.4320000000  0.5720000000
prosjekt-add-if                       55175       0.8400000000  1.52...e-005
prosjekt-clear                        1           0.733         0.733
prosjekt-insert-file                  3710        0.2660000000  7.16...e-005
file-relative-name                    3752        0.2510000000  6.68...e-005
file-name-directory                   26865       0.1560000000  5.80...e-006
string-match                          67152       0.139         2.06...e-006
file-name-nondirectory                26547       0.093         3.50...e-006
prosjekt-redo-tags-on-file-save-hook  6           0.0           0.0
prosjekt-get-project-item             3711        0.0           0.0
prosjekt-set-project-item             1           0.0           0.0
char-equal                            3240        0.0           0.0
file-equal-p                          6           0.0           0.0
file-name-sans-versions               54          0.0           0.0
file-in-directory-p                   6           0.0           0.0
file-name-as-directory                9674        0.0           0.0
file-writable-p                       24          0.0           0.0
file-readable-p                       12          0.0           0.0
file-attributes                       36          0.0           0.0
file-exists-p                         318         0.0           0.0
file-remote-p                         7564        0.0           0.0
file-truename                         36          0.0           0.0
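
Since file-directory-p shows up in the profile, one thing that might be worth trying is to take the entry type from `directory-files-and-attributes`, which returns it together with each name, instead of testing every entry separately. This is an untested idea rather than a measured improvement, and `my-walk-path-fast` is a hypothetical name. Unlike `file-directory-p`, it does not follow symlinks; they are simply skipped:

```elisp
(defun my-walk-path-fast (dir action)
  "Call ACTION on each regular file under DIR.
Uses the type field returned by `directory-files-and-attributes'
instead of calling `file-directory-p' on every entry."
  ;; Arguments: FULL = t (absolute names), MATCH = nil, NOSORT = t.
  (dolist (entry (directory-files-and-attributes dir t nil t))
    (let ((path (car entry))
          ;; First attribute: t = directory, string = symlink, nil = file.
          (type (cadr entry)))
      (cond
       ((member (file-name-nondirectory path) '("." "..")))
       ((eq type t) (my-walk-path-fast path action))
       ((null type) (funcall action path))))))
```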

abingham commented 12 years ago

Right. I'll up the priority on the issue for ignoring certain directories. Hopefully that will get things in better shape for you.

abingham commented 12 years ago

I was wondering the same thing. If you figure out a way to do it, let me know.

sohailsomani commented 12 years ago

Just FYI, the ability to ignore certain directories made a huge difference. I ignored .svn and all build directories and it was about half the time. Almost bearable!
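
For readers on a newer Emacs, the built-in `directory-files-recursively` can express the same "ignore these directories" behaviour. The sketch below assumes Emacs 27 or later, where the function accepts a PREDICATE argument deciding which subdirectories are descended into. It is not how prosjekt implements its ignore list, and `my-list-project-files` is a hypothetical name:

```elisp
(defun my-list-project-files (dir ignored)
  "Return all files under DIR, not descending into directories named in IGNORED.
Assumes Emacs 27 or later for the PREDICATE argument."
  (directory-files-recursively
   dir
   ""      ; empty regexp: match every file name
   nil     ; do not include directories in the result
   ;; Called with each subdirectory's name; non-nil means "descend".
   (lambda (subdir)
     (not (member (file-name-nondirectory subdir) ignored)))))

;; Example use:
;;   (my-list-project-files default-directory '(".svn" "build"))
```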