Open abingham opened 12 years ago
The root of this problem seems to be that we do a linear lookup in the project-file list for each match we find. That is, for each file that matches the population pattern, we scan the list to make sure it's not already there. We need to use a sorted list for this so that we can get the lookup time down to logarithmic.
This may not be the entire problem, but I saw dramatic improvements in speed on a test project when I jimmied together a fix.
@kingcheez I'm just tagging you so that you'll see any progress on this front.
Hey, you might want to look at using a hash table (and thanks for the tag).
http://www.gnu.org/software/emacs/manual/html_node/elisp/Hash-Tables.html#Hash-Tables
Right, that looks like what I need. Hopefully I'll be able to put something faster in place in a few days. Thanks for the tip.
This should be fixed now. I updated the project format to store files in a hash-table rather than a list. On a synthetic project of 10k files it was much faster. I've tested this a fair amount, and the changes seem good. There's even code to upgrade existing, old-style projects on load.
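To illustrate the difference between the two lookups, here is a minimal sketch. It is not prosjekt's actual code: the `my-files-*` names are hypothetical, and only the built-in `member`, `make-hash-table`, `gethash`, `puthash`, and `hash-table-count` are real.

```elisp
;; Hypothetical helpers for illustration; not taken from prosjekt.

(defun my-files-add-to-list (file files)
  "Return FILES with FILE added unless it is already present.
`member' scans the whole list, so each check is linear in its length."
  (if (member file files)
      files
    (cons file files)))

(defun my-files-add-to-table (file table)
  "Record FILE in hash TABLE unless it is already present.
A hash lookup takes roughly constant time on average.
Return non-nil if FILE was newly added."
  (unless (gethash file table)
    (puthash file t table)))

;; The `equal' test compares file names by content rather than identity.
(let ((table (make-hash-table :test 'equal)))
  (dolist (f '("a.c" "b.c" "a.c"))
    (my-files-add-to-table f table))
  (hash-table-count table))   ; => 2, the duplicate "a.c" is stored once
```

With a list, checking n matches against n stored files costs on the order of n squared comparisons in total; with the table it stays close to linear.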
I just tried it and it doesn't seem any faster to me on the 2K file project. I can see that you are using a hashtable when I run prosjekt-setup so I'm not sure why. Is there a way to profile elisp code?
Here is the output of elp (http://www.python.org/emacs/elp.el):
Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-walk-path                    10770       66.941999999  0.0062155988
prosjekt-repopulate                   1           18.656        18.656
prosjekt-populate                     5           18.064        3.6128
prosjekt-save                         6           3.392         0.5653333333
prosjekt-write-object-to-file         6           3.392         0.5653333333
prosjekt-clear                        1           0.592         0.592
prosjekt-add-if                       58240       0.2690000000  4.61...e-006
prosjekt-insert-file                  1856        0.0340000000  1.83...e-005
prosjekt-redo-tags-on-file-save-hook  6           0.006         0.001
prosjekt-get-project-item             1857        0.003         1.61...e-006
prosjekt-set-project-item             1           0.0           0.0
So the thing that jumps out to me is why prosjekt-save is called so many times when it should really only be called once. That looks as if it would shave off a lot of time from the run. Maybe you can have a dynamic variable that you can set when doing prosjekt-repopulate:
(let ((*dont-save-project* t))  ; suppress saves while repopulating
  (prosjekt-repopulate...))
(save-project)                  ; then save once at the end
The functions are imaginary, obviously.
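The flag idea can be fleshed out a little further. This is only a sketch under stated assumptions: every `my-` name below is hypothetical and stands in for whatever prosjekt actually does, and the "save" just bumps a counter so the effect is visible.

```elisp
;; All names are hypothetical; they only illustrate the technique.

(defvar my-inhibit-save nil
  "When non-nil, `my-save-project' does nothing.")

(defvar my-save-count 0
  "Number of real saves performed (for demonstration only).")

(defun my-save-project ()
  "Stand-in for an expensive save; skipped while `my-inhibit-save' is non-nil."
  (unless my-inhibit-save
    (setq my-save-count (1+ my-save-count))))

(defun my-populate ()
  "Stand-in for a populate step that saves when it finishes."
  (my-save-project))

(defun my-repopulate ()
  "Run several populate steps, then save exactly once."
  ;; `defvar' makes the variable special, so this `let' binds it
  ;; dynamically and the functions called inside can see it.
  (let ((my-inhibit-save t))
    (dotimes (_ 5)
      (my-populate)))
  (my-save-project))
```

Starting from a count of 0, calling `(my-repopulate)` leaves `my-save-count` at 1 rather than 6, because the five intermediate saves are skipped.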
You can get these results by doing load-library elp, M-x elp-set-master prosjekt-repopulate, M-x elp-results, M-x elp-unset-master
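For anyone who wants to try elp without a large project at hand, the same kind of table can be produced for a toy function. A minimal sketch, assuming the elp library bundled with Emacs; `my-demo-work` is a hypothetical stand-in for the code being profiled.

```elisp
(require 'elp)

(defun my-demo-work ()
  "Hypothetical stand-in for the function being profiled."
  (let ((sum 0))
    (dotimes (i 1000)
      (setq sum (+ sum i)))
    sum))

;; Instrument a single function.  To cover every function that shares
;; a name prefix, use (elp-instrument-package "prosjekt") instead.
(elp-instrument-function 'my-demo-work)

(dotimes (_ 10)
  (my-demo-work))

(elp-results)       ; display call counts, elapsed time and average time
(elp-restore-all)   ; remove the instrumentation again
```

The results buffer lists one row per instrumented function, in the same four-column layout as the output posted in this thread.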
You're right that prosjekt-save is called more than needed, I think we can address that fairly straightforwardly. But is it really the issue here? If I'm reading the table correctly, we're spending 67 sec. in prosjekt-walk-path alone, i.e. not counting functions it calls. We're only spending 3.4 sec. in prosjekt-save, so we'd still have pretty awful performance even if prosjekt-save took no time at all.
I'm seeing results similar to yours when I use elp. At least, I'm seeing prosjekt-walk-path dominate the reported run time. Strangely, the data from elp doesn't seem to match reality. The results I see say that 15 sec. is spent in prosjekt-walk-path, yet the entire repopulation only takes on the order of 5 sec. Still, even if I assume that the reported times are correct in a relative sense, I think I need to focus on prosjekt-walk-path.
A bit more data. If I modify prosjekt-populate to not save, then I get the number of prosjekt-save calls down to 1. This doesn't noticeably change the repopulation time for me (or the profile information.)
@kingcheez can you try something? Change the prosjekt-populate definition from "defun-autosave" to just "defun" and run a repopulation. Does that significantly change anything for you?
Nope, no difference.
As for prosjekt-walk-path, yeah, my hash table has 2463 files so I don't know why it is being called so many times. That might be the culprit!
Edit: It's probably called so often because of svn files. So if we aren't ignoring svn files until we get to the filter, that might explain why it would be called 5,000 times but not 10,000 times.
Aha, I was wrong about the size of the project:
$ find . -type f | wc -l
9495
So 10,000 is correct because about half of those will be svn directories (some build directories too).
So svn is the problem :-( Any way to prevent descending into svn directories?
I'm going to have to do some tinkering on this. There's nothing in the prosjekt-walk-path code that is obvious for improvement, so I'm open to ideas. That code was actually lifted almost verbatim from the emacs wiki, so I assume it's not wildly incorrect. There may be C-based traversal implementations we can use, but I'd rather keep things in elisp if possible.
As for skipping svn directories, yes, that's already on the radar. See issue #11 (which will include svn when it's done.)
Any chance you could tell me how to hack it in so I could see if it helps? I add files regularly to the project and having prosjekt-repopulate run fast would be SOOO nice :)
This is untested, but I think it'll work. On line 509 try changing:
(cond ((member file '("." "..")))
to:
(cond ((member file '("." ".." ".svn")))
The original code is there to avoid recursing on the current and parent directories. This change just adds .svn to the list of "directories to skip."
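For readers without the source at hand, the skip-list idea looks roughly like the following. This is a sketch and not the actual prosjekt-walk-path: `my-walk-path` and `my-walk-skip-names` are hypothetical names, while `directory-files`, `file-directory-p`, and `expand-file-name` are the standard Emacs functions.

```elisp
(defvar my-walk-skip-names '("." ".." ".svn")
  "Hypothetical list of entry names that the walk never visits.")

(defun my-walk-path (dir action)
  "Call ACTION on every non-directory entry under DIR.
Entries named in `my-walk-skip-names' are skipped entirely, so
the walk never descends into them."
  (dolist (name (directory-files dir))
    (unless (member name my-walk-skip-names)
      (let ((path (expand-file-name name dir)))
        (if (file-directory-p path)
            (my-walk-path path action)   ; recurse into ordinary directories
          (funcall action path))))))
```

Pruning at the directory level matters because everything beneath a skipped directory is never listed at all, instead of being listed and then rejected by a later filter. Note that this sketch does not guard against symlink loops.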
My results after doing the above:
Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-repopulate                   1           8.949         8.949
prosjekt-save                         6           3.239         0.5398333333
prosjekt-write-object-to-file         6           3.239         0.5398333333
prosjekt-clear                        1           0.536         0.536
prosjekt-add-if                       28660       0.1960000000  6.83...e-006
prosjekt-insert-file                  1856        0.0600000000  3.23...e-005
prosjekt-redo-tags-on-file-save-hook  6           0.006         0.001
prosjekt-get-project-item             1857        0.002         1.07...e-006
prosjekt-set-project-item             1           0.0           0.0
Edit: Updated results with correct output. Just noticed that walk-path isn't even there anymore. I think I broke it!
It isn't THAT much faster, btw.
Well, I'm stumped. I've tried a number of ways to speed things up, but I can't make prosjekt-walk-path any faster. It's frustrating because system commands like "find" can do the traversal nearly instantly. I guess it's either a limitation of the Emacs Lisp file API, or I'm just missing something.
I'll keep this in the back of my head, but for now I don't have any great ideas. We could explore things like running system commands and parsing the results, but that quickly leads to platform issues. We might be able to make this appear faster by running it asynchronously, but that's a can of worms I'd sooner avoid.
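For reference, the "run a system command and parse the results" route could look something like this. It is a sketch under loud assumptions: `my-find-files` is a hypothetical name, and it assumes a Unix-style `find` on the PATH, which is exactly the platform dependency mentioned above (on Windows, `find` is a different program). `process-lines` is the standard Emacs function that runs a program and returns its output as a list of lines.

```elisp
(defun my-find-files (dir)
  "Return a list of regular files under DIR using the external find.
Directories named .svn are pruned, so find never descends into them.
Assumes a Unix-style find; signals an error if the command fails."
  (process-lines "find" (expand-file-name dir)
                 "-name" ".svn" "-prune"
                 "-o" "-type" "f" "-print"))
```

The call blocks until find finishes, so it is still synchronous; it only moves the traversal out of elisp.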
Any ideas on how to speed up this function would be greatly appreciated!
No ideas besides forking an Emacs process :(
Update: I used elp-instrument-package and got the following results. So the times are definitely better when adding .svn to the directories to ignore. Still takes a long time though.
Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-walk-path                    2170        19.567999999  0.0090175115
prosjekt-repopulate                   1           8.986         8.986
prosjekt-populate                     5           8.346         1.6692
prosjekt-save                         6           3.338         0.5563333333
prosjekt-write-object-to-file         6           3.338         0.5563333333
prosjekt-clear                        1           0.64          0.64
prosjekt-add-if                       28670       0.2490000000  8.68...e-006
prosjekt-insert-file                  1857        0.095         5.11...e-005
prosjekt-get-project-item             1858        0.047         2.52...e-005
prosjekt-redo-tags-on-file-save-hook  6           0.0           0.0
prosjekt-set-project-item             1           0.0           0.0
And more output. This time you can see that file-directory-p is a significant portion of the time. I wonder if there is a way to instrument EVERYTHING. Edit: I did this and Emacs crashed, haha.
Function Name                         Call Count  Elapsed Time  Average Time
prosjekt-walk-path                    28675       28.641999999  0.0009988491
prosjekt-repopulate                   1           10.405        10.405
prosjekt-populate                     5           9.672         1.9344000000
file-directory-p                      28699       4.2980000000  0.0001497613
prosjekt-save                         6           3.4320000000  0.5720000000
prosjekt-write-object-to-file         6           3.4320000000  0.5720000000
prosjekt-add-if                       55175       0.8400000000  1.52...e-005
prosjekt-clear                        1           0.733         0.733
prosjekt-insert-file                  3710        0.2660000000  7.16...e-005
file-relative-name                    3752        0.2510000000  6.68...e-005
file-name-directory                   26865       0.1560000000  5.80...e-006
string-match                          67152       0.139         2.06...e-006
file-name-nondirectory                26547       0.093         3.50...e-006
prosjekt-redo-tags-on-file-save-hook  6           0.0           0.0
prosjekt-get-project-item             3711        0.0           0.0
prosjekt-set-project-item             1           0.0           0.0
char-equal                            3240        0.0           0.0
file-equal-p                          6           0.0           0.0
file-name-sans-versions               54          0.0           0.0
file-in-directory-p                   6           0.0           0.0
file-name-as-directory                9674        0.0           0.0
file-writable-p                       24          0.0           0.0
file-readable-p                       12          0.0           0.0
file-attributes                       36          0.0           0.0
file-exists-p                         318         0.0           0.0
file-remote-p                         7564        0.0           0.0
file-truename                         36          0.0           0.0
Right. I'll up the priority on the issue for ignoring certain directories. Hopefully that will get things in better shape for you.
I was wondering the same thing. If you figure out a way to do it, let me know.
Just FYI, the ability to ignore certain directories made a huge difference. I ignored .svn and all build directories and it was about half the time. Almost bearable!
The populate routines are apparently slow, too. See about speeding them up.