Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

ImmutableMap is slow #5686

Open Quuxplusone opened 15 years ago

Quuxplusone commented 15 years ago
Bugzilla Link PR5174
Status CONFIRMED
Importance P normal
Reported by Owen Anderson (resistor@mac.com)
Reported on 2009-10-13 01:07:29 -0700
Last modified on 2010-02-22 12:47:48 -0800
Version trunk
Hardware PC All
CC clattner@nondot.org, kremenek@apple.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments imm-gvn.diff (9380 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 3647
ImmutableMapGVN.diff

ImmutableMap-based GVN is significantly slower than DenseMap-based GVN, even
though the lookup behavior is much better.  Shark shows GVN spending 34% of its
time in GetCanonicalTree(), called from Factory::Add() in InsertDomMap().  Is
there any way we could make this faster?
Quuxplusone commented 15 years ago

Attached imm-gvn.diff (9380 bytes, text/plain): ImmutableMapGVN.diff

Quuxplusone commented 15 years ago
I suspect that this is probably a case of using the wrong data structure for a
given problem.

If all you are doing is using ImmutableMap just like you would use DenseMap, it
is never going to come as close to the performance as DenseMap.  It uses a
balanced-AVL tree, and allocates new tree nodes whenever you do an insert and
removal in order to really leave the original map untouched.  Algorithmically,
DenseMap is just going to leave it in the dust if all you care about are doing
lookups and insert/deletions to a single map.  Where ImmutableMap is useful for
cases where you need (many) multiple copies (or versions) of the same map
around at the same time.  DenseMap would not be a good data structure for such
scenarios.

ImmutableMap also has the invariant that no matter what *order* you
insert/remove values from the map,  you will still get the same physical map.
The ImmutableMap object is just a smart pointer to an AVL tree node (the root),
and operator== is implemented using a simple pointer comparison of the two root
nodes.  This is really value for the static analyzer, as it frequently does
comparisons between maps.  This comes at a cost, however, as such
canonicalization of the maps takes work (via a FoldingSet).

Is there a reason you need to use ImmutableMap for GVN?
Quuxplusone commented 15 years ago

That said, ImmutableMap could potentially be performance tuned further (the static analyzer would certainly benefit from this), but it by design it isn't as fast as DenseMap because it provides a wholly different set of functionality. I did a fair amount of performance work on ImmutableMap recently, and the speedups to the static analyzer were really noticeable.

Quuxplusone commented 15 years ago

GVN currently does a dominator tree walk. At each level of the walk, it has a DenseMap which it then copies for the new node, inserts some stuff, then copies for the next dom tree node, inserts new stuff etc.

GVN needs to keep a different map for each level of the dom tree, but they are all variants of each other. It seemed natural to use immutablemap for this...

Quuxplusone commented 15 years ago
(In reply to comment #3)
> GVN currently does a dominator tree walk.  At each level of the walk, it has a
> DenseMap which it then copies for the new node, inserts some stuff, then
copies
> for the next dom tree node, inserts new stuff etc.
>
> GVN needs to keep a different map for each level of the dom tree, but they are
> all variants of each other.  It seemed natural to use immutablemap for this...
>

Okay, makes sense.  I've re-opened the bug.

From Owen's measurements, it looks GetCanonicalTree() is the real performance
hog.  This feature is needed by the static analyzer, but probably not by GVN.
Providing a bit in the ImmutableMap factory object that turns off
canonicalization will likely result in a huge win.
Quuxplusone commented 15 years ago
(In reply to comment #0)
> Created an attachment (id=3647) [details]
> ImmutableMapGVN.diff
>
> ImmutableMap-based GVN is significantly slower than DenseMap-based GVN, even
> though the lookup behavior is much better.  Shark shows GVN spending 34% of
its
> time in GetCanonicalTree(), called from Factory::Add() in InsertDomMap().  Is
> there any way we could make this faster?
>

Owen:

I just committed r84010, which allows one to turn off the use of
GetCanonicalTree() if one passes "false" to the constructor of
ImmutableMap::Factory:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20091012/088828.html

Does this speed things up for you?
Quuxplusone commented 15 years ago

That does help a lot, but I (after fixing a few bugs in my own patch), it's still not as fast as the original.

What I'm seeing now is a lot of time being spent balancing the tree. At a guess, this is because GVN assigns value numbers sequentially, so it will always be inserting in the order 1, 2, 3, 4, ... which probably results in worst-case unbalanced trees, requiring lots of balancing. Any suggestions?

Quuxplusone commented 15 years ago
(In reply to comment #6)
> That does help a lot, but I (after fixing a few bugs in my own patch), it's
> still not as fast as the original.
>
> What I'm seeing now is a lot of time being spent balancing the tree.  At a
> guess, this is because GVN assigns value numbers sequentially, so it will
> always be inserting in the order 1, 2, 3, 4, ...  which probably results in
> worst-case unbalanced trees, requiring lots of balancing.  Any suggestions?
>

Is there any way to experiment with non-sequential numbers to determine if it
is really a pathological case with the balancing logic?  The ImmutableMap
creates new nodes even when there isn't much balancing required, so I'm
wondering if that logic just needs to be made faster (somehow) or if it is
related to this specific workload.

Incidentally, is there a way for me to look at the performance characteristics
myself? (a test case).  I admit I don't have much experience with analyzing
LLVM passes.
Quuxplusone commented 15 years ago

(BTW, sorry for not replying sooner. I didn't see the mail notification for this PR for a couple days...)