RomanYankovsky / DelphiAST

Abstract syntax tree builder for Delphi
Mozilla Public License 2.0
271 stars 116 forks source link

Improve performance #75

Open JBontes opened 9 years ago

JBontes commented 9 years ago

I've ran AST through the profiler.
Most time is spend setting attributes. The problem is that a dictionary is used, even though there are only a couple of different attributes.
The dictonary does a hash for every lookup and storage. This eats around 40% of the time spend.
An alternative might be to:

It looks I can get away with just 16 different attributes.
Not much less because methods can have many attributes.

After that is done, further candidates for optimization might slide into view.

Another issue that takes a lot of time is the creation and destruction of TSyntaxNode.
Part of this is taken up by the initialization time of the TDirectionary, however it is still substantial, on option might be to make the TSyntaxNode a record or preallocate them in a pool and not destroy them, but return them to the pool when done.

JBontes commented 9 years ago

see pull request #80

vintagedave commented 9 years ago

Many of the strings could be replaced completely by enums, especially if there were specialised node types for particular kinds of element. Eg, the method type of "procedure", "function" etc could be an enum property for a TMethodSyntaxNode, and the same for many other attributes.

JBontes commented 9 years ago

@vintagedave, see the pull request #80 Here I've done just that. However I agree that make several subclasses of TSyntaxNode would make sense.
Do you have a hierarchy in mind? /-------> TSyntaxClass { = class(...)} TAbstractSyntaxClass-> TSyntaxObject { = object(...) } **> TSyntaxInterface
_**> TSyntaxRecord TSyntaxMethod

spring to mind, but having a more or less complete list would be good.

RomanYankovsky commented 9 years ago

Actually, I was going to make the abstract tree typed. Then we will be able to get rid of attributes dictionary. For instance, it would be TFunctionNode class with Line, Col and Name properties.

JBontes commented 9 years ago

Yes, that would simplify things a lot, but we still need attributes for forward, stdcall external etc etc

JBontes commented 9 years ago

Or do you intend to make all possible attrributes a property of the various class types.

RomanYankovsky commented 9 years ago

A method node will be TMethodNode with forward/stdcall/external etc boolean properties. That's the point of making the tree typed. We can have any properties for each node type.

I used attributes in the beggining as a "poor man's" properties. One day we have to refactor that.

JBontes commented 9 years ago

OK, I can work with that. Getting rid of the dictionary did speed things up a lot. Let me see if I can refactor. So the TAbtractSyntaxNode will have a property: NodeKind: TNodeKind; TNodeKind = (nkClass, nkObject, nkProcedure etc

RomanYankovsky commented 9 years ago

The syntax node already has the NodeType property.

I think it should be looking like this...

TSyntaxNode = class
// current syntax node class
end

TMethodNode = class(TSyntaxNode)
public
  property Name: string read FName write FName;
  property Directives: TDirectives read FDirectives write FDirectives;
end;

TClassNode = class(TSyntaxNode)
public
  property Abstract: Boolean read FAbstract write FAbstract;
  property Name: string read FName write FName;  
end;

and so on. For each node type. This requires a lot of work though.

JBontes commented 9 years ago

Yes, but with a clever hierarchy it would be doable. I remember that with the Delphi 3 documentation we had a syntaxtree in the printed documentation.
Listing all the different ways in which to construct classes, methods etc.
Is there a resource on the internet, because I don't have access to these books (they're in Holland and I live in South Africa now).

JBontes commented 9 years ago

Got it: http://portal.aauj.edu/portal_resources/downloads/programming/delphi_object_pascal_language_guide.pdf

vintagedave commented 9 years ago

Actually, I was going to make the abstract tree typed

Yes, I should have mentioned it was Roman's idea to introduce types (although I'd been thinking about it as well.)

A method node will be TMethodNode with forward/stdcall/external etc boolean properties.

Could some of those be sets and enums? MethodQualifiers : set of (mqExternal, mqForward, ...); CallingConvention: (ccRegister, ccCdecl, ccStdcall...) etc?

And - one even bigger idea. Could we make them interfaced types? ISyntaxNode, IMethodNode, etc? Then both (a) the implementation can be separate, which could be good for all sorts of reasons, and (b) you get automatic memory management via interface references. (That could be a problem if nodes refer to nodes higher up in the tree though.)

RomanYankovsky commented 9 years ago

Could some of those be sets and enums? MethodQualifiers : set of (mqExternal, mqForward, ...); CallingConvention: (ccRegister, ccCdecl, ccStdcall...) etc?

Of course, this would be even better.

Could we make them interfaced types?

Not sure about that... Reference counting may cause performance issues. DelphiAST creates thousands of classes to produce the syntax tree (more than 100,000 of TSyntaxNode instances, when input file is big enough). How many times the reference counter would be called?

vintagedave commented 9 years ago

Probably several times 100,000.

Strict use of const in method params prevents the refcount code calling the method, but afaik it still happens every time you do something like, say, MyRef = List[0] - it will increment the refcount assigning to MyRef, even though the item is still present in the list.

I'm not sure what the actual impact would be. I've only ever had one project where the refcounting overhead was significant, and that was for fast realtime graphics updates looping over a lot of interfaced objects.

JBontes commented 9 years ago

We can still use interfaces.
We don't have to care about refcounts.

We can also use a custom variant of TInterfacedObject where the refcounter is a no-op. So I think this problem is solvable.
Just use explicit destruction.

Let me ask the stackoverflow oracle to see if there is a way to use interfaces without try-finally's all over the place.


From: David Millington [mailto:notifications@github.com] To: RomanYankovsky/DelphiAST [mailto:DelphiAST@noreply.github.com] Cc: JBontes [mailto:johan@digitsolutions.nl] Sent: Sun, 26 Apr 2015 20:07:47 +0100 Subject: Re: [DelphiAST] Improve performance (#75)

Probably several times 100,000.

Strict use of const in method params prevents the refcount code calling the method, but afaik it still happens every time you do something like, say, MyRef = List[0] - it will increment the refcount assigning to MyRef, even though the item is still present in the list.

I'm not sure what the actual impact would be. I've only ever had one project where the refcounting overhead was significant, and that was for fast realtime graphics updates looping over a lot of interfaced objects.

— Reply to this email directly or view it on GitHub.

vincentparrett commented 9 years ago

There is no point using non reference counted interfaces here, the compiler will still generate the addref release calls, and from experience its easy to end up with difficult to find av's when you hold references to destroyed objects.. and then release is called on it later, more trouble than it's worth.

If performance is critical, records might be another option, much less overhead, and the operator overloading available could open up some interesting uses. You do though have to manage the memory yourself.

JBontes commented 6 years ago

With the dictionary removed, I think we can close this issue now.