Guidance on Using ACER for Type Resolution

code-rex1 commented 7 months ago

I am currently exploring the ACER framework for a project that involves generating call graphs. My work specifically focuses on the challenges of type resolution, and I believe ACER's AST-based approach could resolve types across different modules and libraries.

Despite going through the available documentation, I've found it challenging to understand how to use ACER for type resolution. I would greatly appreciate any examples you could share.

I'm interested in:

Any existing functionality within ACER that facilitates type resolution.
Recommendations for extending ACER to better support type resolution, if necessary.

Thank you for your time and for the incredible work on developing ACER. I look forward to your insights.

code-rex1 commented 7 months ago

@BlastWind @yanyanfu can you please help 🙏

BlastWind commented 7 months ago

@code-rex1 Sorry for the 12-day delay in responding! I've been busy with my thesis, which is due soon.

Addressing your two interests

Any existing functionality within ACER that facilitates type resolution.

Type resolution has been implemented, but no abstraction built into ACER facilitates it. We have to manually write AST parsing logic to get the types.

Two examples where we implemented type resolution logic:

In JavaSCHA (a Simple Class Hierarchy Analysis call graph generator), we resolve the type that an identifier corresponds to by walking upwards in the AST. There are two caveats to this approach:
1. The type that shows up in the AST is often just the shorthand of the real, full type.
2. Sometimes you can't get the type just by walking upwards in the AST. Some languages support hoisting, so you have to walk downwards as well. Harder still, in Java an identifier might be a field inherited from a superclass that is defined in a different file. You have to implement all of this logic yourself.
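To make the upward walk and its two caveats concrete, here is a minimal sketch in Python. It assumes a toy AST: the `Node` class, its `kind` values, and `resolve_upward` are all hypothetical and are not ACER's or tree-sitter's types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical AST node, invented for this sketch."""
    kind: str                      # "class", "method", "block", "decl" or "ident"
    name: str = ""
    type_name: str = ""            # only meaningful on "decl" nodes
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def resolve_upward(ident: Node) -> Optional[str]:
    """Climb the enclosing scopes and return the first declared type found."""
    scope = ident.parent
    while scope is not None:
        # Scan every declaration directly inside this scope, whatever its position.
        for sibling in scope.children:
            if sibling.kind == "decl" and sibling.name == ident.name:
                return sibling.type_name
        scope = scope.parent
    return None  # caveat 2: the declaration lives somewhere the walk cannot reach


# Toy tree for: class Shop { Cart cart; void checkout() { Money total; ... } }
shop = Node("class", "Shop")
shop.add(Node("decl", "cart", "Cart"))
body = shop.add(Node("method", "checkout")).add(Node("block"))
body.add(Node("decl", "total", "Money"))
use_total = body.add(Node("ident", "total"))
use_cart = body.add(Node("ident", "cart"))
use_owner = body.add(Node("ident", "owner"))  # imagine a superclass field in another file

local_type = resolve_upward(use_total)
field_type = resolve_upward(use_cart)
inherited_type = resolve_upward(use_owner)
```

Even when the walk succeeds, it only yields the shorthand (`Cart`, not a fully qualified name), which is the first caveat. The `owner` lookup returns `None`, which is the second.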

The first concern is "resolved" by introducing some ambiguity: we preprocess the directory and build a mapping from each shorthand to potentially many full types. The second concern is simply ignored. This is not as bad as it sounds: in call graphs, the goal is to first get sound edges and then, hopefully, trim the sound graph down to the accurate edges. We wrote type resolvers to help with the second part, accuracy. In the worst case, where we can't get the type, we just connect sound edges using method names as the only indicator.
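The shorthand mapping and the method-name fallback can be sketched roughly as follows. This is an illustration under assumptions, not the JavaSCHA code: `build_shorthand_index`, `candidate_targets`, and the sample type names are all made up for the example.

```python
from collections import defaultdict


def build_shorthand_index(full_types):
    """Map each shorthand (last dotted segment) to every full type that shares it."""
    index = defaultdict(set)
    for full in full_types:
        index[full.rsplit(".", 1)[-1]].add(full)
    return index


def candidate_targets(method_name, receiver_shorthand, index, methods_by_type):
    """Return the (full type, method) pairs a call site should be connected to."""
    if receiver_shorthand is not None and receiver_shorthand in index:
        # Type known up to shorthand ambiguity: keep only the matching full types.
        candidates = index[receiver_shorthand]
    else:
        # Type unknown: stay sound by considering every type in the project.
        candidates = methods_by_type.keys()
    return sorted(
        (t, method_name)
        for t in candidates
        if method_name in methods_by_type.get(t, ())
    )


# Hypothetical project contents.
methods_by_type = {
    "com.shop.Cart": {"add", "total"},
    "com.legacy.Cart": {"add"},
    "com.shop.Order": {"total"},
}
index = build_shorthand_index(methods_by_type)

ambiguous = candidate_targets("add", "Cart", index, methods_by_type)
trimmed = candidate_targets("total", "Cart", index, methods_by_type)
fallback = candidate_targets("total", None, index, methods_by_type)
```

Here `ambiguous` contains both `Cart` classes, `trimmed` drops `com.shop.Order` because the receiver shorthand rules it out, and `fallback` connects every type that declares `total`.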

The old experiments in writing a complete CHA generator actually have way more TypeResolvers, as you can see here (Python included).

Again, we built more TypeResolvers for the sake of accuracy in call graphs. From your message, I'm guessing that you want to expand on the accuracy, and so you will need to write better TypeResolvers.

Recommendations for extending ACER to better support type resolution, if necessary.

First of all, let me reframe the concern: How can we build a better type resolver module, and how can we connect the call graph generation part of ACER to this module?

If I had the opportunity to start from scratch, I would spend some time thinking about what alpha conversion can offer. If we do a preprocessing run to rename all duplicate names (while keeping a mapping between the new names and the originals), a lot of resolution logic can vanish.
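As a rough illustration of the alpha-conversion idea, the sketch below renames declarations in a toy scoped program and keeps the mapping back to the original names. The program representation (nested lists of `("decl", name)` and `("use", name)` tuples) and the `alpha_convert` function are assumptions made for the example. For simplicity it renames every declaration, not only the duplicated ones.

```python
from itertools import count


def alpha_convert(scope, env=None, counter=None, origin=None):
    """Give each declaration a unique name and rewrite uses to match.

    `scope` is a list whose items are ("decl", name), ("use", name),
    or a nested list standing for an inner scope.
    Returns (renamed scope, mapping from fresh name to original name).
    """
    env = dict(env or {})  # copy, so inner declarations do not leak outward
    counter = counter if counter is not None else count(1)
    origin = origin if origin is not None else {}
    out = []
    for item in scope:
        if isinstance(item, list):
            renamed, _ = alpha_convert(item, env, counter, origin)
            out.append(renamed)
            continue
        kind, name = item
        if kind == "decl":
            fresh = f"{name}_{next(counter)}"
            env[name] = fresh
            origin[fresh] = name
            out.append(("decl", fresh))
        else:
            # Undeclared names are left untouched.
            out.append(("use", env.get(name, name)))
    return out, origin


program = [
    ("decl", "x"),
    ("use", "x"),
    [
        ("decl", "x"),  # shadows the outer x
        ("use", "x"),
    ],
    ("use", "x"),
]
renamed, origin = alpha_convert(program)
```

After the pass, the two `x` declarations become `x_1` and `x_2`, so a later lookup of a use no longer needs scope-walking logic: the name alone identifies the declaration, and `origin` recovers the source-level name.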

Additional Comments

Though we have a Python generator (I didn't write it, the lad who wrote it left the project already), it isn't nearly as functional as other tools such as Pyan and PyCG. Handling Python is way harder. The recursive nature of the Python TypeResolver from the old experiments hopefully gives some insights. By recursive, I mean that when we want to resolve a call like (make_builder()).build(), we first have to resolve the type of (make_builder()), so we automatically invoke the corresponding resolver for (make_builder()), which in this case is a PythonCallResolver... The code in the old experiments was more sophisticated, but its goal was a complete AST-based CHA generator. This was simply a lot of work, so we started again from a clean slate with lower aims, which led to ACER.
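The recursion described above can be sketched with Python's standard `ast` module. This is not the PythonCallResolver from the old experiments: the `Resolver` class is hypothetical, and it assumes return types are available as simple name annotations, which real Python code often lacks.

```python
import ast

SOURCE = '''
class Report: ...

class Builder:
    def build(self) -> Report: ...

def make_builder() -> Builder: ...

result = (make_builder()).build()
'''


class Resolver:
    """Hypothetical resolver that reads return annotations from one module."""

    def __init__(self, tree):
        self.functions = {}  # function name -> annotated return type name
        self.methods = {}    # (class name, method name) -> annotated return type name
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                self.functions[node.name] = self._annotation(node)
            elif isinstance(node, ast.ClassDef):
                for item in node.body:
                    if isinstance(item, ast.FunctionDef):
                        self.methods[(node.name, item.name)] = self._annotation(item)

    @staticmethod
    def _annotation(func):
        # Only plain names such as `-> Builder` are understood in this sketch.
        return func.returns.id if isinstance(func.returns, ast.Name) else None

    def resolve(self, expr):
        """Dispatch on the expression kind; only calls are handled here."""
        if isinstance(expr, ast.Call):
            return self.resolve_call(expr)
        return None

    def resolve_call(self, call):
        callee = call.func
        if isinstance(callee, ast.Name):
            # Plain call: f(...)
            return self.functions.get(callee.id)
        if isinstance(callee, ast.Attribute):
            # Method call: <receiver>.m(...); resolve the receiver first (the recursion).
            receiver_type = self.resolve(callee.value)
            return self.methods.get((receiver_type, callee.attr))
        return None


tree = ast.parse(SOURCE)
resolver = Resolver(tree)
call = tree.body[-1].value  # the expression (make_builder()).build()
resolved = resolver.resolve(call)
```

A fuller resolver would add one branch per expression kind (names, attribute reads, constructor calls, subscripts), each of which may call back into `resolve`, which is where most of the work in the Python case comes from.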

WM-SEMERU / ACER

Guidance on Using ACER for Type Resolution #4
