Open code-rex1 opened 7 months ago
@BlastWind @yanyanfu can you please help 🙏
@code-rex1 Sorry for the 12 day delay in responding! I've been busy with my thesis which is due soon.
Any existing functionality within ACER that facilitates type resolution.
Type resolution has been implemented, but no abstraction built-in to ACER facilitates it. We have to manually write AST parsing logic to get the types.
Two examples where we implemented type resolution logic:
The first concern is "resolved" by introducing some ambiguity. We preprocess the directory and create a mapping between shorthands to potentially many full types (ambiguity). The second concern is simply ignored. This is not as bad as it sounds, in call graphs, the goal is to get sound edges, and then hopefully trim the sound graph to get the accurate edges. We wrote type resolvers to help with the second part, accuracy. But in the worst case where we can't get the type, we just connect sound edges using method names as the only indicator.
Again, we built more TypeResolvers for the sake of accuracy in call graphs. From your message, I'm guessing that you want to expand on the accuracy, and so you will need to write better TypeResolvers.
Recommendations for extending ACER to better support type resolution, if necessary.
First of all, let me reframe the concern: How can we build a better type resolver module, and how can we connect the call graph generation part of ACER to this module?
If I have the opportunity to start from scratch, I would spend some time thinking about what alpha conversion can offer. If we do a preprocessing run to rename all duplicate names (but of course, keep a mapping between the new ones and the original), a lot of resolution logic can vanish.
Though we have a Python generator (I didn't write it, the lad who wrote it left the project already), it isn't nearly as functional as the other ones like Pyan and PyCG. Handling Python is way harder. The recursive nature of the Python TypeResolver from the old experiments hopefully gives some insights. By recursive, I mean that when we want to resolve a call like (make_builder()).build()
, we have to resolve the type of (make_builder())
, so we automatically issue the corresponding resolver for (make_build())
, which in this case is a PythonCallResolver
... The code in the old experiments were more sophisticated, but its goal was a complete AST-based CHA generator. This was simply a lot of work, so we started from a clean slate again (which lead to ACER) that had lower aims.
I am currently exploring the ACER framework for a project that involves generating call graphs. My work specifically focuses on the challenges of type resolution, and I believe ACER's AST-based approach could resolve types across different modules and libraries.
Despite going through the available documentation, I've found it somewhat challenging to understand how to leverage ACER in the context of type resolution. I would greatly appreciate any examples if you could share, please.
I'm interested in:
Thank you for your time and for the incredible work on developing ACER. I look forward to your insights.