Closed adrianherrera closed 1 year ago
@adrianherrera Good question! In short, our internal experiments indicate that callsite sensitivity is usually the best option.
Generally, there's a very unclear performance/precision trade-off. More precise contexts (extra depth, callsite over caller, etc.) can yield pathologically bad performance because they increase the number of times each function gets analyzed (increasing the asymptotic worst-case runtime). However, they also reduce the number of spurious (imprecise) points-to facts that then have to get propagated (copied) throughout the rest of the analysis. Some codebases can only be analyzed at high levels of context-sensitivity! (Consider, for example, a program where every call to malloc goes through a function malloc_wrapper: 2-context sensitivity will do much better than context-insensitivity.) Picking the right performance/precision trade-off is very, very hard in general.
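To make the malloc_wrapper point concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how cclyzer++ represents contexts: a heap object is named by its allocation site plus the last k call sites on the call string, and the call-site labels (main:call1, malloc_wrapper:call0, etc.) are made up for illustration.

```python
def abstract_object(alloc_site, call_string, k):
    """Name a heap object by its allocation site plus the last k call sites."""
    context = tuple(call_string[-k:]) if k > 0 else ()
    return (alloc_site, context)


# Hypothetical program: main calls malloc_wrapper at two different call sites,
# and malloc_wrapper calls malloc at a single call site.
call_string_a = ["main:call1", "malloc_wrapper:call0"]
call_string_b = ["main:call2", "malloc_wrapper:call0"]


def distinct_objects(k):
    """Count how many abstract heap objects the two allocations map to."""
    objects = {
        abstract_object("malloc", call_string, k)
        for call_string in (call_string_a, call_string_b)
    }
    return len(objects)


if __name__ == "__main__":
    for k in range(3):
        print(f"k={k}: {distinct_objects(k)} abstract object(s)")
```

In this model, k = 0 and k = 1 both merge the two allocations, because the most recent call site is always the one inside malloc_wrapper. Only at k = 2 does the context reach back to main and keep the two buffers apart.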
This paper has some general reflections, including commentary on some strategies we don't support (object- and type-sensitivity): https://dl.acm.org/doi/abs/10.1145/1926385.1926390
That's just a quick overall sketch, let me know if you have further questions! Feel free to post general questions here in the issues, or in the "Discussions" as well.
As far as the specific difference between caller- and callsite-sensitivity, caller records just the names of calling functions, while callsite also records which instruction within the function was the callsite (thus refining caller-sensitivity). callsite-sensitivity allows the analysis to differentiate between multiple calls to the same callee in a single function, which can help with precision if the callsites operate on distinct memory objects. An example might be a function that uses string operations on two distinct buffers.
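A small Python sketch of the two-buffers example may help with intuition. Everything here is hypothetical (the function names process and copy_into, the instruction labels, and the shape of the facts); it only mimics the idea that a caller context item is the calling function's name, while a callsite context item also includes the calling instruction.

```python
from collections import defaultdict

# Hypothetical calls: (caller function, instruction label, callee, buffer argument).
# One function calls the same string helper twice, on two distinct buffers.
calls = [
    ("process", "inst3", "copy_into", "buf_a"),
    ("process", "inst7", "copy_into", "buf_b"),
]


def context_item(caller, instruction, sensitivity):
    """Build the context item recorded for a call under each sensitivity."""
    if sensitivity == "caller":
        return (caller,)
    if sensitivity == "callsite":
        return (caller, instruction)
    raise ValueError(f"unknown sensitivity: {sensitivity}")


def param_points_to(sensitivity):
    """Map (callee, context) to the set of buffers the callee's parameter may point to."""
    facts = defaultdict(set)
    for caller, instruction, callee, buffer in calls:
        key = (callee, context_item(caller, instruction, sensitivity))
        facts[key].add(buffer)
    return dict(facts)


if __name__ == "__main__":
    for sensitivity in ("caller", "callsite"):
        print(sensitivity, param_points_to(sensitivity))
```

Under caller, both calls share one context, so the parameter of copy_into appears to point to both buffers. Under callsite, each call gets its own context and each one sees a single buffer.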
Thanks so much @thinkmoore that totally makes sense to me! And thanks @langston-barrett for the, ahem, pointers. I'll have a read of that paper.
Am I correct in assuming that the file-* sensitivities listed here are not implemented?
It looks like they may have been removed some time ago during some major refactoring. (They use only what file the caller is defined in, and so are less precise than caller-sensitivity.) @langston-barrett, IIRC it wasn't a great trade-off?
Yeah, I believe all of this is correct.
Ok cool. Sorry, one final question: does it make sense to make k user-definable? Is there any particular reason it caps out at 9?
If there is a reason, it's probably just that it seemed harder to make it completely configurable. That being said, cclyzer++ won't terminate on almost any program at k > 9.
Makes sense. Ok, I think that's all my questions on this. Thanks folks!
Hellooo!!
I'm trying to understand the different call-site sensitivities listed here. Specifically, the difference between "caller", "callsite", and "file". In particular, I don't see the "file" sensitivity listed under user-options.dl. What is the difference between the "caller" and "callsite" sensitivities? When I read context_item_by_invoc_interim in context/interfact.dl I can see that the "caller" sensitivity looks at the caller function name. But I don't have an intuition as to what practical difference this makes, and when I should pick one over the other. Any intuitive explanations for a static analysis n0ob would be greatly appreciated!