Trying to understand different call-site sensitivities

adrianherrera commented 1 year ago

Hellooo!!

I'm trying to understand the different call-site sensitivities listed here. Specifically, the difference between "caller", "callsite", and "file". In particular, I don't see the "file" sensitivity listed under user-options.dl. What is the difference between the "caller" and "callsite" sensitivities? When I read context_item_by_invoc_interim in context/interfact.dl I can see that the "caller" sensitivity looks at the caller function name. But I don't have an intuition as to what practical difference this makes, and when I should pick one over the other.

Any intuitive explanations for a static analysis n0ob would be greatly appreciated!

langston-barrett commented 1 year ago

@adrianherrera Good question! In short, our internal experiments indicate that callsite sensitivity is usually the best option.

Generally, there's a very unclear performance/precision trade-off. More precise contexts (extra depth, callsite over caller, etc.) can yield pathologically bad performance because they increase the number of times each function gets analyzed (increasing the asymptotic worst-case runtime). However, they also reduce the number of spurious (imprecise) points-to facts that then have to get propagated (copied) throughout the rest of the analysis. Some codebases can only be analyzed at high levels of context-sensitivity! (Consider, for example, a program where every call to malloc goes through a function malloc_wrapper; 2-context sensitivity will do much better than context-insensitivity.) Picking the right performance/precision trade-off is very, very hard in general.
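To make the malloc_wrapper example concrete, here is a toy sketch in Python. It is not how cclyzer++ is implemented; the `Call` record, the site labels, and the `points_to` helper are all made up for illustration. It assumes an abstract heap object is named by its allocation site plus the last k call sites leading to the allocating function, and it only compares k = 0 against k = 1.

```python
from typing import NamedTuple


class Call(NamedTuple):
    caller: str      # function containing the call
    site: str        # hypothetical label for the call instruction
    result_var: str  # variable that receives the returned pointer


# Hypothetical program: main() calls malloc_wrapper() at two call sites.
CALLS = [
    Call("main", "main:call1", "p"),
    Call("main", "main:call2", "q"),
]

# The single call to malloc inside malloc_wrapper (label is an assumption).
ALLOC_SITE = "malloc_wrapper:malloc"


def points_to(calls, k):
    """Map each result variable to the set of abstract objects it may point to.

    An abstract object is (allocation site, context), where the context is
    the last k call sites on the path to the allocating function.
    """
    result = {}
    for call in calls:
        call_string = (call.site,)
        context = call_string[-k:] if k > 0 else ()
        result[call.result_var] = {(ALLOC_SITE, context)}
    return result


def may_alias(pts, a, b):
    """Two variables may alias if their points-to sets overlap."""
    return bool(pts[a] & pts[b])


insensitive = points_to(CALLS, k=0)
sensitive = points_to(CALLS, k=1)

print(may_alias(insensitive, "p", "q"))  # True: one abstract object for every allocation
print(may_alias(sensitive, "p", "q"))    # False: one abstract object per call site
```

With k = 0 every pointer returned by the wrapper looks like the same object, and that spurious aliasing is what then gets propagated through the rest of the analysis.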

This paper has some general reflections, including commentary on some strategies we don't support (object- and type-sensitivity): https://dl.acm.org/doi/abs/10.1145/1926385.1926390

That's just a quick overall sketch, let me know if you have further questions! Feel free to post general questions here in the issues, or in the "Discussions" as well.

thinkmoore commented 1 year ago

As far as the specific difference between caller- and callsite-sensitivity, caller records just the names of calling functions, while callsite also records which instruction within the function was the callsite (thus refining caller-sensitivity). callsite-sensitivity allows the analysis to differentiate between multiple calls to the same callee in a single function, which can help with precision if the callsites operate on distinct memory objects. An example might be a function that uses string operations on two distinct buffers.
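A rough way to picture the difference, as a Python sketch rather than anything taken from the cclyzer++ sources: the `CallInstr` record, the function names, and the buffer names below are all hypothetical. The only thing that changes between the two runs is what gets recorded as the context of a call.

```python
from typing import NamedTuple


class CallInstr(NamedTuple):
    caller: str  # name of the function containing the call
    index: int   # position of the call instruction inside the caller
    callee: str  # function being called
    arg: str     # abstract buffer passed to the callee


# Hypothetical function process() that copies strings into two distinct buffers.
CALLS = [
    CallInstr("process", 3, "copy_string", "buf_a"),
    CallInstr("process", 7, "copy_string", "buf_b"),
]


def caller_context(call):
    """Caller sensitivity: the context is just the calling function's name."""
    return (call.caller,)


def callsite_context(call):
    """Callsite sensitivity: the context also records which instruction called."""
    return (call.caller, call.index)


def analyze(calls, context_of):
    """Collect the buffers each (callee, context) pair may operate on."""
    seen = {}
    for call in calls:
        key = (call.callee, context_of(call))
        seen.setdefault(key, set()).add(call.arg)
    return seen


by_caller = analyze(CALLS, caller_context)
by_callsite = analyze(CALLS, callsite_context)

print(len(by_caller))    # 1: both calls share a context, so buf_a and buf_b are merged
print(len(by_callsite))  # 2: each call gets its own context and its own buffer
```

Under caller sensitivity the callee is analyzed once for `process` and sees both buffers; under callsite sensitivity it is analyzed twice, once per buffer, which is more precise but costs an extra analysis of the callee.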

adrianherrera commented 1 year ago

Thanks so much @thinkmoore that totally makes sense to me! And thanks @langston-barrett for the, ahem, pointers. I'll have a read of that paper.

Am I correct in assuming that the file-* sensitivities listed here are not implemented?

thinkmoore commented 1 year ago

It looks like they may have been removed some time ago during some major refactoring. (They use only what file the caller is defined in, and so are less precise than caller sensitivity.) @langston-barrett, IIRC it wasn't a great trade-off?

langston-barrett commented 1 year ago

> It looks like they may have been removed some time ago during some major refactoring. (They use only what file the caller is defined in, and so are less precise than caller sensitive.) @langston-barrett, IIRC it wasn't a great trade off?

Yeah, I believe all of this is correct.

adrianherrera commented 1 year ago

Ok cool. Sorry, one final question: does it make sense to make k user-definable? Is there any particular reason it caps out at 9?

langston-barrett commented 1 year ago

If there is a reason, it's probably just that it seemed harder to make it completely configurable. That being said, cclyzer++ would fail to terminate on nearly every program at k > 9.

adrianherrera commented 1 year ago

Makes sense. Ok, I think that's all my questions on this. Thanks folks!

GaloisInc / cclyzerpp

Trying to understand different call-site sensitivities #151