github-vet / bots

Bots for running analysis on GitHub's public Go repositories and crowdsourcing their classification.
MIT License

callgraph simplification: only consider function declarations which use pointers #102

Closed: kalexmills closed this 3 years ago

kalexmills commented 3 years ago

There is no reason to consider a callgraph edge to a function whose declaration contains no pointers.

This could massively reduce both the number of false positives and the size of the callgraph reported by the tool.

However, once we remove these edges, every other function in the current package (those that use no pointers) will be absent from the callgraph, just like third-party functions. After making this change, it isn't immediately clear that we will be able to distinguish between functions defined in the repository that do not use pointers and functions defined outside the repository.

But I think we can, if we take into account the information present at the callsite and assume that the repository source type-checks (even though we don't actually type-check it ourselves).

Here's a short "proof". Maybe it's only useful for helping me wrap my head around the problem, and maybe I've overcomplicated the claim that needs to be shown. I'm posting it here in case someone can explain how it's wrong. Hopefully it isn't too confusing; if it is, feel free to ask questions. I don't bite.

**Proof sketch**

If we are passing a pointer to a function, there are only three options:

1. The function is declared in the current repo, and either:
   - (a) it has a pointer argument (and is part of the callgraph), or
   - (b) it does not have any pointer arguments (and is therefore omitted from the callgraph).
2. The function is third-party and defined elsewhere.

Suppose we construct the callgraph as described above (only including functions that take a pointer as an argument). We need to show that we have enough information at the callsite to distinguish between cases 1a, 1b, and 2.

- Suppose we see a callsite whose function falls into **case 1a**. Since this function is part of the callgraph, our program has enough information to classify it as case 1a.
- Suppose we see a callsite whose function falls into **case 1b**. This function won't be found in the callgraph, but we also won't consider any callsites like this in the first place: we can detect when a pointer is being passed into a callsite, and we only ask the callgraph about function calls where a pointer is used. Note that we do not need type-checking or local name resolution to "detect when a pointer is being passed into a callsite", since we terminate our search whenever a pointer argument is reassigned.
- Suppose we see a callsite whose function falls into **case 2**. This function will not be found in the callgraph. If the function doesn't take a pointer, we don't care about it. However, if it _does_ take a pointer, we will know it is a third-party function by its absence from the callgraph.
kalexmills commented 3 years ago

I want to try implementing this as part of a massive simplification/rewrite.

The only analyzers that should be needed are callgraph, packid, and looppointer.

EDIT: a massive rewrite may not be needed.

kalexmills commented 3 years ago

The hope is that this will reduce the size of the graphs reported by nogofunc and also cut down on the false positives that analyzer reports.