Add a graph viewer for `@kind graph` queries

aeisenberg commented 3 years ago

@hvitved has created two PRs that implement a graph viewer, which have been sitting in our PR triage queue for over 6 months. I've learned that there are a number of users of the viewer. I think this is a great feature to have and there is nothing fundamentally wrong with the PRs as they are (I've given both a quick look in the past). But, I do have some concerns about merging:

- Graph viewers are tricky in that they often work for smaller graphs, but become unwieldy for larger graphs. We would need to test the viewer thoroughly to see what its limitations are.
- Similarly, a previous attempt to integrate a graph viewer ran into performance issues for larger graphs. We would need to know if any performance issues exist and how to get around them.
- I don't have the time now to do any of this deep testing. (The purpose of this issue is to document the PRs and make sure that when I, or someone else on the team, have the time, we will know what to do.)

One possibility is that we release the graph viewer as a canary-only feature so that regular users don't try it out with unrealistic expectations.

- #811
- #705

gsingh93 commented 3 years ago

Any update on this? Releasing it but requiring the user to enable it with an experimental flag seems to be a good option. It seems like the main issues you pointed out are related to more thorough testing, and users like myself are interested in doing that if it's easy to do so.

aeisenberg commented 3 years ago

My concern is that https://github.com/github/vscode-codeql/pull/811 changes some of the core data structures in the extension, and some of their uses are not very well tested, so I need to make sure that the PR doesn't break any existing behaviour (things like pagination and sorting). I'd also like to make sure that the changes to the data structures are the minimum necessary to get the graph viewer to work. Reviewing and testing this is non-trivial, which is why the PR has been open for so long.

I do intend to look at this, but haven't been able to spend the time just yet. I think this is a good change and would like to get this in, but I also want to be cautious.

kjcolley7 commented 3 years ago

This issue is a more generalized solution for #571 and is a duplicate of #384. I'd also like to put a huge +1 on this issue: the graph viewer from QL4E was one of my favorite and most-used features. I'd often run a query to display a partial function call graph, such as all function calls that eventually call some target function, to determine reachability. I'm also interested in using it for things such as drawing CFGs, partial data flow graphs, modified ASTs for domain-specific languages, etc. I'm very willing to deal with limitations on overall graph size for performance reasons, at least in the initial implementation. Optimizing this to work on larger graphs could be a separate task.

I know that the old graph viewer had support for graph attributes. I think it would be awesome if a query could use graph/node/edge attributes (a la Graphviz) to heavily customize the way a graph is displayed. That's a bigger ask though, and the most important thing to me is the core functionality of displaying the graph and having nodes/edges be linkable to source locations. Being able to pan, zoom, and move nodes around is also nice.

Luuthetruyen commented 2 years ago

Truyen

kjcolley7 commented 2 weeks ago

@aeisenberg Is this issue still on your radar? It's now been 3 years since the last update, and I'd like to again echo that the ability to create and display graphs from CodeQL queries is hugely valuable. Even just supporting the bare minimum would be amazing, regardless of performance. Performance should be left up to the user (to not attempt to render hugely massive graphs) and/or future improvements (after releasing an initial, working version).

In my opinion, the "bare minimum" is to allow a query to produce nodes and edges, with the nodes being clickable to go to source locations. The most common way I've used this feature in the past with Semmle Studio was drawing custom function call graphs, which exclude most functions and instead only focus on ones I'm interested in (perhaps showing calls into and out of a specific function, or showing all the chains of function calls that eventually reach a target function, or even using codebase-specific knowledge to show function calls through function pointers that act as call tables).
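To make the "chains of function calls that eventually reach a target function" idea concrete, here is a minimal Python sketch of that pruning step. It is not how CodeQL or the old viewer does it. The function name `edges_reaching` and the sample call graph are hypothetical, and the input is assumed to be a plain list of `(caller, callee)` pairs.

```python
from collections import defaultdict, deque


def edges_reaching(edges, target):
    """Keep only the call edges that lie on some path ending at `target`.

    `edges` is a list of (caller, callee) pairs; the names are illustrative.
    """
    # Index callers by callee so the graph can be walked backwards.
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)

    # Backward breadth-first search: collect every function that can reach target.
    reaches = {target}
    queue = deque([target])
    while queue:
        fn = queue.popleft()
        for caller in callers[fn]:
            if caller not in reaches:
                reaches.add(caller)
                queue.append(caller)

    # An edge is kept when its callee can reach the target (or is the target).
    return [(a, b) for a, b in edges if b in reaches]


# Hypothetical call graph: only the chain main -> parse -> read -> memcpy survives.
calls = [
    ("main", "parse"),
    ("parse", "read"),
    ("main", "log"),
    ("read", "memcpy"),
    ("log", "printf"),
]
pruned = edges_reaching(calls, "memcpy")
```

In a real query the same filtering would normally be written in QL with a transitive closure; the sketch only shows which edges would end up in the rendered graph.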

I have also found it useful to add the "semmle.label" attribute to edges to indicate what type an edge is. Even this, though, doesn't need to be part of the v0 implementation for the VS Code extension.

As a reminder, the previous query format (for @kind graph) looked like this:

query predicate nodes(Foo node, string attr, string value) { ... }
query predicate edges(Foo a, Foo b, string attr, string value) {
    attr = "semmle.label"
    and ...
}
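As a rough illustration of how little a first version needs, the Python sketch below turns result tuples shaped like the two predicates above into Graphviz DOT text. It makes several assumptions: the tuples have already been exported as plain strings, `semmle.label` is treated as the display label for both nodes and edges, any other attribute is ignored, and the names `to_dot` and `quote` are hypothetical.

```python
def quote(text):
    """Quote a string as a DOT identifier, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(nodes, edges):
    """Render (node, attr, value) and (src, dst, attr, value) tuples as DOT.

    Only the "semmle.label" attribute is used (an assumption of this sketch).
    """
    node_labels = {}
    for node, attr, value in nodes:
        node_labels.setdefault(node, node)  # default label is the node id
        if attr == "semmle.label":
            node_labels[node] = value

    edge_labels = {}
    for src, dst, attr, value in edges:
        edge_labels.setdefault((src, dst), None)
        if attr == "semmle.label":
            edge_labels[(src, dst)] = value
        # Make sure endpoints that have no node tuple still appear.
        node_labels.setdefault(src, src)
        node_labels.setdefault(dst, dst)

    lines = ["digraph G {"]
    for node, label in node_labels.items():
        lines.append(f"  {quote(node)} [label={quote(label)}];")
    for (src, dst), label in edge_labels.items():
        attrs = f" [label={quote(label)}]" if label is not None else ""
        lines.append(f"  {quote(src)} -> {quote(dst)}{attrs};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical rows, as they might come out of the two predicates.
dot_text = to_dot(
    nodes=[("f", "semmle.label", "f()"), ("g", "semmle.label", "g()")],
    edges=[("f", "g", "semmle.label", "call")],
)
```

The resulting text can be pasted into any DOT viewer. Linking nodes back to source locations is left out of the sketch.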

Even in the CodeQL standard library, there are places that use @kind graph. For example, for visually displaying the IR that is built for a function: https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/PrintIR.qll

That also shows a query predicate parents(...), though I'd argue even that isn't required for the initial implementation (I've never had a need to use it).

Basically, overall my ask is if we can just start with the bare minimum required to start visualizing the results of queries with @kind graph. After that, we can iteratively start improving other things like performance, features, etc.

aeisenberg commented 2 weeks ago

This issue is still on our backlog. In the meantime, I would recommend running `codeql database analyze` from the command line with an output format of `graphtext`, `dgml`, or `dot`. You can then import the generated graph into a viewer. For example, `dot` is a Graphviz format, and there are many viewers you can use to visualize it.
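For anyone scripting this workaround, a small Python sketch of assembling the command might look like the following. The database path, query path, and output directory are placeholders, the helper name `analyze_command` is hypothetical, and the `--format` and `--output` options should be checked against `codeql database analyze --help` for the installed CLI version.

```python
import shlex

# The three graph output formats mentioned above.
GRAPH_FORMATS = {"graphtext", "dgml", "dot"}


def analyze_command(database, query, output, fmt="dot"):
    """Build the argument list for running a graph query from the CLI.

    All paths are placeholders supplied by the caller.
    """
    if fmt not in GRAPH_FORMATS:
        raise ValueError(f"unsupported graph format: {fmt}")
    return [
        "codeql", "database", "analyze",
        f"--format={fmt}",
        f"--output={output}",
        database,
        query,
    ]


# Hypothetical paths; pass the list to subprocess.run(cmd, check=True) to execute it.
cmd = analyze_command("my-db", "queries/callgraph.ql", "graph-out")
printable = shlex.join(cmd)
```

Building the arguments as a list avoids shell quoting problems when paths contain spaces.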

github / vscode-codeql

Add a graph viewer for `@kind graph` queries #877
