dominikh / go-tools

Staticcheck - The advanced Go linter
https://staticcheck.dev
MIT License
6.19k stars 377 forks source link

Idea: canned queries #332

Open dominikh opened 6 years ago

dominikh commented 6 years ago

Often times, we're interested in making one-off queries (find all X where Y) that don't make sense as automatic checks in staticcheck, but are useful to drive human decisions. There's https://github.com/mvdan/gogrep for structural queries, but a lot of the time, queries can be more complex, such as find all functions where an argument always gets the same value.

It is quite difficult to come up with a good general language for writing these queries. All attempts on my side have ended up with more code than if all supported checks were just being written in pure Go. But writing them in pure Go can be quite verbose, on the order of a page of code to match a 3 line code snippet. Writing and throwing away such queries is a huge waste of time. Maybe we should have a tool with a library of queries? If a query was useful once, it'll probably be useful again.

Users would be welcome to contribute their own queries that they needed, though I realize that the APIs aren't very approachable to a lot of people. They could always ask me to write those queries, but it could take me too long to get around to it. Queries are usually needed in the here and now.

Also, due to having fixed releases, new queries may not become available for months. This would bring us back to wanting a custom language that could be used for scripting, with an online repository of scripts that is independent of the release cycle of the tools. Maybe instead of trying to come up with a query language, we could use an embedded interpreter (Lua, JS, Skylark) and expose an API that is close to the pure Go API.

ainar-g commented 6 years ago

For the last year or so I've been considering using relational algebra as a representation for code. If we had some sort of "export to sql" functionality, you could write your complex query as something like "SELECT function name FROM function calls JOIN arguments JOIN proven properties GROUP BY function name HAVING count(proven value) = 1".

dominikh commented 6 years ago

There are some languages and papers that explore various query languages for code. Prominently SQL and XPath flavours. Neither really yielded usable results IMO. SQL much less so than XPath though, since SQL isn't really suitable for tree structures, and most of static analysis is either trees or graphs.

jimmyfrasche commented 6 years ago

I have no doubt it could be done in sqlite using a similar approach to the json extension https://www.sqlite.org/json1.html but that involves coming up with a second xpath-ish query language for the tree portion and writing lots of hooks into the sqlite engine which would likely involve a custom cgo driver to wire everything together.

That all sounds very fun, but it would probably be much simpler to use an embedable scripting language written in Go.

dgryski commented 6 years ago

Makes me think of https://www.youtube.com/watch?v=mVbDzTM21BQ

dominikh commented 6 years ago

Hm, what about GraphQL?

dmitshur commented 6 years ago

If you can come up with an appropriate schema, GraphQL could be a a good choice to consider.

ainar-g commented 6 years ago

This could work really well with Prolog/Datalog. The query in the original post would work beautifully:

function(f1).
function(f2).

argument(f1, x).
argument(f2, y).

has_value(x, 1).
has_value(y, 1).
has_value(y, 2).

has_constant_argument(F) :-
  function(F),
  argument(F, A),
  findall(V, has_value(A, V), Values),
  length(Values, 1).
seebs commented 5 years ago

I just reinvented basically this. I'm looking at an API change which would basically want to replace all x.mutator() with x = x.mutator() calls, and gosh, it would be convenient to be able to check for any calls not looking like that. (yes, i'm too lazy to search for ^[^=]*mutator.)