github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

result abbreviation #9890

Open lrecknagel opened 2 years ago

lrecknagel commented 2 years ago

Maybe I'm getting something wrong, but I can't figure this out myself, and I can't find any useful hints in existing issues or the documentation regarding the following case:

import javascript

from Function fn
select
  fn.getAReturnedExpr(),
  fn.getAReturnedExpr().toString()

Any help is appreciated. Thanks to all of you in advance, and also for making all this amazing stuff happen!

alexet commented 2 years ago

In general, toString is expected to return a short, human-readable string and to be cheap to compute. For these reasons, toString returns a short string that usually shows only one level of information. We also don't have the full source for objects in the database, so the abbreviated form is the only option available.

However, most elements should also have getLocation, which returns the location of the element. If a result has a getLocation, clicking on it in VS Code should take you directly to the actual source code.
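As a rough sketch of what this could look like for the query above: the location can also be selected as explicit columns, so it remains usable when results are exported rather than clicked through. This assumes the standard JavaScript library's Location API (getFile, getStartLine and so on); the column choice is only illustrative.

```ql
import javascript

// For every returned expression, report where its source text lives
// instead of relying on the abbreviated toString() value.
from Function fn, Expr ret, Location loc
where
  ret = fn.getAReturnedExpr() and
  loc = ret.getLocation()
select ret,
  loc.getFile().getAbsolutePath(), // file containing the expression
  loc.getStartLine(), loc.getStartColumn(), // where the expression starts
  loc.getEndLine(), loc.getEndColumn() // where the expression ends
```

With the file path and the start and end positions, the full text can be sliced out of the original source files outside of CodeQL if it is really needed.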

intrigus-lgtm commented 2 years ago

Hi, I once asked a similar question: https://github.com/github/securitylab/discussions/53 Maybe this will help you.

lrecknagel commented 2 years ago

@alexet thanks for your explanation. I also see this possibility, but it's not very scalable for larger result sets, imho. @intrigus-lgtm thanks for pointing me there.

@alexet is there anything I can do to help "fix" this issue, as you pointed out in https://github.com/github/securitylab/discussions/53

alexet commented 2 years ago

So that problem is slightly different because it deals with an expression that is essentially an atomic value being truncated.

The general problem ends up being much harder. The main issue is the underlying CodeQL string representation, which doesn't allow sharing between strings. As most of the time we only deal with short strings (like those inside identifiers), we want to tune the representation towards the actual uses. However, if we were to store the full text of every expression in the program, it would take O(nestingDepth*programSize) space. The evaluation model of CodeQL is bottom-up, so every predicate is evaluated in full, and we would essentially be required to compute all those strings.
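A rough way to see where that bound comes from, assuming every expression e stored its complete source text as its own unshared string (the names text(e) and c are just notation for this sketch): each source character c is counted once for every expression that contains it, and those expressions form a chain of nested ancestors of length at most nestingDepth.

```latex
\sum_{e} \lvert \mathrm{text}(e) \rvert
  \;=\; \sum_{c} \#\{\, e : c \in \mathrm{text}(e) \,\}
  \;\le\; \mathrm{nestingDepth} \cdot \mathrm{programSize}
```

For example, a 1,000-character expression nested 50 levels deep could contribute up to 50,000 characters of strings on its own.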

What is the actual use case for getting the full text? There is usually a better way than relying on the full text anyway.