Open nohehf opened 7 months ago
Recently https://github.com/github/stack-graphs/pull/419 was merged, which ensures non-failing files are indexed even if others are not. Is that what you are after?
Oh ok I did notice this after posting thank you. I guess the CLI output might not reflect this (seemed like a full crash to me). That is indeed, what I was looking for, but do you think it could be more granular than per file ?
Let's say I have the following main.py
file.
foo = "bar"
# this line will fail the indexing
baz = lambda a: a * 2
qux = foo
It does fail on the baz
assignment:
> tree-sitter-stack-graphs-python index main.py
/Users/nohehf/tmp/py/main.py: failed to build stack graph
0: Error executing statement edge @name.def -> @param.param_name at (822, 3)
src/stack-graphs.tsg:822:3:
822 | edge @name.def -> @param.param_name
| ^
in stanza
src/stack-graphs.tsg:818:1:
818 | (parameter/identifier) @param @name
| ^
matching (identifier) node
/Users/nohehf/tmp/py/main.py:4:14:
4 | baz = lambda a: a * 2
| ^
1: Evaluating edge sink
2: Undefined scoped variable [syntax node identifier (4, 14)].param_name
Error: failed to build stack graph
And it seems like the file has not be indexed at all tree-sitter-stack-graphs-python status main.py
outputs nothing.
As I mentioned, GitHub code navigation still works in this file ? Do they run additional / custom .tsg
files ? Or have custom error recovery techniques ?
Do you think it would be possible to only fail on portions of the file ? Like foo
and qux
would be indexed in the stack graph, and only the problematic / failing node baz
is ignored ?
PS: Thanks for the quick answer :))
As I mentioned, GitHub code navigation still works in this file? Do they run additional / custom .tsg files? Or have custom error recovery techniques?
If stack graph data is unavailable, either because the file failed or the language is not supported yet, we fall back on simple name matching using tags. This is based on tags queries that are found in many grammar repos, which don't take scoping into account.
[...] do you think it could be more granular than per file?
Perhaps. It would require a more invasive change in tree-sitter-graph, which is responsible for executing the TSG files. The simplest version would be to skip graph statements like edge
and attr
if it uses scoped variables that are undefined. But I would not like that to happen silently, because it is an indication that there's something missing or an error. They should at least be surfaced as warnings, and perhaps a flag could switch between relax and strict mode. This might be possible while keeping clear execution semantics for the TSG, but it would be some work.
If stack graph data is unavailable, either because the file failed or the language is not supported yet, we fall back on simple name matching using tags. This is based on tags queries that are found in many grammar repos, which don't take scoping into account.
That does make sense, thank you.
This is based on tags queries that are found in many grammar repos
Are you referring to tools like ctags or related tools / techniques ?
Perhaps. It would require a more invasive change in tree-sitter-graph, which is responsible for executing the TSG files. The simplest version would be to skip graph statements like edge and attr if it uses scoped variables that are undefined. But I would not like that to happen silently, because it is an indication that there's something missing or an error. They should at least be surfaced as warnings, and perhaps a flag could switch between relax and strict mode. This might be possible while keeping clear execution semantics for the TSG, but it would be some work.
Yes I 100% agree that this should not be silent (as skipping files). I would like to get my hands on this if I have some time, but I believe I need to familiarize myself quite a bit more with the repo.
Thank you for your answers on the matter, I think this issue could be flagged as a feature idea / proposal, no bug here :))
I would like to get my hands on this if I have some time, but I believe I need to familiarize myself quite a bit more with the repo.
Let me know if you want to do this, then I can point you to the right places and discuss a bit what would be a good approach. I suggest to start with a new issue on tree-sitter-graph to track the effort and discuss overall approach. I think it is easiest to break it up in two parts, (i) implementing a way to return warnings from an execution run, and (ii) changing the behavior around missing nodes and generate warnings for those cases.
Thank you i'll make sure to ping you back here when I have some time. I first need to address #430 :)
Hello,
I've been facing some issues when indexing source code using the tree-sitter-stack-graphs-python crate, particularly with
failed to build stack graph
errors linked to issues #426 & #403.I'm aware these might be resolved in future updates, and I'm also considering diving into the
.tsg
DSL as I become more familiar with this project. Tho I believe that given the complexity of various codebases, similar errors could arise and the future, hence the idea of and having a way to safely index, ignoring potential errors.I noticed that GitHub's code navigation still functions correctly on the very same files that are throwing errors. This leads me to believe there might be a workaround to build the stack graph despite these errors.
Is there a known method to bypass such errors during the graph building process, so the entire codebase isn't compromised by issues in single nodes? If such a feature exists or could be implemented, it would greatly enhance the tool's resilience.
Looking forward to your insights or suggestions.
PS: thank you for this amazing project and related talks / articles, it's been really enjoyable to learn about this