github / stack-graphs

Rust implementation of stack graphs
https://docs.rs/stack-graphs/*/stack_graphs/
Apache License 2.0
752 stars 130 forks source link

Python module resolution not working via tree-sitter-stack-graphs-python cli #430

Closed nohehf closed 4 months ago

nohehf commented 5 months ago

Hello again, I've been testing tree-sitter-stack-graphs-python via the cli, and while I have great results on a single file, I was not able to obtain concluent output on multiple files with modules.

Here is a minimal reproduction of my issue:

Given the following directory structure (working dir being: /Users/nohehf/tmp/py):

.
├── main.py
└── module.py

With:

# main.py
import module

baz = module.foo
# module.py
foo = "bar"

If I run:

DIR="/Users/nohehf/tmp/py"

tree-sitter-stack-graphs-python clean --all
tree-sitter-stack-graphs-python index $DIR
tree-sitter-stack-graphs-python status $DIR
tree-sitter-stack-graphs-python visualize $DIR

I get:

/Users/nohehf/tmp/py/main.py: indexed
/Users/nohehf/tmp/py/module.py: indexed
Visualization at stack-graph.html

Now if I query main.py on the foo reference (main.py:3:14), it does not find the definition in module.py:

tree-sitter-stack-graphs-python query definition "main.py:3:14"

/Users/nohehf/tmp/py/main.py:3:14: found 0 definitions for 1 references
queried reference
/Users/nohehf/tmp/py/main.py:3:14:
3 | baz = module.foo
  |              ^^^

has no definitions

However, trying to reproduce the behaviour in a test (test.py) I get no issues:

#------ path: module.py ------#

foo = "bar"

#------ path: main.py ------#

import module

baz = module.foo
#            ^ defined: 3
tree-sitter-stack-graphs-python test test.py
test.py: success

Am I missing something with the cli ? Are tests really behaving like those are separate files ?

I'm sure this is supposed to work and I have trouble understanding what's going on in the graph visualization: image

Thank you

nohehf commented 5 months ago

I just tried in javascript with a similar project structure, and it works just fine:

.
├── index.js
└── module.js
// index.js
import { foo } from "./module"

const baz = foo
// module.js
export const foo = "bar"
REPO="/Users/nohehf/tmp/js"

tree-sitter-stack-graphs-javascript clean --all
tree-sitter-stack-graphs-javascript index $REPO
tree-sitter-stack-graphs-javascript status $REPO
/Users/nohehf/tmp/js/index.js: indexed
/Users/nohehf/tmp/js/module.js: indexed
tree-sitter-stack-graphs-javascript query definition "index.js:3:13"
/Users/nohehf/tmp/js/index.js:3:13: found 2 definitions for 1 references
queried reference
/Users/nohehf/tmp/js/index.js:3:13:
3 | const baz = foo
  |             ^^^

has 2 definitions
/Users/nohehf/tmp/js/index.js:1:10:
1 | import { foo } from "./module"
  |          ^^^
/Users/nohehf/tmp/js/module.js:1:14:
1 | export const foo = "bar"
  |              ^^^
hendrikvanantwerpen commented 5 months ago

From what I see, the import is modeled as a push (with a downward arrow) of module, but the module access requires pops (the upward arrows) of the full path. I don't quite remember if this is on purpose or an oversight in the rules.

I wonder what happens if you run the index command in the directory itself? I think there's some logic missing that takes the source root into account when computing the module names.

nohehf commented 5 months ago

Oh I see what you mean. I think I'm starting to understand the visualization, you are referring about (1) and (2) right? CleanShot 2024-04-30 at 14 28 40

So I tried running it with both the full path:

tree-sitter-stack-graphs-python clean --all
tree-sitter-stack-graphs-python index /Users/nohehf/tmp/py
tree-sitter-stack-graphs-python query definition "/Users/nohehf/tmp/py/main.py:3:14"
/Users/nohehf/tmp/py/main.py:3:14: found 0 definitions for 1 references
queried reference
/Users/nohehf/tmp/py/main.py:3:14:
3 | baz = module.foo
  |              ^^^

has no definitions

And just the current directory (.)

tree-sitter-stack-graphs-python clean --all
tree-sitter-stack-graphs-python index .
tree-sitter-stack-graphs-python query definition "main.py:3:14"
/Users/nohehf/tmp/py/main.py:3:14: found 0 definitions for 1 references
queried reference
/Users/nohehf/tmp/py/main.py:3:14:
3 | baz = module.foo
  |              ^^^

has no definitions

And visualisations are exactly the same.

nohehf commented 5 months ago

Hello @hendrikvanantwerpen, do you have any idea on this one ? I checked a bit into the code but I'm not sure where the issue comes from to be honest. I 100% agree that there is an issue with absolute / relative paths here but I don't really know how to fix it / get around it. Let me know if / how I can help. Thanks

hendrikvanantwerpen commented 5 months ago

@nohehf I don't immediately know what might be going on here. One quick experiment you could do is to replicate your test case in TypeScript, and see how the graphs differ. This will give us a starting point for what to look at.

nohehf commented 5 months ago

This is the JavaScript reproduction: https://github.com/github/stack-graphs/issues/430#issuecomment-2084883280 Here is the visualization: CleanShot 2024-05-06 at 17 43 53@2x

Here is an archive with the source code and the .html visualization. js.zip

We can see that in the visualization we do have the right push & pop on the full path.

nohehf commented 4 months ago

Getting back to this as I tried to reproduce the error in tests without success. The following test passes:

# ------ path: module.py -----------#

foo = 42

# ------ path: main.py -------------#

from module import foo

print(foo)
#     ^ defined: 3, 7

yet reproducing the same hierarchy in a separate dir does not, when indexed and queried through the cli (the 3 ref is not found).

I added an output of the visualization from the test and got: image

As opposed to the one from the "manual" cli testing: CleanShot 2024-05-27 at 16 05 29@2x

We can see that the test one does not prefix with all the parts from the absolute path. Maybe the "base path" of the codebase should be stripped out from the cli graph construction ?

I cannot really figure out if the issue here is the tsg or the cli/indexer

EDIT: it actually does work via the cli if the full path is specified (but it's not valid in python)

from Users.nohehf.tmp.repro.module import foo

print(foo)
#     ^ defined: 3, 7

CleanShot 2024-05-27 at 18 59 46@2x