glato / emerge

Emerge is a browser-based interactive codebase and dependency visualization tool for many different programming languages. It supports some basic code quality and graph metrics and provides a simple and intuitive way to explore and analyze a codebase by using graph structures.
MIT License
818 stars, 49 forks

Different dependency name with `import io` and `from io import StringIO` #29

Closed and3rson closed 1 year ago

and3rson commented 1 year ago

Describe the bug

It seems that the dependency name is determined differently depending on the import style used.

Example project:

# foobar/services/stuff1.py
import io
io.foo()

# foobar/services/stuff2.py
from io import StringIO
StringIO.bar()

This yields four nodes with the following dependencies (using graphviz syntax for demo purposes here):

digraph {
  stuff1 -> io
  stuff2 -> foobar/io
}

Logs:

2022-10-09 20:02:01     parser D ⏩ generating file results...
2022-10-09 20:02:01     parser D ⏩ extracting imports from file result stuff2.py...
2022-10-09 20:02:01     parser D ⏩ adding import: foobar/io
2022-10-09 20:02:01     parser D ⏩ generating file results...
2022-10-09 20:02:01     parser D ⏩ extracting imports from file result stuff1.py...
2022-10-09 20:02:01     parser D ⏩ adding import: io

I did some introspection on pyparser.py and it seems that:

Describe your environment

To Reproduce

Steps to reproduce the behavior:

  1. Create project "foobar"
  2. Import module xxx in two different files, using the `import xxx` form in one and the `from xxx import yyy` form in the other
  3. Run the tool
  4. See xxx and foobar/xxx listed as two separate dependencies

Expected behavior

In both cases, only xxx should be shown.
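To illustrate why a single node is the expected result, here is a minimal sketch using Python's standard `ast` module. It is not emerge's own parser, and the helper name `imported_modules` is hypothetical; it only shows that both import forms from the example project name the same absolute module, `io`.

```python
import ast


def imported_modules(source: str) -> set:
    """Hypothetical helper: collect the absolute module names imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import io` -> alias.name == "io"
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from io import StringIO` -> node.module == "io";
            # level == 0 means the import is absolute, not relative to the package
            names.add(node.module)
    return names


# The two files from the example project, as source strings
stuff1 = "import io\nio.foo()\n"
stuff2 = "from io import StringIO\nStringIO.bar()\n"

print(imported_modules(stuff1))
print(imported_modules(stuff2))
```

Neither form is a relative import (that would be written `from . import io`), so there is no reason for the project name to be prefixed to the dependency.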

Screenshots

[image]

glato commented 1 year ago

@and3rson Thank you for the detailed and very helpful description and steps to reproduce 👍. Will have a look at this issue in the following days and hopefully provide a fix.

glato commented 1 year ago

@and3rson The issue you found is pretty interesting, in the sense that it can perhaps be reduced to the problem of deciding, regardless of the import syntax, whether you're dealing with an import from Python's own libraries. My approach/fix does exactly this: it checks whether the dependency can be found in sys.modules and, if so, sets global_import to True. This seems to solve the reproducible example you've described above (see my screenshot). I also tried it on a bigger codebase (Django), where, as one would expect, it reduces the number of nodes and edges.

[Screenshot: CleanShot 2022-10-19 at 20 23 29@2x]

Your feedback would be great; you can find the quick fix in PR #30.
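For readers following along, the check described in the comment above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from PR #30: the function name `is_global_import` and the choice to compare only the top-level package name are hypothetical; only `sys.modules` and the `global_import` flag come from the comment.

```python
import io   # imported here so that "io" is guaranteed to be in sys.modules
import sys


def is_global_import(dependency: str) -> bool:
    """Hypothetical helper: True if the dependency's top-level package is a loaded module."""
    # Assumption: only the first dotted component is checked, e.g. "os.path" -> "os"
    top_level = dependency.split(".")[0]
    return top_level in sys.modules


for name in ("io", "foobar_services_hypothetical"):
    global_import = is_global_import(name)
    print(name, global_import)
```

One caveat with this kind of check: `sys.modules` only lists modules that have already been loaded in the running interpreter, so a standard-library module that nothing has imported yet would not be recognized. On Python 3.10 and later, `sys.stdlib_module_names` might be a more complete source for the same decision.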