glato / emerge

Emerge is a browser-based interactive codebase and dependency visualization tool for many different programming languages. It supports some basic code quality and graph metrics and provides a simple and intuitive way to explore and analyze a codebase by using graph structures.
MIT License

Generate duplicate nodes for files #22

Open HickeyHsu opened 2 years ago

HickeyHsu commented 2 years ago

This is a very exciting tool. I tried to use it to generate the FILE_RESULT_DEPENDENCY_GRAPH of a Java project and found a problem. For each Java file, the tool generates two nodes: 1) a file node (with absolute_name) and 2) a class node (with only a display name).

for example, node<D:\idea_workspace\ACPG4J\src\main\java\analyser4J\graph\AbstractVertex.java> and node

In my opinion, they should be treated as one single node. Is there a way to achieve this?

Again, this project is pretty amazing. Thanks!

glato commented 2 years ago

@HickeyHsu Thanks for the nice feedback 👍. Does this also relate to issue #23 that you've described? If not, could you give me a detailed example (and maybe even post a small screenshot of the graph/issue)? Thanks for your help!

HickeyHsu commented 2 years ago

It's a more general problem than #23. Almost every self-defined class is duplicated. For example, I define a class Graphviz in the file ACPG4J\src\main\java\analyser4J\util\Graphviz.java, and I import it in another file, ACPG4J\src\main\java\module\cpg\graphs\cpg\CodePropertyGraph.java.

The first node is generated from the file that defines the class, like this:

    <node id="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\util\Graphviz.java">
      <data key="d0">D:\idea_workspace\ACPG4J\src\main\java\analyser4J\util\Graphviz.java</data>
      <data key="d1">Graphviz.java</data>
      <data key="d2">19</data>
      <data key="d3">156</data>
      <data key="d231">0.33919588761078556</data>
      <data key="d232">0.30083373993452384</data>
      <data key="d35">0.2775904223555037</data>
      <data key="d233">0.2612145985576437</data>
      <data key="d234">0.2612145985576437</data>
      <data key="d235">0.24112116789936341</data>
      <data key="d73">0.2089258235662492</data>
      <data key="d9">1</data>
      <data key="d10">0</data>
      <data key="d11">6</data>
    </node>

The second node is generated from the import in another class, like this:

    <node id="analyser4J.util.Graphviz">
      <data key="d1">analyser4J.util.Graphviz</data>
      <data key="d233">0.4036928296883384</data>
      <data key="d35">0.2983585580188285</data>
      <data key="d234">0.3139833119798187</data>
      <data key="d283">0.26912855312555894</data>
      <data key="d284">0.24414695886471777</data>
      <data key="d285">0.22427379427129912</data>
      <data key="d286">0.22427379427129912</data>
      <data key="d9">9</data>
      <data key="d10">1</data>
      <data key="d11">6</data>
      <data key="d21">0</data>
      <data key="d22">0</data>
      <data key="d12">1</data>
      <data key="d13">6</data>
    </node>

with an edge:

    <edge source="D:\idea_workspace\ACPG4J\src\main\java\module\cpg\graphs\cpg\CodePropertyGraph.java" target="analyser4J.util.Graphviz" />

In fact, by looking at the edge collection, we can see that each edge starts with a file and ends with a class entity:

    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.astgen.finder.NodeFinderConfig" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.astgen.finder.NodeLocator" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.astgen.helpers.FilePosConverter" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.astgen.helpers.FileSystemHelpers" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.builder.PDGBuilder" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.builder.PDGBuilderConfig" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.builder.SlicedPDGBuilder" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.graph.PDG" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.graph.Vertex" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.graph.VertexType" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.slice.Slicer" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.slice.config.LineNumSliceConfig" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.slice.config.SliceConfig" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.util.DotGraphExporter" />
    <edge source="D:\idea_workspace\ACPG4J\src\main\java\analyser4J\PDGSlicing.java" target="analyser4J.util.Utils" />
......
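
To check how widespread the duplication is, the node ids can be compared mechanically. The following is only a rough sketch, not part of emerge: the helper names `class_name_from_path` and `find_duplicate_nodes` are made up for illustration, and it assumes a Maven-style `src\main\java` source root as in the paths above. It maps each file node id to the class name an import would produce and reports the pairs where both nodes exist.

```python
from pathlib import PureWindowsPath
from typing import Dict, Iterable, Optional

# Assumption: a Maven-style source root, as seen in the paths quoted above.
SOURCE_ROOT = ("src", "main", "java")


def class_name_from_path(file_path: str) -> Optional[str]:
    """Map a .java file path below the source root to its dotted class name.

    Returns None if the id is not a .java path or the source root is not found.
    """
    path = PureWindowsPath(file_path)
    if path.suffix != ".java":
        return None
    parts = path.parts
    for i in range(len(parts) - len(SOURCE_ROOT)):
        if parts[i:i + len(SOURCE_ROOT)] == SOURCE_ROOT:
            # Everything between the source root and the file name is the package.
            package_parts = parts[i + len(SOURCE_ROOT):-1]
            return ".".join(package_parts + (path.stem,))
    return None


def find_duplicate_nodes(node_ids: Iterable[str]) -> Dict[str, str]:
    """Return {file node id: class node id} for every class that has both nodes."""
    ids = set(node_ids)
    duplicates = {}
    for node_id in ids:
        class_name = class_name_from_path(node_id)
        if class_name is not None and class_name in ids:
            duplicates[node_id] = class_name
    return duplicates
```

Feeding it the node ids from the exported graph should list every file node that has a class-node twin, which would confirm whether the problem really affects almost every self-defined class.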

I also tried to fix it by using class paths instead of absolute file paths as node names:

    # Requires at module level: from pathlib import Path; from typing import Any, Dict
    def calculate_dependency_graph_from_results_file_merged(self, results: Dict[str, Any]) -> None:
        """Constructs a dependency graph from a dict of abstract file results,
        merging each file node with the class node created from imports.

        Args:
            results (Dict[str, Any]): Maps names to objects that subclass AbstractFileResult.
        """
        LOGGER.debug('creating dependency graph...')
        result: FileResult
        for _, result in results.items():
            # node_name = result.unique_name
            absolute_name = result.absolute_name
            display_name = result.display_name
            # Name the node by class path (module name + file stem) so it matches import targets.
            node_name = result.module_name + "." + Path(display_name).stem
            self._digraph.add_node(node_name, absolute_name=absolute_name, display_name=display_name)
            dependencies = result.scanned_import_dependencies
            for dependency in dependencies:
                self._digraph.add_node(dependency, display_name=dependency)
                self._digraph.add_edge(node_name, dependency)
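
The idea behind this attempt can be tried in isolation. The sketch below is a simplified stand-in, not emerge's actual code: `FileResultStub` and `build_merged_graph` are hypothetical names, the graph is a plain dict plus a set of edges, and it assumes `module_name` holds the Java package (for example `analyser4J.util`). Both the file and any import of it resolve to the same key, and a dependency node is only created if it does not exist yet, so it cannot overwrite the attributes of a file node.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath
from typing import Dict, List, Set, Tuple


@dataclass
class FileResultStub:
    """Hypothetical stand-in carrying only the attributes used in the snippet above."""
    absolute_name: str
    display_name: str
    module_name: str  # assumed to be the Java package, e.g. "analyser4J.util"
    scanned_import_dependencies: List[str] = field(default_factory=list)


def build_merged_graph(
    results: Dict[str, FileResultStub],
) -> Tuple[Dict[str, Dict[str, str]], Set[Tuple[str, str]]]:
    nodes: Dict[str, Dict[str, str]] = {}
    edges: Set[Tuple[str, str]] = set()
    for result in results.values():
        # Key the file by its class path so it matches the import targets.
        stem = PureWindowsPath(result.display_name).stem
        node_name = f"{result.module_name}.{stem}"
        # File attributes always win, even if an import created the node first.
        nodes.setdefault(node_name, {}).update(
            absolute_name=result.absolute_name,
            display_name=result.display_name,
        )
        for dependency in result.scanned_import_dependencies:
            # Create a placeholder only if the node is unknown so far.
            nodes.setdefault(dependency, {"display_name": dependency})
            edges.add((node_name, dependency))
    return nodes, edges
```

With this keying, Graphviz.java and the import `analyser4J.util.Graphviz` end up as one node. Imports of classes outside the scanned project still get a placeholder node with only a display name.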