antoine-coulon / skott

All-in-one devtool to automatically analyze, search and visualize project modules and dependencies from JavaScript, TypeScript (JSX/TSX) and Node.js (ES6, CommonJS)
MIT License

Feature Request: Allow path grouping to visualize relationships between monorepo packages as a whole #133

Open brmenchl opened 7 months ago

brmenchl commented 7 months ago

Summary

I'm working on a pretty large and complex monorepo. Not all of our imports are targeting the index file or types files of packages, and our dependency graph is pretty dense. It would be nice to be able to group all files within a package as a single "unit" so we could just visualize which packages import from each other and break down the high-level relationships.

I end up doing my own parsing of the graph output right now, and we use it to decide whether to combine packages, or whether to break a third package out of 2 circularly dependent packages to fix the relationship.

Details

I imagine the graph analyzing would be the same, but collect extra metadata about which bucket each file should go into. I guess this could either look like specifying package directories, and anything under that directory would fall into that package bucket, or key value pairs of package name to path globs to specify buckets.

Then you would have the ability to visualize the graph as groups (edges would be weighted by the number of imports, or not weighted and you could just toggle out of the view).
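
A minimal sketch of that bucketing idea, assuming the file-level graph is available as a plain record of nodes with `id` and `adjacentTo`. The `buckets` mapping and the `bucketOf` and `weightedGroupEdges` helpers are hypothetical names used for illustration only, and plain directory prefixes stand in for path globs:

```typescript
// Assumed file-level graph shape: each node lists the files it imports.
type FileNode = { id: string; adjacentTo: string[] };
type FileGraph = Record<string, FileNode>;

// Hypothetical bucket definition: package name -> directory prefix.
const buckets: Record<string, string> = {
  "package-a": "packages/a/",
  "package-b": "packages/b/",
};

// Returns the bucket a file belongs to, or undefined when no prefix matches.
function bucketOf(filePath: string): string | undefined {
  return Object.keys(buckets).find((name) => filePath.startsWith(buckets[name]));
}

// Counts file-level imports that cross a bucket boundary, keyed as "from -> to".
function weightedGroupEdges(graph: FileGraph): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const filePath of Object.keys(graph)) {
    const from = bucketOf(filePath);
    if (from === undefined) continue;
    for (const target of graph[filePath].adjacentTo) {
      const to = bucketOf(target);
      // Skip unknown targets and imports that stay inside the same bucket.
      if (to === undefined || to === from) continue;
      const key = `${from} -> ${to}`;
      weights[key] = (weights[key] ?? 0) + 1;
    }
  }
  return weights;
}

const sampleGraph: FileGraph = {
  "packages/a/index.ts": {
    id: "packages/a/index.ts",
    adjacentTo: ["packages/a/util.ts", "packages/b/index.ts"],
  },
  "packages/a/util.ts": { id: "packages/a/util.ts", adjacentTo: ["packages/b/internal.ts"] },
  "packages/b/index.ts": { id: "packages/b/index.ts", adjacentTo: ["packages/b/internal.ts"] },
  "packages/b/internal.ts": { id: "packages/b/internal.ts", adjacentTo: [] },
};

const groupEdges = weightedGroupEdges(sampleGraph);
```

Dropping the counts and keeping only the keys gives the unweighted variant of the same view.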

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question: Would you consider contributing a PR?
Answer: Yes. I think I would need a bit of pairing help at first to get me situated in the repo but I would be happy to take a stab at this.

antoine-coulon commented 7 months ago

Hello,

This feature request reminds me of #125 and #33, where one would be able to group a set of modules into a compacted set of nodes. Please check those issues and feel free to tell me if that maps to what you expect. Also, you can check this comment, which already provides a way of dealing with that until we land it in the core.

If I recap with some pseudo-code:

import skott from "skott";

import {
  continueResolution,
  DependencyResolver,
  DependencyResolverOptions,
  skipNextResolvers,
} from "skott/modules/resolvers/base-resolver";

// This is a DependencyResolver, `resolve` will be called by skott for each module declaration found in each file analyzed
export class DojoDependencyResolver implements DependencyResolver<any> {
  constructor(
    private readonly groupingPattern: {
      path: string;
      name: string;
    }[]
  ) {}

  // This is where you can hook into the skott resolution algorithm. From there you can decide where a module
  // declaration goes.
  async resolve({
    projectGraph,
    moduleDeclaration,
    resolvedNodePath,
  }: DependencyResolverOptions<any>) {
    for (const ref of this.groupingPattern) {
      // whatever check allows you to know that the module declaration comes from one group
      if (moduleDeclaration.startsWith(ref.path)) {
        // resolvedNodePath (assumed here to identify the file being analyzed) gets the group recorded on its body
        projectGraph.mergeVertexBody(resolvedNodePath, (body) => {
          body.groupDependencies = (body.groupDependencies ?? []).concat(ref.name);
        });

        return skipNextResolvers();
      }
    }

    return continueResolution();
  }
}

const dojoGraph = await skott({
  // Use skott with the custom resolver. Note that it can be combined with the default one if you still want to
  // resolve other default dependencies
  dependencyResolvers: [
    // The constructor expects an array of grouping patterns
    new DojoDependencyResolver([
      { path: `some-folder`, name: `my-folder` },
    ]),
  ],
});

const { graph } = dojoGraph.getStructure();

// from there you can use all the information to build the compacted version of the graph

const graphAlike = {
   "some-directory/index.ts": { id: "some-directory/index.ts", adjacentTo: [], body: { groupDependencies: ["some-other-directory"] } },
}

// You can then either use the key in the object to know where the node should be grouped
// and thanks to `groupDependencies` you could also know all imports to other groups from that group 

The only caveat is that in that case you would be responsible for first collecting the workspace information needed to do that compacting (this is what I called groupingPattern there).
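
As a rough sketch of that compacting step, assuming nodes shaped like the `graphAlike` example (a `body.groupDependencies` array filled in by the resolver) and a grouping pattern with the same `path`/`name` shape as the resolver's constructor argument. `compactByGroup` and the sample data are hypothetical names and values for illustration, not part of skott:

```typescript
// Node shape mirroring the `graphAlike` example.
type ResolvedNode = {
  id: string;
  adjacentTo: string[];
  body: { groupDependencies?: string[] };
};
type GroupRef = { path: string; name: string };

// Collapses file nodes into one entry per group, listing the other groups it imports from.
function compactByGroup(
  graph: Record<string, ResolvedNode>,
  pattern: GroupRef[]
): Record<string, string[]> {
  const compacted: Record<string, string[]> = {};
  for (const filePath of Object.keys(graph)) {
    // The key of each entry tells us which group the file itself belongs to.
    const owner = pattern.find((ref) => filePath.startsWith(ref.path));
    if (!owner) continue;
    if (!(owner.name in compacted)) compacted[owner.name] = [];
    const deps = compacted[owner.name];
    for (const dep of graph[filePath].body.groupDependencies ?? []) {
      // Ignore imports inside the same group and avoid duplicates.
      if (dep !== owner.name && !deps.includes(dep)) deps.push(dep);
    }
  }
  return compacted;
}

const groupingPattern: GroupRef[] = [
  { path: "some-directory/", name: "some-directory" },
  { path: "some-other-directory/", name: "some-other-directory" },
];

const resolvedGraph: Record<string, ResolvedNode> = {
  "some-directory/index.ts": {
    id: "some-directory/index.ts",
    adjacentTo: [],
    body: { groupDependencies: ["some-other-directory"] },
  },
  "some-directory/util.ts": {
    id: "some-directory/util.ts",
    adjacentTo: [],
    body: { groupDependencies: ["some-other-directory", "some-directory"] },
  },
  "some-other-directory/index.ts": {
    id: "some-other-directory/index.ts",
    adjacentTo: [],
    body: {},
  },
};

const compactedGroups = compactByGroup(resolvedGraph, groupingPattern);
```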

In the case where you want each group to be a workspace project, I would need to refine skott's API. We have a getWorkspace function that finds all packages and their sets of devDeps, prodDeps and peerDeps, but it's attached to a skott instance. To be honest, I didn't think of the case where you would want to execute that before running the skott analysis; as it stands, you end up running skott twice, first to get the workspace information and then to re-run the analysis with it.

Visualization

Once you have a grouped version of the graph, you could use skott-webapp to render it, as was done in the Rush.js plugin.

Note that I'm aware all of this is pretty tedious to set up, but apart from me (for the Rush plugin) no one has really gone that far using the API in a raw way. Most users out there just use the CLI, so visualization modes were plugged into it, and the API was only meant to be used from JS scripts. But if using the API together with visualization on top becomes a thing, then I could work on making skott's JS API waaay easier to use and on exposing all the display modes (webapp, CLIs...) so they can be easily plugged into it.

Also, doing that kind of thing with graphs is exactly the purpose of skott, so you're in the right spot; we just need to get there API-wise :-)

antoine-coulon commented 7 months ago

At the end of the day, even with that feature in the core, skott would do more or less the same thing: generate the entire graph first, and then provide a compacted version.

So you could even get rid of the custom dependencyResolver and just work from the graph generated by running skott at the root of your monorepo. I don't know how "dense" your graph is, but note that computing it might not be super fast on heavy graphs; performance improvements still need to be made.

Nevertheless, starting from a graph obtained like this:

import skott from "skott";

const { graph } = await skott().then(({ getStructure }) => getStructure())

You can then group that "graph" according to your specific context. Once grouped, you can boot the web application using the same principle as shown in the previous comment.

From there the web app should display nodes and their edges (dependencies between groups, if you updated the adjacentTo property of each node).
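
Here is a small sketch of that grouping step, assuming the graph is a record of `{ id, adjacentTo, body }` nodes keyed by file path. The `groupOf` rule (files under `packages/<name>/` belong to group `<name>`) and the `toGroupedGraph` helper are hypothetical, and whether skott-webapp accepts the result unchanged is not verified here:

```typescript
type GraphNode = { id: string; adjacentTo: string[]; body: Record<string, unknown> };
type Graph = Record<string, GraphNode>;

// Hypothetical rule: "packages/<name>/..." belongs to group "<name>".
function groupOf(nodePath: string): string | undefined {
  const match = /^packages\/([^/]+)\//.exec(nodePath);
  return match ? match[1] : undefined;
}

// Builds a graph with one node per group, where adjacentTo points at other groups.
function toGroupedGraph(graph: Graph): Graph {
  const grouped: Graph = {};
  const ensure = (id: string): GraphNode => {
    if (!(id in grouped)) grouped[id] = { id, adjacentTo: [], body: {} };
    return grouped[id];
  };

  for (const filePath of Object.keys(graph)) {
    const from = groupOf(filePath);
    if (from === undefined) continue;
    const groupNode = ensure(from);
    for (const target of graph[filePath].adjacentTo) {
      const to = groupOf(target);
      // Drop edges to ungrouped files and edges that stay inside one group.
      if (to === undefined || to === from) continue;
      ensure(to);
      if (!groupNode.adjacentTo.includes(to)) groupNode.adjacentTo.push(to);
    }
  }
  return grouped;
}

const fileGraph: Graph = {
  "packages/a/index.ts": {
    id: "packages/a/index.ts",
    adjacentTo: ["packages/a/helpers.ts", "packages/b/index.ts"],
    body: {},
  },
  "packages/a/helpers.ts": { id: "packages/a/helpers.ts", adjacentTo: [], body: {} },
  "packages/b/index.ts": {
    id: "packages/b/index.ts",
    adjacentTo: ["packages/a/helpers.ts"],
    body: {},
  },
};

const groupedGraph = toGroupedGraph(fileGraph);
```

In this sample the two groups end up pointing at each other, which is the kind of circular package relationship the original request wanted to surface.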

Note also that one current limitation is that skott-webapp expects files and has not been updated to work with groups, so the File Explorer section might at worst not work well and at best be completely useless.

brmenchl commented 7 months ago

Yep, from what I'm reading, this request is basically the same as all those requests you linked! I can close this if it's a duplicate.

Would there be a benefit to doing the custom dependency resolver? From what I understand you're saying that it may not be that much faster to gather the group metadata during resolution time rather than afterwards.

antoine-coulon commented 7 months ago

It's fine to keep these issues open even if they request more or less the same use cases: they describe different contexts, which should help us work towards the right API.

Would there be a benefit to doing the custom dependency resolver? [...]

The resolver is not really about speed (even though using a custom resolver for a very specific use case could be faster, as the default one tries to resolve many things that might not be useful); it's more about what you're trying to achieve.

DependencyResolver's responsibility is to loop over all module declarations from each traversed file and classify each of them (third-party, builtin, path aliases, etc.). These are what we call the dependencies of a module, and they are reflected in the graph nodes and edges.

So in a context where you don't care about all that (classifying third-party, builtin, path aliases, etc.) and just want to check whether a module declaration comes from a specific group related to your use case, you could add that information to the node at resolve time. That would let you avoid polluting the graph with irrelevant data, for instance dependencies between nodes from the same group. It would also gather additional information before the "grouping" step: you would know that Node-A has a dependency on Group-B, which tells you which node creates the Group-A -> Group-B dependency (considering Node-A comes from Group-A).
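
To illustrate that last point, here is a sketch that records which files create each group-to-group dependency, assuming node bodies carry a `groupDependencies` array as in the resolver example earlier in the thread. The `ownGroup` rule (first path segment is the group name) and the `edgeOrigins` helper are hypothetical:

```typescript
type AnnotatedNode = { id: string; body: { groupDependencies?: string[] } };

// Hypothetical rule: the first path segment is the name of the node's own group.
const ownGroup = (nodePath: string): string => nodePath.split("/")[0];

// Maps "Group-A -> Group-B" to the files in Group-A responsible for that edge.
function edgeOrigins(nodes: AnnotatedNode[]): Record<string, string[]> {
  const origins: Record<string, string[]> = {};
  for (const node of nodes) {
    const from = ownGroup(node.id);
    for (const to of node.body.groupDependencies ?? []) {
      if (to === from) continue; // same-group imports do not create a group edge
      const key = `${from} -> ${to}`;
      if (!(key in origins)) origins[key] = [];
      origins[key].push(node.id);
    }
  }
  return origins;
}

const origins = edgeOrigins([
  { id: "Group-A/node-a.ts", body: { groupDependencies: ["Group-B"] } },
  { id: "Group-A/node-c.ts", body: { groupDependencies: ["Group-A"] } },
  { id: "Group-B/node-b.ts", body: {} },
]);
```

Keeping the originating files next to each group edge makes it easier to decide which imports to move when two packages depend on each other.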

After that step, you would still need to do the grouping, as a DependencyResolver does not change the structure of nodes but only their bodies (the information attached to each node).

So basically you could loop over the entries of the graph and put each node into its own group, while also merging the information of all nodes from the same group (collected by the resolver) into that group's root node.

Besides that, there is a less efficient but more straightforward approach (API-wise): using the default resolver (EcmaScriptDependencyResolver), which will try to resolve many more dependencies but does not require any setup.

Then you can refine that graph into a lighter graph with groups. The graph is basically a Map, so you just need to match each node's id and body against the paths you have defined in the Groups definition and you're done.

--

In both cases you would end up with a Graph that could be used in the visualisation modes. Groups would need better visualisation support, though.

antoine-coulon commented 6 months ago

Hello @brmenchl, just for your own information, the first iteration of path grouping was introduced in 0.33.0 thanks to @AlexandrHoroshih in #146.

The groupBy function allows you to go through all node paths collected within the graph and to put them in a specific bucket.

For instance the following


skott({ 
   groupBy: (nodePath) => {
     if(nodePath.startsWith('package-a')) {
        return 'package-a'
     }
   } 
})

will generate a groupedGraph with one node, package-a, accessible through the instance.getStructure() function.

As of now, there are still tasks to do to cover the whole use case:

In the next iteration, this is what I imagine:


// This will either be generated from pnpm workspace information or provided as a custom config
const pnpmWorkspaces = {
   "package-a": {
     customName: "my-package-a"
   },
  "package-b": {
    customName: "my-package-b"
  }
}

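skott({
  groupBy: (nodePath) => {
    // Find the workspace package whose key is a prefix of the node path
    const pkg = Object.keys(pnpmWorkspaces).find((key) => nodePath.startsWith(key));
    if (pkg) return pnpmWorkspaces[pkg].customName;
  },
  // PnpmWorkspaceResolver is imagined here as part of the next iteration
  dependencyResolvers: [new PnpmWorkspaceResolver(), new EcmaScriptDependencyResolver()],
})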

That way, groups will allow you to recreate the workspace-level graph and also have the knowledge about all the workspace dependencies living between each workspace package.

brmenchl commented 6 months ago

wow that's great! this is exactly what i was looking for