Feature Request: Allow path grouping to visualize relationships between monorepo packages as a whole

Summary

I'm working on a pretty large and complex monorepo. Not all of our imports are targeting the index file or types files of packages, and our dependency graph is pretty dense. It would be nice to be able to group all files within a package as a single "unit" so we could just visualize which packages import from each other and break down the high-level relationships.

I end up doing my own parsing of the graph output right now, and we use it to decide whether to combine packages, or whether to break a third package out of 2 circularly dependent packages to fix the relationship.

Details

I imagine the graph analysis would stay the same, but it would collect extra metadata about which bucket each file belongs to. This could look like either specifying package directories, where anything under a directory falls into that package's bucket, or key-value pairs mapping package names to path globs.

Then you would have the ability to visualize the graph as groups (edges would be weighted by the number of imports, or not weighted and you could just toggle out of the view).

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question	Answer
Would you consider contributing a PR?	Yes. I think I would need a bit of pairing help at first to get situated in the repo, but I would be happy to take a stab at this.

Hello,

This feature request reminds me of #125 and #33, where one would be able to group a set of modules into a compacted set of nodes. Please check those issues and feel free to tell me whether they map to what you expect. Also, you can check this comment, which already provides a way of dealing with that until we land it in the core, that is:

Using a custom resolver to add custom properties to the graph. It can be used to filter out dependencies of a node and keep only the ones that interest you (for instance, keep module imports of your monorepo projects and discard third-party or builtin imports). It's basically what I did for the Rush.js (monorepo tool) plugin I wrote.
Once you have the whole graph and each node has its custom properties (at that point no grouping has been done yet), you can derive a more compact version of the graph. This is the part currently lacking. I believe @AlexandrHoroshih is working on integrating that into the core, but you can build a pretty simple version of it yourself that would work.

If I recap with some pseudo-code:

import skott from "skott";

import {
  continueResolution,
  DependencyResolver,
  DependencyResolverOptions,
  skipNextResolvers,
} from "skott/modules/resolvers/base-resolver";

// This is a DependencyResolver, `resolve` will be called by skott for each module declaration found in each file analyzed
export class DojoDependencyResolver implements DependencyResolver<any> {
  constructor(
    private readonly groupingPattern: {
      path: string;
      name: string;
    }[]
  ) {}

  // This is where you can hook into the skott resolution algorithm. From there you can decide where a module declaration
  // goes.
  async resolve({
    projectGraph,
    moduleDeclaration,
    // path of the file being analyzed, assumed to be part of the resolver options; used by `mergeVertexBody` below
    resolvedNodePath,
  }: DependencyResolverOptions<any>) {
    for (const ref of this.groupingPattern) {
      // any check that tells you the module declaration comes from one of the groups
      if (moduleDeclaration.startsWith(ref.path)) {
        projectGraph.mergeVertexBody(resolvedNodePath, (body) => {
          body.groupDependencies = (body.groupDependencies ?? []).concat(ref.name);
        });

        return skipNextResolvers();
      }
    }

    return continueResolution();
  }
}

const dojoGraph = await skott({
  // Use skott with the custom resolver. Note that it can be combined with the default one if you still want to resolve
  // other default dependencies.
  dependencyResolvers: [
    // the constructor expects an array of grouping patterns
    new DojoDependencyResolver([
      { path: `some-folder`, name: `my-folder` },
    ]),
  ],
});

const { graph } = dojoGraph.getStructure();

// from there you can use all the information to build the compacted version of the graph

const graphAlike = {
   "some-directory/index.ts": { id: "some-directory/index.ts", adjacentTo: [], body: { groupDependencies: ["some-other-directory"] } },
}

// You can then either use the key in the object to know where the node should be grouped
// and thanks to `groupDependencies` you could also know all imports to other groups from that group

The only catch is that in that case you are responsible for first collecting the workspace information needed to do that compacting (this is what I called groupingPattern above).
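As a rough illustration of that compacting step, here is a minimal sketch that only relies on the node shape shown in graphAlike. The names compactGraph, findGroup and GroupingPattern are hypothetical helpers written for this example and are not part of skott, and groupDependencies is the custom property set by the resolver above.

```typescript
// Node shape mirrors the `graphAlike` example; `groupDependencies` is the custom
// property written by the resolver, not something skott adds by itself.
interface GraphNode {
  id: string;
  adjacentTo: string[];
  body: { groupDependencies?: string[] };
}
type Graph = Record<string, GraphNode>;

// Hypothetical: same shape as the resolver's constructor argument.
interface GroupingPattern {
  path: string;
  name: string;
}

// Returns the name of the first group whose path prefixes the node id.
function findGroup(nodeId: string, patterns: GroupingPattern[]): string | undefined {
  return patterns.find((pattern) => nodeId.startsWith(pattern.path))?.name;
}

// Collapses file-level nodes into one node per group; group-to-group edges
// come from the `groupDependencies` collected at resolve time.
function compactGraph(graph: Graph, patterns: GroupingPattern[]): Graph {
  const compacted: Graph = {};
  for (const nodeId of Object.keys(graph)) {
    const group = findGroup(nodeId, patterns);
    if (group === undefined) continue; // file outside every group
    if (!compacted[group]) {
      compacted[group] = { id: group, adjacentTo: [], body: {} };
    }
    for (const dep of graph[nodeId].body.groupDependencies ?? []) {
      // skip self-references and duplicates
      if (dep !== group && compacted[group].adjacentTo.indexOf(dep) === -1) {
        compacted[group].adjacentTo.push(dep);
      }
    }
  }
  return compacted;
}

const examplePatterns: GroupingPattern[] = [
  { path: "packages/a/", name: "a" },
  { path: "packages/b/", name: "b" },
];
const exampleGraph: Graph = {
  "packages/a/index.ts": { id: "packages/a/index.ts", adjacentTo: [], body: { groupDependencies: ["b"] } },
  "packages/a/util.ts": { id: "packages/a/util.ts", adjacentTo: [], body: { groupDependencies: ["a", "b"] } },
  "packages/b/index.ts": { id: "packages/b/index.ts", adjacentTo: [], body: {} },
};
const compactedExample = compactGraph(exampleGraph, examplePatterns);
```

In this example the result has two nodes, a and b, with a single edge from a to b.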

In the case where you want each group to be a workspace project, I would need to refine skott's API. We have a getWorkspace function that finds all packages and their sets of devDeps, prodDeps and peerDeps, but it's attached to a skott instance. To be honest, I didn't think of the case where you would want to call it before running the skott analysis; as it stands, you end up running skott twice: once to get the workspace information, and once more to run the analysis with it.

Visualization

Once you have a grouped version of the graph, you could use skott-webapp to render it, as was done in the Rush.js plugin.

Note that I'm aware all of this is pretty tedious to set up, but apart from me (for the Rush plugin) no one has really gone that far using the raw API. Most users just use the CLI, so visualization modes were plugged into it, and the API was only meant for JS scripts. But if using the API and then adding visualization on top becomes a thing, I could work on making skott's JS API waaay easier to use and on exposing all the display modes (webapp, CLIs...) so they can be easily plugged into it.

Also, doing that type of thing with graphs is exactly the purpose of skott, so you're in the right spot; we just need to get there API-wise :-)

At the end of the day, even that feature in the core would do more or less the same thing: generate the entire graph first, and then provide a compacted version.

So you could even drop the custom dependencyResolver entirely and just work from the generated graph, running skott at the root of your monorepo. I don't know how "dense" your graph is, but note that it might not be super fast to compute on heavy graphs; performance improvements still need to be done.

Nevertheless, starting from a graph such as:


import skott from "skott";

const { graph } = await skott().then(({ getStructure }) => getStructure())

From that "graph" you can group it given your specific context. Once grouped, you can boot the web application using the same principle as shown in the previous comment.

From there the web app should display nodes and their edges (dependencies between groups if you updated adjacentTo property for each node).

Note also that one current limitation is that skott-webapp expects files and was not updated to work with groups, so the File Explorer section might be at worst not working well or at best completely useless.

Yep, from what I'm reading, this request is basically the same as all the requests you linked! I can close this if it's a duplicate.

Would there be a benefit to doing the custom dependency resolver? From what I understand you're saying that it may not be that much faster to gather the group metadata during resolution time rather than afterwards

It's fine to keep these issues open even if they request more or less the same use cases, because they describe different contexts; it should help us work towards the right API.

Would there be a benefit to doing the custom dependency resolver? [...]

The resolver is not really about speed (even though using a custom resolver for a very specific use case could be faster, as the default one tries to resolve many things that might not be useful); it's more about what you're trying to achieve.

DependencyResolver's responsibility is to loop over all module declarations from each traversed file and classify each of them (third-party, builtin, path aliases, etc). This is what we call dependencies of a module and will be reflected in the graph nodes and edges.

So in a context where you don't care about all that (classifying third-party, builtin, path aliases etc) and just want to check whether a module declaration comes from a specific group related to your use case, you could add that information to the node at resolve time. It would allow you to avoid polluting the graph with irrelevant data, for instance dependencies between nodes of the same group. It would also gather extra information before the "grouping" step: knowing that Node-A has a dependency on Group-B tells you which node creates the Group-A -> Group-B dependency (considering Node-A comes from Group-A).
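To make the "which node creates the Group-A -> Group-B dependency" point concrete, here is a small sketch. It assumes each node body carries the groupDependencies property from the earlier pseudo-code; explainGroupEdges, CrossGroupEdge and the groupOf callback are hypothetical names for this example, not skott APIs.

```typescript
interface ResolvedNode {
  id: string;
  body: { groupDependencies?: string[] };
}

// One entry per group-to-group dependency, with the files responsible for it.
interface CrossGroupEdge {
  from: string;
  to: string;
  causedBy: string[];
}

function explainGroupEdges(
  nodes: ResolvedNode[],
  groupOf: (nodeId: string) => string | undefined
): CrossGroupEdge[] {
  const edges: Record<string, CrossGroupEdge> = {};
  for (const node of nodes) {
    const from = groupOf(node.id);
    if (from === undefined) continue; // node does not belong to any group
    for (const to of node.body.groupDependencies ?? []) {
      if (to === from) continue; // ignore dependencies inside the same group
      const key = `${from} -> ${to}`;
      if (!edges[key]) edges[key] = { from, to, causedBy: [] };
      if (edges[key].causedBy.indexOf(node.id) === -1) edges[key].causedBy.push(node.id);
    }
  }
  return Object.keys(edges).map((key) => edges[key]);
}

// Hypothetical layout: everything under packages/a/ is Group-A, under packages/b/ is Group-B.
const groupOfExample = (nodeId: string): string | undefined =>
  nodeId.startsWith("packages/a/") ? "Group-A" : nodeId.startsWith("packages/b/") ? "Group-B" : undefined;

const provenanceExample = explainGroupEdges(
  [
    { id: "packages/a/node-a.ts", body: { groupDependencies: ["Group-B"] } },
    { id: "packages/a/other.ts", body: { groupDependencies: ["Group-A"] } },
    { id: "packages/b/index.ts", body: {} },
  ],
  groupOfExample
);
```

Here the only cross-group edge is Group-A -> Group-B, and its causedBy list points at packages/a/node-a.ts.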

After that step, you would still need to do the grouping as a DependencyResolver does not change nodes structure but only their bodies (information attached to each node).

So basically you could simply loop over the entries of the graph, put each node into its own group, and merge the information collected by the resolver for all nodes of the same group into that group's root node.

Besides that, you have a less efficient but more straightforward approach (API-wise), which is using the default resolver (EcmaScriptDependencyResolver). It will try to resolve many more dependencies but does not require any setup.

Then you can refine that graph into a lighter graph with groups. The graph is basically a Map, so you just need to match each node's id and body against the paths you have defined in the Groups definition and you're done.
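A minimal sketch of that refinement, assuming the graph is available as a plain record of nodes with id and adjacentTo (as in the graphAlike example earlier in the thread). The groups definition, groupGraphByPath and the importCounts property are hypothetical names for this illustration; importCounts simply records how many file-level imports back each group-level edge, in the spirit of the weighted edges mentioned in the original request.

```typescript
interface FileNode {
  id: string;
  adjacentTo: string[];
}
type FileGraph = Record<string, FileNode>;

interface GroupNode {
  id: string;
  adjacentTo: string[];
  // hypothetical: number of file-level imports behind each group-level edge
  body: { importCounts: Record<string, number> };
}

// `groups` maps a group name to the path prefix of its files.
function groupOfPath(filePath: string, groups: Record<string, string>): string | undefined {
  return Object.keys(groups).find((name) => filePath.startsWith(groups[name]));
}

function groupGraphByPath(graph: FileGraph, groups: Record<string, string>): Record<string, GroupNode> {
  const grouped: Record<string, GroupNode> = {};
  for (const fileId of Object.keys(graph)) {
    const from = groupOfPath(fileId, groups);
    if (from === undefined) continue; // file outside every group
    if (!grouped[from]) {
      grouped[from] = { id: from, adjacentTo: [], body: { importCounts: {} } };
    }
    for (const dep of graph[fileId].adjacentTo) {
      const to = groupOfPath(dep, groups);
      if (to === undefined || to === from) continue; // drop external and intra-group edges
      const counts = grouped[from].body.importCounts;
      if (counts[to] === undefined) grouped[from].adjacentTo.push(to);
      counts[to] = (counts[to] ?? 0) + 1;
    }
  }
  return grouped;
}

const groupedExample = groupGraphByPath(
  {
    "packages/a/index.ts": { id: "packages/a/index.ts", adjacentTo: ["packages/a/util.ts", "packages/b/index.ts"] },
    "packages/a/util.ts": { id: "packages/a/util.ts", adjacentTo: ["packages/b/helpers.ts"] },
    "packages/b/index.ts": { id: "packages/b/index.ts", adjacentTo: [] },
    "packages/b/helpers.ts": { id: "packages/b/helpers.ts", adjacentTo: [] },
  },
  { a: "packages/a/", b: "packages/b/" }
);
```

In this example group a ends up adjacent to b with an import count of 2, and b has no outgoing edges.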

In both cases you would end up with a Graph that could be used in visualisation modes. Groups would need better visualisation support though.

Hello @brmenchl, just for your own information, the first iteration of path grouping was introduced in 0.33.0 thanks to @AlexandrHoroshih in #146.

The groupBy function allows you to go through all node paths collected within the graph and to put them in a specific bucket.

For instance the following


skott({ 
   groupBy: (nodePath) => {
     if(nodePath.startsWith('package-a')) {
        return 'package-a'
     }
   } 
})

will generate a groupedGraph with one node, package-a, accessible through the instance.getStructure() function.
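When there are many packages, the groupBy callback can be derived from a list of package directories. The sketch below assumes, as in the example above, that groupBy receives a node path and returns a group name or undefined. makeGroupBy is a hypothetical helper, not part of skott. It matches on a trailing slash so that package-a does not also capture package-ab, and it checks longer directories first so nested packages win.

```typescript
type GroupBy = (nodePath: string) => string | undefined;

// Hypothetical helper: builds a groupBy callback from package directories.
function makeGroupBy(packageDirs: string[]): GroupBy {
  // Longest directories first, so "package-a/nested" is tried before "package-a".
  const sortedDirs = packageDirs.slice().sort((a, b) => b.length - a.length);
  return (nodePath) =>
    sortedDirs.find((dir) => nodePath === dir || nodePath.startsWith(dir + "/"));
}

const groupByExample = makeGroupBy(["package-a", "package-a/nested", "package-b"]);

const groupOfIndex = groupByExample("package-a/src/index.ts"); // "package-a"
const groupOfNested = groupByExample("package-a/nested/x.ts"); // "package-a/nested"
const groupOfLookalike = groupByExample("package-ab/index.ts"); // undefined
```

The resulting function could then be passed as the groupBy option in the call shown above.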

As of now, there are still tasks to do to cover the whole use case:

Given that the groupBy option can only be provided through the API, we still need a way of invoking visualization modes from the API, as is done with the CLI.
groupBy allows arbitrary nodes to be grouped, but since it's an ad-hoc method it isn't able to resolve workspace dependencies. This is where we'll need to introduce a PnpmDependencyResolver in combination with the groupBy method, so that each group knows about the dependencies that exist between groups. Without that custom resolver, groups will just be nodes with no workspace dependencies between each other (other dependencies such as npm and builtin will still be resolved).

In the next iteration, this is what I imagine:


// This will either be generated from pnpm workspace information or provided as a custom config
const pnpmWorkspaces = {
   "package-a": {
     customName: "my-package-a"
   },
  "package-b": {
    customName: "my-package-b"
  }
}

skott({ 
   groupBy: (nodePath) => {
     const pkg = Object.keys(pnpmWorkspaces).find((key) => nodePath.startsWith(key));
     if(pkg) return pnpmWorkspaces[pkg].customName
   },
  dependencyResolvers: [new PnpmWorkspaceResolver(), new EcmaScriptDependencyResolver()]
})

That way, groups will allow you to recreate the workspace-level graph and also have the knowledge about all the workspace dependencies living between each workspace package.

wow that's great! this is exactly what i was looking for
