
Create manypkg/tools and refactor find-root and get-packages #151

Closed · elliot-nelson closed this 1 year ago

elliot-nelson commented 1 year ago

SUMMARY

Refactor monorepo tools and associated types into a new core package, that can be used by both get-packages and find-root.

This initial PR is incomplete and I've put it up to gather initial feedback on the approach; if feedback is positive I'll continue to iterate on it. I've currently included a possible API and two tool implementations, Lerna and Rush.
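
For context, here is a rough sketch of the shape of the tool abstraction being proposed (the names and method signatures below are illustrative assumptions, not necessarily the final @manypkg/tools API):

// Illustrative sketch only -- names/signatures are assumptions, not the final API.
interface Package {
  dir: string;
  packageJson: { name: string; [key: string]: unknown };
}

interface Packages {
  tool: Tool;
  packages: Package[];
  root: Package;
}

interface Tool {
  readonly type: string; // e.g. "lerna", "rush", "pnpm"

  // Does this directory look like a monorepo root managed by this tool?
  isMonorepoRoot(directory: string): Promise<boolean>;
  isMonorepoRootSync(directory: string): boolean;

  // Enumerate all packages in the monorepo rooted at this directory.
  getPackages(directory: string): Promise<Packages>;
  getPackagesSync(directory: string): Packages;
}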

DETAILS

Today, the process of "adding a tool" to the list of supported tools in manypkg is muddy, because the logic is spread around several packages. Goals:

TODO

Next steps / planned approach:

changeset-bot[bot] commented 1 year ago

🦋 Changeset detected

Latest commit: fedb05be4b605ba557490f040e39c1de5c810bc4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages

| Name                  | Type  |
| --------------------- | ----- |
| @manypkg/find-root    | Major |
| @manypkg/tools        | Minor |
| @manypkg/get-packages | Major |
| @manypkg/cli          | Patch |


elliot-nelson commented 1 year ago

Thanks @Andarist, tons of good feedback here!

To summarize:

Andarist commented 1 year ago

I will split apart the replacement of the glob engine, to match changesets, into a separate PR that can go ahead of this one.

Note that I don't have a strong opinion about which library should be used by both - it's just that both should use the same one. I don't know enough about the exact differences between globby and micromatch to make that decision right now (if they are almost identical, then I would probably use the one that Changesets uses, since that's a more popular project than manypkg).

(will ignore for now -- minimatch is just the matching engine, which fast-glob is based on, and globby is a wrapper around fast-glob, so you are kind of all in a swirl here no matter which you pick)

Thanks for the analysis! This is helpful info in general - but Changesets uses micromatch, not minimatch, and micromatch does not use minimatch as its engine, if I read things correctly.

I will introduce a PackageJSON clone into this PR, but will make sure not to adjust it at all until after changesets switches to it (we can get by on structural typing in meantime).

Yeah, structural typing saves the day. I wouldn't even mind relying on it in the long term here since this type doesn't change ~often~ at all and we only care about very "standard" things (like name, private property, dependencies etc). AFAIK we don't extend it anywhere with any "custom" properties. It still might be a good idea to converge on a single type though - just saying that it's not really that important here.
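
For reference, the "standard" fields mentioned above fit a minimal structural type along these lines (a sketch, not necessarily the exact shape either project ships):

// Minimal structural PackageJSON covering only the "standard" fields discussed;
// any object with at least these properties is compatible via structural typing.
interface PackageJSON {
  name: string;
  version?: string;
  private?: boolean;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}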

elliot-nelson commented 1 year ago

Change summary:

elliot-nelson commented 1 year ago

Change summary:

elliot-nelson commented 1 year ago

@Andarist You brought up a new concern which I think makes sense to think about in the long term (not necessarily in this first stab): package manager variations.

Both LernaTool and RushTool support using NPM, PNPM or Yarn for your "publishing package manager".

PnpmTool, on the other hand, specifically uses PNPM.

If we imagine a future language-agnostic world, an AndroidTool or GradleTool might provide a dependency implementation (using build.gradle / gradle.properties files), but to publish, your project might use "gradle publish" or invoke "maven publish" directly.

If GradleTool did offer a choice between gradle and maven, and Changesets knew how to publish with either, the two publishing tools could require totally different command line options; that means Changesets would really need very specific package manager implementations. I wonder if the concept of a package manager should be kept completely contained away from manypkg.

A related concern is that packages/cli/run.ts today just assumes you run commands with yarn; it might be nice for Tools to specify "commandLineTool", so that manypkg could run multi-package commands using lerna in a Lerna repo.
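
As a purely hypothetical illustration of that idea (the commandLineTool field and getRunCommand helper below are not part of this PR):

// Hypothetical: a Tool could advertise the CLI used to run multi-package commands,
// so run.ts would not have to assume yarn.
interface RunnerInfo {
  readonly type: string;
  readonly commandLineTool: "yarn" | "npm" | "pnpm" | "lerna" | "rush";
}

function getRunCommand(tool: RunnerInfo, pkgName: string, script: string): string[] {
  switch (tool.commandLineTool) {
    case "lerna":
      return ["lerna", "run", script, "--scope", pkgName];
    default:
      // Today's behavior: always go through yarn workspaces.
      return ["yarn", "workspace", pkgName, script];
  }
}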

I think the question to answer is: if I have a Lerna repo, and I really want publishing to happen with (npm/pnpm/yarn), how does Changesets know which one to use? I feel like Changesets is the tool that needs to call it, so Changesets needs to decide which one, which means it is Changesets (not manypkg) that will read whatever config file might specify such an option, or invent a configuration option for changesets itself (changesets.publishingTool).

Let me know if that hangs together.

EDIT: I realized it kind of sounds like I'm making contradictory points above, so to clarify: I think that the "runner tool" (LernaTool = lerna, BoltTool = bolt, RushTool = rush) is part of manypkg's core, and it makes sense for run.ts to do the right thing for each tool. The concept of a "publishingTool", on the other hand, is foreign to manypkg; it only exists for Changesets, so the decision about which publishingTool to use for each Tool lives there instead.

elliot-nelson commented 1 year ago

@Andarist In a comment somewhere up above, you mentioned:

I'm unsure whether we should be abbreviating directory as dir in public APIs

This did get me thinking -- I think that the current API is rather ambiguous, not in terms of dir vs directory, but in terms of whether a given dir field is relative or absolute (and what it is relative to). Right now the whole structure returned from getPackages is still relative to the original cwd, which I think is not ideal.

This is my suggestion on how to fix this:

To give a concrete example, here's the structure returned today vs. my proposed change. (In this example, we started the search from the "./packages" folder.)

Old:

{
  tool: "pnpm",
  packages: [
    { dir: "packages/a", packageJson: { name: "a", /* ... */ } },
    { dir: "packages/b", packageJson: { name: "b", /* ... */ } }
  ],
  root: { dir: "..", packageJson: { name: "myrepo", /* ... */ } }
}

New:

{
  tool: { type: "pnpm", /* ... */ },
  packages: [
    { relativeDir: "packages/a", packageJson: { name: "a", /* ... */ } },
    { relativeDir: "packages/b", packageJson: { name: "b", /* ... */ } }
  ],
  rootPackage: { relativeDir: ".", packageJson: { name: "myrepo", /* ... */ } },
  rootDir: "/Users/janjansen/dev/myrepo"
}
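
One nice property of the proposed shape is that consumers no longer have to care which cwd the search started from; an absolute directory is always rootDir plus relativeDir. A minimal sketch of that usage (assuming Node's path module and the structure above):

import path from "path";

// Assuming `result` has the proposed shape shown above.
declare const result: {
  rootDir: string;
  packages: { relativeDir: string; packageJson: { name: string } }[];
};

const absoluteDirs = result.packages.map((pkg) =>
  path.join(result.rootDir, pkg.relativeDir)
);
// e.g. "/Users/janjansen/dev/myrepo/packages/a"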

Let me know what you think.

Andarist commented 1 year ago

A related concern is that packages/cli/run.ts today just assumes you run commands with yarn;

Oh, I didn't realize this. It's definitely "bad" and we need to change this somehow. I never used manypkg to run scripts/commands; this was implemented in https://github.com/Thinkmill/manypkg/pull/42, but in today's landscape... I question its usefulness. We don't want to compete with Turborepo, Nx and others here. We could keep it around as a lightweight solution for this problem, but dunno, I would probably be fine with removing it.

I feel like Changesets is the tool that needs to call it, so Changesets needs to decide which one, which means it is Changesets (not manypkg) that will read whatever config file might specify such an option, or invent a configuration option for changesets itself (changesets.publishingTool).

I agree in principle that this is way more relevant for Changesets than it is for manypkg. OTOH, having yet another configuration/auto-selection thing in there (that should also be somewhat tightly coupled with a Tool) doesn't sound like something fun to use. Either way - it's a problem for another day.


Re relativeDir and so on: 👍

Re abbreviations - to sum it up, you prefer the abbreviated dir over directory, right?

tool: { type: "pnpm", /* ... */ },

I think that you didn't describe this change but it's in the proposed new format.

rootPackage: { relativeDir: ".", packageJson: { name: "myrepo", /* ... */ } },
rootDir: "/Users/janjansen/dev/myrepo"

Alternative could look something like:

root: {
    dir: "/Users/janjansen/dev/myrepo",
    packageJson: { name: "myrepo", /* ... */ } // optional
}

But I guess that this would make it more annoying to work with, as root wouldn't be of the Package type. Your proposal sounds better. I think I might have asked this somewhere... or maybe I just had this thought in the last couple of days. Do we even need to have a special rootPackage? Isn't this just a package? I can imagine that it might be a special package, but I don't see any actual argument for it right now (one could be found further down the road, so it still might be worth keeping it separate). I could easily be missing something obvious here though.

elliot-nelson commented 1 year ago

@Andarist

Re abbreviations - to sum it up, you prefer the abbreviated dir over directory, right?

Yeah, I don't see a need for the extra chars; I think clarifying rootDir vs relativeDir is more useful than the long name. I'd also accept rootFolder and relativeFolder as halfway in-between (shorter than Directory but not abbreviated 😆).

Do we even need to have a special rootPackage? Isn't this just a package?

If we took up the proposal above and eliminated rootPackage, you could find the root package with .packages.find(p => p.relativeDir === ".").

However, I think separating rootPackage has value: in repos that are actually monorepos, if the root package exists, it is a "dev dependency dumpster" -- it exists just to throw compilers and linters and CLI tools and libraries at; it's never intended to be published and therefore isn't actually part of your dependency graph.

Changesets' get-dependents-graph does include the root package, but I don't think that code is actually claiming "all the projects and the root project go in the graph" -- I think it's more like "well, we might ONLY have a root project, which is not a monorepo, so all the changesets will be for that one; or we might have a bunch of packages, in which case probably all the changesets are for those".

Andarist commented 1 year ago

Yeah, I don't see a need for the extra chars; I think clarifying rootDir vs relativeDir is more useful than the long name. I'd also accept rootFolder and relativeFolder as halfway in-between (shorter than Directory but not abbreviated 😆)

Let's roll with rootDir and relativeDir then 👍

However, I think separating rootPackage has value: in repos that are actually monorepos, if the root package exists, it is a "dev dependency dumpster" -- it exists just to throw compilers and linters and CLI tools and libraries at; it's never intended to be published and therefore isn't actually part of your dependency graph.

Totally agree with that. Some tools even require private: true in that root package.json. That being said - that package.json can refer to some local workspaces/packages, so it should be part of the dependency graph (as we might need to update the contained ranges there when versioning).

elliot-nelson commented 1 year ago

Latest batch of updates:

Outstanding conversations/open questions as of latest commit:

Andarist commented 1 year ago

Should Package provide relative dir, absolute dir, or both?

I don't have strong opinions about this. A relative dir is better for printing error messages, and an absolute dir is better for writing back to those files/packages. Each has its own pros/cons, and one can't be derived from the other alone - we need the rootDir to derive them. So perhaps having both provides the best DX for our implementation etc., at the expense of a harmless redundancy?
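
For illustration, the "have both" option could look something like this (field names are assumptions, not settled API):

// One possible "have both" shape (illustrative, not the final API):
interface Package {
  relativeDir: string; // relative to rootDir; nice for printing error messages
  dir: string;         // absolute; nice for reading/writing files in that package
  packageJson: { name: string; [key: string]: unknown };
}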

I will be fine either way though - pick your poison :p

elliot-nelson commented 1 year ago

Latest recap:

  1. I have added a .changeset entry for this PR; check it out and let me know if you think it is sufficient.

  2. I am suggesting a slightly different change for the try/catch guards -- although it's maybe another line or two of code, I think this pattern is easier to read and easier to extend later (if, for example, you want to handle multiple different error codes). Let me know what you think:

  isMonorepoRootSync(directory: string): boolean {
    try {
      const lernaJson = fs.readJsonSync(
        path.join(directory, "lerna.json")
      ) as LernaJson;
      // If lerna.json exists and useWorkspaces is not true, Lerna itself
      // manages the packages, so this is a Lerna monorepo root.
      if (lernaJson.useWorkspaces !== true) {
        return true;
      }
    } catch (err) {
      // No lerna.json at all: not a Lerna monorepo root.
      if (err && (err as any).code === "ENOENT") {
        return false;
      }
      throw err;
    }
    return false;
  },

I think that's all that is left outstanding!

Andarist commented 1 year ago

I am suggesting a slightly different change for the try/catch guards -- although it's maybe another line or two of code, I think this pattern is easier to read and easier to extend later (if, for example, you want to handle multiple different error codes). Let me know what you think:

Could you present this as a diff? I'm not completely sure what we are comparing here.

elliot-nelson commented 1 year ago

I am suggesting a slightly different change for the try/catch guards -- although it's maybe another line or two of code, I think this pattern is easier to read and easier to extend later (if, for example, you want to handle multiple different error codes). Let me know what you think:

Could you present this as a diff? I'm not completely sure what we are comparing here.

@Andarist Ah, that was silly, I forgot to push the commit!

This is what I was referring to: https://github.com/Thinkmill/manypkg/pull/151/commits/b8ed304b861233011e4ce527d3ed67e6d055e39d#diff-b9a9e78d290f3682854230f477271a105f907e1cea680ac208672a42b6358fcdR50

    } catch (err) {
      if (err && (err as any).code === "ENOENT") {
        return false;
      }
      throw err;
    }
    return false;

My (lukewarm) stance is that it's more useful to have a bog-standard, easy-to-read (and easy-to-copy) clause for handling error codes than it is to minimize the total number of exit points. So in my proposal you're duplicating the return false exits (you could exit due to ENOENT, or due to falling out of the other checks above), but the err checks themselves are "normal".
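
For comparison, a reconstruction of the more compact variant (single exit point) that this is being weighed against; this is an assumption for illustration, not code from the PR:

    } catch (err) {
      // Compact variant: rethrow anything that isn't ENOENT and fall through
      // to the single return false below.
      if (!err || (err as any).code !== "ENOENT") {
        throw err;
      }
    }
    return false;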