github / licensed

A Ruby gem to cache and verify the licenses of dependencies
MIT License
986 stars 122 forks source link

Unable to get license information of dependent packages when using yarn workspaces #349

Open ledsun opened 3 years ago

ledsun commented 3 years ago

I am using two workspaces. And each workspace depends on the same package with different versions. In this case, the package is installed under the workspace directory.

This is the output of the yarn list:

~ yarn list
yarn list v1.15.2
├─ cross-env@5.0.5
│  ├─ cross-spawn@^5.1.0
│  └─ is-windows@^1.0.0
├─ cross-spawn@5.1.0
│  ├─ lru-cache@^4.0.1
│  ├─ shebang-command@^1.2.0
│  └─ which@^1.2.9
├─ is-windows@1.0.2
├─ isexe@2.0.0
├─ lru-cache@4.1.5
│  ├─ pseudomap@^1.0.2
│  └─ yallist@^2.1.2
├─ path-key@3.1.1
├─ pseudomap@1.0.2
├─ shebang-command@1.2.0
│  └─ shebang-regex@^1.0.0
├─ shebang-regex@1.0.0
├─ which@1.3.1
│  └─ isexe@^2.0.0
├─ workspace-a@1.0.0
│  ├─ cross-env@7.0.3
│  ├─ cross-env@7.0.3
│  │  └─ cross-spawn@^7.0.1
│  ├─ cross-spawn@7.0.3
│  │  ├─ path-key@^3.1.0
│  │  ├─ shebang-command@^2.0.0
│  │  └─ which@^2.0.1
│  ├─ shebang-command@2.0.0
│  │  └─ shebang-regex@^3.0.0
│  ├─ shebang-regex@3.0.0
│  └─ which@2.0.2
│     └─ isexe@^2.0.0
├─ workspace-b@1.0.0
│  ├─ cross-env@5.0.5
│  └─ workspace-a@1.0.0
└─ yallist@2.1.2
✨  Done in 0.08s.

In this case, licensed is unable to get the license information of the packages installed under the workspace directory. The following is the output of the licensed cache:


~ licensed cache
Caching dependency records for licensed-with-yarn-warkspaces
  yarn
    Using cross-env-5.0.5 (5.0.5)
    Error cross-env-7.0.3 (7.0.3)
    Using cross-spawn-5.1.0 (5.1.0)
    Error cross-spawn-7.0.3 (7.0.3)
    Using is-windows (1.0.2)
    Using isexe (2.0.0)
    Using lru-cache (4.1.5)
    Using path-key (3.1.1)
    Using pseudomap (1.0.2)
    Using shebang-command-1.2.0 (1.2.0)
    Error shebang-command-2.0.0 (2.0.0)
    Using shebang-regex-1.0.0 (1.0.0)
    Error shebang-regex-3.0.0 (3.0.0)
    Using which-1.3.1 (1.3.1)
    Error which-2.0.2 (2.0.2)
    Error workspace-a (1.0.0)
    Error workspace-b (1.0.0)
    Using yallist (2.1.2)

  * Errors:
    * licensed-with-yarn-warkspaces.yarn.cross-env-7.0.3
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.cross-spawn-7.0.3
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.shebang-command-2.0.0
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.shebang-regex-3.0.0
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.which-2.0.2
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.workspace-a
      - dependency path not found

    * licensed-with-yarn-warkspaces.yarn.workspace-b
      - dependency path not found
      - ```
jonabc commented 3 years ago

@ledsun thanks for the report! I'll take a look when I'm able - is this an urgent issue for you?

ledsun commented 3 years ago

I'm in no hurry. I have a workaround.

I assume that the cause of this issue is that yarn workspaces creates symbolic links in the node_modules directory.

~ ls -lah node_modules/
total 8
drwxr-xr-x  17 shigerunakajima  staff   544B  3 30 18:06 .
drwxr-xr-x   8 shigerunakajima  staff   256B  3 30 17:39 ..
drwxr-xr-x   5 shigerunakajima  staff   160B  3 30 18:06 .bin
-rw-r--r--   1 shigerunakajima  staff   2.4K  3 30 18:06 .yarn-integrity
drwxr-xr-x   7 shigerunakajima  staff   224B  3 30 18:06 cross-env
drwxr-xr-x   9 shigerunakajima  staff   288B  3 30 18:06 cross-spawn
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 is-windows
drwxr-xr-x  10 shigerunakajima  staff   320B  3 30 17:37 isexe
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 lru-cache
drwxr-xr-x   7 shigerunakajima  staff   224B  3 30 17:45 path-key
drwxr-xr-x   8 shigerunakajima  staff   256B  3 30 18:06 pseudomap
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 shebang-command
drwxr-xr-x   6 shigerunakajima  staff   192B  3 30 18:06 shebang-regex
drwxr-xr-x   8 shigerunakajima  staff   256B  3 30 18:06 which
lrwxr-xr-x   1 shigerunakajima  staff    14B  3 30 18:06 workspace-a -> ../workspace-a
lrwxr-xr-x   1 shigerunakajima  staff    14B  3 30 18:06 workspace-b -> ../workspace-b
drwxr-xr-x   7 shigerunakajima  staff   224B  3 30 18:06 yallist

I think licensed is getting the directory under node_modules in the next line: https://github.com/github/licensed/blob/2.15.1/lib/licensed/sources/yarn.rb#L79 But Dir.glob(config.pwd.join("node_modules/**/package.json")) does not follow symbolic links.

My workaround is a patch to replace this with Dir.glob(config.pwd.join('node_modules/**{,/*/**}/package.json')).

jonabc commented 3 years ago

@ledsun

I assume that the cause of this issue is that yarn workspaces creates symbolic links in the node_modules directory. My workaround is a patch to replace this with Dir.glob(config.pwd.join('node_modules/*{,//**}/package.json')).

👍 that makes sense. That would be an easy patch to put in but I'd like to learn more about how you'd expect licensed to work with them before adding that as a general fix.

I'm also not super up to speed on yarn workspaces themselves. I've read a few docs on them and they appear to be a way to manage dependencies among multiple projects/packages in a monorepo setup when one project takes a dependency on another project in the repo... is that correct?

I have a concern that a package with the same name and version but different licensing or other metadata might exist in different locations under node_modules which would make the logic for the yarn source indeterminate. Whichever duplicate name-version package is parsed last would be used for all projects. I'm not sure how realistic this scenario is but I think it might be possible if a package is forked and/or multiple package registries are used.

Can you set up a reproduction environment for the problem, removing any sensitive and non-public content? I'm wondering whether building knowledge about workspaces into the yarn source would be more appropriate. Something like enumerating workspaces, then changing dependency_paths to make use of them like Dir.glob(config.pwd.join("node_modules/**{,/${workspace}/**}/package.json"))

ledsun commented 3 years ago

Sorry. I can't provide a reproducible environment.

jgeurts commented 3 years ago

I'm running into this as well, specifically with package paths specified in nohoist. Here's a reproduction repo and licensed results.

Site note: I didn't want to store cached licenses in my repo, so each push runs licensed cache and licensed status. Hoping that's expected usage when not storing cached files.

jgeurts commented 3 years ago

@jonabc Please let me know if that helps or if I can provide anything else related to this!

jonabc commented 3 years ago

@jgeurts :wave: sorry I've been pretty busy and haven't had a chance to take a look. I'll try to get to it within the next 24 hrs

jonabc commented 3 years ago

@jguerts thanks for the example repo. the repo along with some additional yarn documentation for the nohoist syntax helped me wrap my head around the problem. This is an interesting problem space in that the workspaces functionality is solving for monorepos with multiple built packages. If each of these packages are distributed and used separately they should probably be tracked as individual "apps" from licensed's POV. I'm thinking the best way to handle this is

  1. consider workspaces as separate apps
    • the easiest way to handle this in licensed is to force users to specify their workspaces glob patterns in the licensed config file, however that can cause issues if .licensed.yml and package.json get out of sync. automagically handling workspaces detected from package.json will require some refactoring of how licensed apps are detected. I'll have to think on this one a bit more and see if there's an ideal solution
  2. update the yarn source's handling of node_modules to look at both the root node_modules directory and the workspaces node_modules directory. I don't know if adjusting the node_modules search path is a good idea in this case because it's possible to find the same dependency, at the same version, in two different locations and the metadata should know how to reference the correct one.

Off the top of my head this sounds like a fairly significant chunk of work, and as I mentioned things are pretty busy for me at the moment. Is this blocking you from using licensed and something you need fixed ASAP?

jgeurts commented 3 years ago

Thank you for looking into this!

It’s a blocker for me, but I’ll try working around it by removing the nohoist part of package.json before installing npms and running licensed. That should force all npms to install to the root. I’ve been holding off on that in case this was a relatively easy/quick update to licensed.

jgeurts commented 3 years ago

I tried removing the nohoist part of the package.json file, but licensed failed to get paths for a couple dependencies still. So, I ended up "manually" doing your suggestion from #​1:

That mostly worked, but I had some troubles with licenses for some of the dependencies not coming through accurately. I'll open a separate issue for that, if I can get it reproducing.

jonabc commented 3 years ago

@jgeurts FYI I've started looking at a fix. There is some complexity in the solution but I'm hopeful to have this fixed and able to put out a release in the next few days.

jonabc commented 3 years ago

@jgeurts I've hit a wall in how to get useful information out of yarn's CLI that perhaps you might be more knowledgable on?

When I call yarn list from a package defined as a yarn workspace, it still gives full information on dependencies used by everything in the repo, and not information specific to that workspace. That means that licensed has no way to distinguish which dependencies are specific to that workspace package or not. That is pretty consequential if someone is publishing individual packages out of a monorepo - it would not be correct to say that every dependency from the repo applies to all packages.

Even worse is that, at least with the version of yarn I have locally (1.22.10), yarn's hoisting mechanism is making packages available to workspaces that don't call them out as dependencies 😢 . I am able to use packages listed as a dependencies of workspace A inside of files in workspace B without any direct dependency linkage of workspace A -> B.

TBH I'm not sure how to support yarn workspaces given that I can't accurately tell what dependencies a project uses. For a tool that's purpose relates to licensing compliance I tend to err towards the safer option and am inclined to explicitly deny support for workspaces. I'll try with yarn v2 tomorrow and see if maybe they've fixed things in later versions.

If you have any ideas on how to support this I'd be very interested in hearing them 😆

villelahdenvuo commented 2 years ago

@jonabc This seems to also be an issue with npm workspaces:

Project setup ```jsonc // ./package.json { "name": "@grano/mygrano-serverless", "version": "1.0.0", "description": "My Grano Serverless", "private": true, "workspaces": [ "serverless-utils", "cognito", "antivirus", "ecom-cloudfront", "iam-cloudfront", "iam-ses-notifications", "ses-events", "ses" ] } // ./antivirus/package.json { "name": "@grano/antivirus", "version": "1.0.0", "description": "Virus Check Lambda for Widdix antivirus", "private": true, "dependencies": { "@grano/logger": "^2.12.1", "callbackify": "^1.1.0", "dotenv": "^16.0.0", "es6-promisify": "^7.0.0", "lodash": "^4.17.21", "superagent": "^7.1.1" } } ``` ```yml # licensed.yml apps: - name: antivirus source_path: "/home/runner/work/mygrano-serverless/mygrano-serverless/antivirus" cache_path: "/home/runner/work/mygrano-serverless/mygrano-serverless/.licenses/antivirus" sources: - name: antivirus.npm ```

Results:

# npm install --ignore-scripts --no-audit --production
added 65 packages in 3s
# npm ls callbackify
@grano/mygrano-serverless@1.0.0 /Users/ville.lahdenvuo/Documents/Grano/repot/mygrano-serverless
└─┬ @grano/antivirus@1.0.0 -> ./antivirus
  └── callbackify@1.1.0

...

  Caching dependency records for antivirus
    * Errors:
      * antivirus.npm.callbackify
        - missing: callbackify@^1.1.0, required by @grano/antivirus@1.0.0

      * antivirus.npm.dotenv
        - missing: dotenv@^16.0.0, required by @grano/antivirus@1.0.0

      * antivirus.npm.es6-promisify
        - missing: es6-promisify@^7.0.0, required by @grano/antivirus@1.0.0

      * antivirus.npm.superagent
        - missing: superagent@^7.1.1, required by @grano/antivirus@1.0.0

I assume it's because npm de-dupes dependencies in the root node_modules folder, but licensed does not traverse into parent folders.

villelahdenvuo commented 2 years ago

For people having problems with npm workspaces I added this script before running licensed to work around the issue:

  echo "::group::Workspaces workaround"
  WORKSPACES=$(jq -r '.workspaces // [] | join(" ")' package.json)
  if [[ -n "$WORKSPACES" ]]; then
    for package in $WORKSPACES; do
      echo "Copying node_modules to $package"
      rsync -r --exclude ".bin" --exclude "@my-org/*" ./node_modules/ "./$package/node_modules"
    done
  fi
  echo "::endgroup::"
jonabc commented 2 years ago

thanks for the repro - taking a look!

jonabc commented 2 years ago

@villelahdenvuo in this case, the errors being reported by licensed are actually coming from npm. I set up a quick repro for the above setup and got the same errors running npm list from the individual workspace.

There's an issue that was opened on the npm cli about this where npm ls run from a workspace isn't aware that it's dependencies are installed elsewhere. It looks like the latest versions of npm (8.5.0+) enable automatic workspace root detection which will solve this with a little extra logic in licensed to strip the current workspace from the list of detected dependencies

Are you able to upgrade to using npm 8.5+? I'd like to make that a requirement if possible for using npm workspaces because it allows the closest parity to existing licensed usage without requiring any additional configuration options.

villelahdenvuo commented 2 years ago

@jonabc Yeah I'm already using 8.5+ locally, I just need to make sure it's being used on the Github Actions also. Thanks for the investigation!