github / licensed

A Ruby gem to cache and verify the licenses of dependencies
MIT License
985 stars 122 forks source link

Difference between ignored and reviewed #196

Closed sergey-alekseev closed 5 years ago

sergey-alekseev commented 5 years ago

Recently I got a question from my teammate:

What's the difference between reviewed and ignored in .licensed.yml?

I found only the following:

# These dependencies are explicitly ignored.
ignored:
  bundler:
    - some-internal-gem

  bower:
    - some-internal-package

# These dependencies have been reviewed.
reviewed:
  bundler:
    - bcrypt-ruby

  bower:
  - classlist # public domain
  - octicons

For me it's still not really clear when I should put a gem into ignored: and when into reviewed:. Can you guide me?

Probably this should be clarified in the docs or combined into a single category to avoid confusions and save time debating with colleagues.

mlinksva commented 5 years ago

They do the same thing (@jonabc will correct me if that's not the case) but the use I've seen is like the example implies: ignored for internal dependencies, reviewed for external dependencies that aren't automatically allowed per the allowed list and license detection.

Ideally/wishlist I suppose they could have some different additional tooling, eg check whether a dependency can be moved out of approved and tests still pass, guardrails to detect whether something has changed that might merit a new review, or to detect whether a dependency really is internal.

jonabc commented 5 years ago

@sergey-alekseev @mlinksva

reviewed is meant for dependencies that need to be cached and checked, but do not have a license found that matches the allowed configured licenses.

ignored is meant for dependencies that shouldn't be cached at all, and will not have cached metadata written to the repo.

Good question! I'll update the docs when I have a chance, though I'd also welcome a PR :smile:

sergey-alekseev commented 5 years ago

Thanks for the clarification! @jonabc
I copy-pasted your comments and made a PR to add the docs change to your to-do list.

sergey-alekseev commented 5 years ago

An interesting thing though. Chances are an external dependency doesn't have a license info (examples: https://github.com/iain/http_accept_language/issues/72, https://github.com/grosser/i18n_data/pull/47, https://github.com/grosser/sort_alphabetical/pull/23). If added to reviewed then licensed status fails with "missing license text". Hence they should be added to ignored. So I'm not sure whether "but do not have a license found that matches the allowed configured licenses." suits well for reviewed 🤔

jonabc commented 5 years ago

@sergey-alekseev what you described is intentional, though I'm not sure what the recommended remediation here is. @mlinksva any thoughts?

The goal is to have cached metadata for all external dependencies that are used by the shipped product. The ignored flag is to say that a dependency that's being found isn't shipped as part of the public product, or it's something built by the project owner and isn't an external dependency. An external dependency that's found but doesn't have license text shouldn't be set as ignored

mlinksva commented 5 years ago

I'd add (and recall having done this) some evidence for the reviewed status in the case no license text at all is extracted. In the case of the first example above, "Released under the MIT license" pulled from https://github.com/iain/http_accept_language/blob/74a6a244afa069154d08df0f5df373a6c7fc4fe2/README.md#with-bundler

Note https://github.com/github/licensed/issues/189 (depending on specific implementation) could make this situation less common: in the case of the first example above again, https://github.com/iain/http_accept_language/blob/74a6a244afa069154d08df0f5df373a6c7fc4fe2/http_accept_language.gemspec#L13 would be recorded by licensed along with any license texts found.

mlinksva commented 5 years ago

Aside: thank you @sergey-alekseev for making PRs (seen from your examples, also thanks for providing concrete examples!) to improve licensing documentation from projects where it is missing, silently saving time for developers using (as they should) license tooling (licensed or numerous others) who might otherwise need to ask for review of innocuous dependencies.

sergey-alekseev commented 5 years ago

@mlinksva I probably didn't put enough emphasis on "missing license text". To make sure we are on the same page first let me clarify. licensed cache fetches licenses well for those examples. What's missing in these examples is license text (e.g. from a LICENSE file). See below:

$ licensed status
Checking cached dependency records for project
..........F.F....................F...................
Errors:
* project.bundler.http_accept_language
  filename: project/.licenses/bundler/http_accept_language.dep.yml
    - missing license text

* project.bundler.i18n_data
  filename: project/.licenses/bundler/i18n_data.dep.yml
    - missing license text

* project.bundler.sort_alphabetical
  filename: project/.licenses/bundler/sort_alphabetical.dep.yml
    - missing license text

N dependencies checked, 3 errors found.

project/.licenses/bundler/http_accept_language.dep.yml:

---
name: http_accept_language
version: 2.1.1
type: bundler
summary: Find out which locale the user preferes by reading the languages they specified
  in their browser
homepage: https://github.com/iain/http_accept_language
license: mit
licenses: []
notices: []

project/.licenses/bundler/i18n_data.dep.yml:

---
name: i18n_data
version: 0.7.0
type: bundler
summary: country/language names and 2-letter-code pairs, in 85 languages
homepage: https://github.com/grosser/i18n_data
license: mit
licenses: []
notices: []

project/.licenses/bundler/sort_alphabetical.dep.yml:

---
name: sort_alphabetical
version: 1.1.0
type: bundler
summary: Sort UTF8 Strings alphabetical via Enumerable extension
homepage: https://github.com/grosser/sort_alphabetical
license: mit
licenses: []
notices: []

project/.licensed.yml:

sources:
  bundler: true

allowed:
  - mit

reviewed:
  bundler:
    - http_accept_language # no license file (see https://github.com/iain/http_accept_language/issues/72)
    - i18n_data # no license file (see https://github.com/grosser/i18n_data/pull/47)
    - sort_alphabetical # no license file (see https://github.com/grosser/sort_alphabetical/pull/23)

So if I understand correctly your proposal is to change project/.licenses/bundler/http_accept_language.dep.yml to the following:

name: http_accept_language
version: 2.1.1
type: bundler
summary: Find out which locale the user preferes by reading the languages they specified
  in their browser
homepage: https://github.com/iain/http_accept_language
license: mit
licenses:
- sources: README.md
  text: Released under the MIT license
notices: []

Am I correct?

Also I can fill in recently added license text for project/.licenses/bundler/i18n_data.dep.yml and project/.licenses/bundler/sort_alphabetical.dep.yml even though they didn't have that text at that version. Would that be fine in your view?

sergey-alekseev commented 5 years ago

Aside: there is still a room for Licensed improvement. For my project I had to change 36 .licenses/bundler/*.dep.yml files manually our of 171 dependencies (21%). 23 times I changed only license: other, 11 times only licenses: [] and 2 times both. Filling in some details manually is quite easy and quick, so I didn't want to spend more time exploring the source code to understand why it happens. However one interesting thing I noticed is Licensee fetching license well from a git source, but not locally:

2.5.5 :001 > require 'licensee'
# => true 
2.5.5 :002 > license = Licensee.license '.../gems/awesome_print-1.8.0'
# => #<Licensee::License key=other> 
2.5.5 :003 > Licensee.license 'https://github.com/awesome-print/awesome_print/tree/v1.8.0'
# => #<Licensee::License key=mit> 
sergey-alekseev commented 5 years ago

An external dependency that's found but doesn't have license text shouldn't be set as ignored

Does this mean that even though an external dependency has e.g. the MIT license but doesn't have license text you can't use it? At least licensed status will not succeed. I'd agree with this comment, but you know licensing requirements, so can you please clarify? @jonabc

I did a quick search to see whether you had any comments on that during the development, but looks like the requirement was added during the initial release of v1.0.0 in the first commit.

jonabc commented 5 years ago

Going to answer as many of these as I can 😄

So if I understand correctly your proposal is to change project/.licenses/bundler/http_accept_language.dep.yml to the following:

Any content licensee detects from a README is already included in cached output, so I don't think it's that.

What would change in this example is that we'd be showing the license specification picked up from http_accept_language.gempsec. Still working through the best way to show that, but as a possible example

name: http_accept_language
version: 2.1.1
type: bundler
summary: Find out which locale the user preferes by reading the languages they specified
  in their browser
homepage: https://github.com/iain/http_accept_language
license: mit
licenses:
- sources: http_accept_language.gemspec
  text: s.license     = "MIT"
notices: []

If my understanding is correct, this is a slippery slope for the reasons outline by licensee. If others with better understanding of OSS licensing requirements give a 👍 that representing this content as a license is ok I'm happy to make that change.

For my project I had to change 36 .licenses/bundler/*.dep.yml files manually our of 171 dependencies (21%). 23 times I changed only license: other, 11 times only licenses: [] and 2 times both

Yeah that's painful. Some feedback around the details of why you had so many adjustments needed would be great

That said, licensed won't overwrite the license field on dependency version updates if the (normalized) license contents are determined to be the same between the cached and new version. In our experience there is high likelihood that nothing changes re: license contents between versions of a dependency, so a majority of the pain in clarifying licenses is only experienced once.

However one interesting thing I noticed is Licensee fetching license well from a git source, but not locally

That's a better question to ask at Licensee directly, though in my experience I've found this has to do with gem not installing all the files that Licensee uses from the repo as part of the package.

Does this mean that even though an external dependency has e.g. the MIT license but doesn't have license text you can't use it? At least licensed status will not succeed. I'd agree with this comment, but you know licensing requirements, so can you please clarify?

This is the part where I get to say I'm not a lawyer and don't personally have an answer on what constitutes a binding agreement when using an OSS license. I don't know @grosser personally and couldn't tell you whether his comment is based in legal knowledge or personal opinion.

As I mentioned before, licensee does bring this up in case that helps?

I did a quick search to see whether you had any comments on that during the development, but looks like the requirement was added during the initial release of v1.0.0 in the first commit.

The requirement is older than my involvement in the product 😆 I asked awhile ago whether reviewed dependencies should still surface this error, and the answer was to keep the error in place. I don't recall whether the question+answer was pre-v1 release or not, but I couldn't find anything in the issue/PR repo history.

sergey-alekseev commented 5 years ago

Thanks for your answer @jonabc

Any content licensee detects from a README is already included in cached output, so I don't think it's that.

I believe my previous assumption was wrong if I understand your comment correctly. Then I'm not sure how to make licensed status green in this case. Probably with sources: http_accept_language.gemspec as you outlined. Can you please rephrase and clarify, so I can understand better what you mean?

Some feedback around the details of why you had so many adjustments needed would be great

Some examples added to https://github.com/sergey-alekseev/licensed-example. All of them can be fixed manually I believe. Some previous failures were fixed by https://github.com/github/licensed/pull/188 and no longer require manual entry.

so a majority of the pain in clarifying licenses is only experienced once.

Yes. So not really painful.

jonabc commented 5 years ago

I believe my previous assumption was wrong if I understand your comment correctly. Then I'm not sure how to make licensed status green in this case. Probably with sources: http_accept_language.gemspec as you outlined. Can you please rephrase and clarify, so I can understand better what you mean?

That sounds like the best option for now as the most straightforward fix. Hopefully as we figure out https://github.com/github/licensed/issues/189, much of this will just fix itself automatically. If you have any specific preference beyond "just make it work please", please do to comment on the issue 🙇 😄

Some examples added to https://github.com/sergey-alekseev/licensed-example

I forked your example 👍 I don't have enough time to dig in at the moment but I'd love to look through this and see exactly why each other classification occurred. It'd be great if we could better support some of these scenarios

sergey-alekseev commented 5 years ago

Awesome. Hopefully https://github.com/github/licensed/issues/189 will solve the problem.

jonabc commented 5 years ago

@sergey-alekseev thanks for your patience!

I just completed a quick audit of the example you provided. Here's my findings for things without license contents, and those marked other

No license contents

This class of failures is what we're trying to address in https://github.com/github/licensed/issues/189.

"other" license classification

The list here shows each dependency that was classified as "other", which files licensed found for each dependency, and the resulting classification from licensee for each file. In the case of other classifications for a file I also added a note on what I was able to tell on why a license wasn't detected

The majority of the failures here were due to deviations from an official license template for various reasons. I think some of them might be actionable to fix in the license detection logic in licensee, but many are likely to continue to be painful.

It's getting a little late in my day, but I'll open an issue tomorrow over in licensee and link to this + the example repo and see if there are any thoughts on improvements.

sergey-alekseev commented 5 years ago

@jonabc thanks for improving Licensed! Hope it'll be getting better and better with detecting licenses.

Just wanted to clarify how I fixed the first category of gems. Here is what I filled in manually for the gems with sources:

No license contents

aws-eventstream, has license file aws-partitions, has license file aws-sdk-core, has license file aws-sdk-kms, has license file aws-sdk-s3, has license file aws-sigv4, has license file bundler, has license file libv8, has license in README paper_trail, has license file rails-settings-cached, has license file

So all of them have license files actually.

jonabc commented 5 years ago

@sergey-alekseev I agree with you that the remote repositories have license files and/or licenses in the readme, but those files aren't included in the distributed gem - you can check each gemspec spec.files property to see what is included.

The only special case here is bundler, which I've found to kind've do its own thing and may or may not be directly installed with ruby. For example my local bundler installation doesn't have a README or LICENSE

➜  licensed-example git:(master) ✗ ls -l ~/.rbenv/versions/2.6.4/lib/ruby/gems/2.6.0/gems/bundler-1.17.2/
total 0
drwxr-xr-x  4 jonabc  staff  128 Oct  9 10:28 exe
➜  licensed-example git:(master) ✗ ls -l ~/.rbenv/versions/2.6.4/lib/ruby/2.6.0/bundler/
build_metadata.rb          deployment.rb              gem_helpers.rb             lockfile_generator.rb      ruby_dsl.rb                source.rb                  vendored_persistent.rb
capistrano.rb              deprecate.rb               gem_remote_fetcher.rb      lockfile_parser.rb         ruby_version.rb            source_list.rb             vendored_thor.rb
cli/                       dsl.rb                     gem_tasks.rb               match_platform.rb          rubygems_ext.rb            spec_set.rb                version.rb
cli.rb                     endpoint_specification.rb  gem_version_promoter.rb    mirror.rb                  rubygems_gem_installer.rb  ssl_certs/                 version_ranges.rb
compact_index_client/      env.rb                     gemdeps.rb                 plugin/                    rubygems_integration.rb    stub_specification.rb      vlad.rb
compact_index_client.rb    environment_preserver.rb   graph.rb                   plugin.rb                  runtime.rb                 templates/                 worker.rb
compatibility_guard.rb     errors.rb                  index.rb                   process_lock.rb            settings/                  ui/                        yaml_serializer.rb
constants.rb               feature_flag.rb            injector.rb                psyched_yaml.rb            settings.rb                ui.rb
current_ruby.rb            fetcher/                   inline.rb                  remote_specification.rb    setup.rb                   uri_credentials_filter.rb
definition.rb              fetcher.rb                 installer/                 resolver/                  shared_helpers.rb          vendor/
dep_proxy.rb               friendly_errors.rb         installer.rb               resolver.rb                similarity_detector.rb     vendored_fileutils.rb
dependency.rb              gem_helper.rb              lazy_specification.rb      retry.rb                   source/                    vendored_molinillo.rb
jonabc commented 5 years ago

I've updated https://github.com/jonabc/licensed-example/ to include the full content after bundle install --path vendor/gems . This is the "source of truth" that github/licensed uses to determine dependency data.

It's mostly complete but won't contain data on the ruby default gems like bundler and json

sergey-alekseev commented 5 years ago

Makes sense, I haven't thought about it. So Licensed can't fetch those license files from local directories as they simply do not exist there. They can still be grabbed from repositories and filled in manually then when necessary.

jonabc commented 5 years ago

@mlinksva @benbalter I've pared down https://github.com/jonabc/licensed-example/tree/master/vendor/gems/ruby/2.6.0/gems to only include the dependencies in the audit above.

I'm not expecting anything to be actionable for dependencies on the no license contents list, but for the others it's not immediately clear which cases should be detected or not. If anyone gets a chance could you take a look and see if there is anything in there you'd expect that licensee would already detect, or would be reasonable to detect with additional work?

🙇