conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org
BSD 3-Clause "New" or "Revised" License

Rust/Go packages license issues #1052

Open isuruf opened 4 years ago

isuruf commented 4 years ago

A typical Rust package uses dozens of packages which have different licenses and requirements. A Rust package and its dependencies are usually compiled into one library or executable. For example, https://github.com/conda-forge/staged-recipes/pull/11315 has a Rust package with 91 dependencies under various MIT/BSD-3-Clause/Apache-2.0 licenses, and maybe others.

This implies that the licenses and copyrights of the dependencies need to be distributed with the package. There are some tools to help with this, such as https://github.com/maghoff/cargo-license-hound and https://github.com/onur/cargo-license.

I'm opening this issue so that @conda-forge/staged-recipes and @conda-forge/core know about this when reviewing Rust recipes.

cc @andfoy, @mingwandroid

andfoy commented 4 years ago

What I'm doing in particular is using the JSON output produced by cargo-license, grabbing the repository URLs across GitHub, Bitbucket and GitLab, and calling their respective APIs to locate and download all the licenses. However, some libraries still need a manual license download.
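A minimal Python sketch of the first step of that approach, assuming (unverified) that cargo-license's JSON output is an array of objects with `name`, `version` and a possibly-null `repository` key. It only sorts dependencies by hosting service; the API calls that fetch the license files are left out, and whatever lands under `manual` is what still needs a hand-downloaded license. The function name `classify_dependencies` is hypothetical.

```python
import json
from urllib.parse import urlparse

# Hosting services whose APIs the approach above queries for license files.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}


def classify_dependencies(cargo_license_json: str) -> dict[str, list[dict]]:
    """Group dependencies by where their repository is hosted.

    Assumes the input is a JSON array of objects with "name", "version"
    and a possibly-null "repository" key. Dependencies without a
    recognised host are collected under "manual".
    """
    groups: dict[str, list[dict]] = {"github": [], "gitlab": [], "bitbucket": [], "manual": []}
    for dep in json.loads(cargo_license_json):
        host = urlparse(dep.get("repository") or "").netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        groups[KNOWN_HOSTS.get(host, "manual")].append(dep)
    return groups
```

Listing `[d["name"] for d in groups["manual"]]` then gives the crates to chase down by hand.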

nehaljwani commented 4 years ago

Doesn't the same concern apply to go packages?

dbast commented 4 years ago

To avoid reinventing the wheel here, how are other packaging ecosystems solving this, e.g. Linux distributions like Debian, or Homebrew?

isuruf commented 4 years ago

Yes, the same concern applies to Go packages. See also https://github.com/google/go-licenses

I've no idea how others fix this.

hadim commented 4 years ago

I am not sure how you want to address this, but it does not seem straightforward. We could use a script that goes over all the dependencies, parses their licenses, and lists the licenses per dependency in the conda package?

hadim commented 4 years ago

Also, at what level should this script be run: conda or conda-forge?

isuruf commented 4 years ago

@hadim, what @andfoy did for Rust was to use a script to download licenses and put them in the recipe (and manually add licenses for the packages where the script failed). He also added a check in build.sh that each dependency has a license file in the recipe. The same can be done for Go.
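The build-time check could look something like this Python sketch. It assumes a hypothetical layout in which the recipe keeps one subdirectory per dependency under a licenses directory, plus a plain-text file listing dependency names; neither the layout nor the names `missing_licenses`/`main` come from the feedstocks mentioned above.

```python
import sys
from pathlib import Path


def missing_licenses(dependencies: list[str], license_dir: Path) -> list[str]:
    """Return the dependencies with no license file in license_dir.

    Hypothetical layout: license_dir/<dependency>/ holds at least one file.
    """
    missing = []
    for dep in dependencies:
        dep_dir = license_dir / dep
        if not (dep_dir.is_dir() and any(p.is_file() for p in dep_dir.iterdir())):
            missing.append(dep)
    return missing


def main(argv: list[str]) -> int:
    """argv[1]: license directory; argv[2]: file with one dependency name per line."""
    license_dir = Path(argv[1])
    deps = [line.strip() for line in Path(argv[2]).read_text().splitlines() if line.strip()]
    missing = missing_licenses(deps, license_dir)
    for dep in missing:
        print(f"missing license file for {dep}", file=sys.stderr)
    # A non-zero status lets the calling build script fail the build.
    return 1 if missing else 0
```

From build.sh it could be invoked as, say, `python check_licenses.py "$RECIPE_DIR/licenses" deps.txt`, with `sys.exit(main(sys.argv))` under a `__main__` guard (the script name is hypothetical).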

hadim commented 4 years ago

It makes sense.

That being said, I probably don't have the bandwidth at the moment to do that for https://github.com/conda-forge/staged-recipes/pull/11799

isuruf commented 4 years ago

For Go, it's simple. See https://github.com/google/go-licenses#complying-with-license-terms

SylvainCorlay commented 4 years ago

Quick thought, this also applies to C++ packages when you link statically with your dependencies.

nehaljwani commented 4 years ago

Should this be extended to header only dependencies as well? For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well? Because that's as good as statically linking parts of them.

chrisburr commented 4 years ago

Perhaps there needs to be a licence_exports field in the conda build metadata.

isuruf commented 4 years ago

Should this be extended to header only dependencies as well?

Depends on the license.

For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well?

pybind11: yes. boost: no.

bollwyvl commented 4 years ago

Thanks for the guidance on this topic: texlab-feedstock is now using the same approach as pysyntect-feedstock, and "only" required manually hunting down 20 licenses (of 200+). Perhaps we should package cargo-license, since it seems to cost a couple of minutes per build.

bollwyvl commented 3 years ago

As this has come up again for @conda-forge/cryptography:

I wonder if we should start curating a community package, e.g. conda-forge-rust-licenses and conda-forge-go-licenses (or just lump them together under conda-forge-license-library), with some automation to at least centralize the list of known/used <thing>/<version>/(UN)LICEN(S|CE(-.*)(.(txt|md))? files (oh, and don't forget COPYRIGHT.*). Then packages could require that package during builds, copying the assets from a well-known location to wherever their license_file points... now that we can use folders, that's much easier. If a new crate/mod shows up, the build would fail, but might suggest...

Some wild crates and mods approach!

- <crate>@<version> <url>
- <mod>@<version> <url>

From inspection, I've found the below licenses. Please visit the upstream repos and verify, then make a pull request to https://github.com/conda-forge/conda-forge-license-library adding the lines:

### recipe/licenses/cargo.txt

<repo>@<tag>/LICENSE-MIT
<repo>@<tag>/LICENSE-APACHE

### recipe/licenses/go-mod.txt

<repo>@<tag>/LICENSE-ZLIB-WITH-FREAKY-SPEC

This would in turn update the recipe (once), so we actually have the licenses' sha256sums.
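One way to pin down the informal file-name pattern above is a single regular expression. The Python sketch below is an approximation for illustration, not an agreed conda-forge rule: it scans a directory holding one subdirectory per dependency (a vendored source tree, for example) and lists the license-like files in each. `LICENSE_NAME` and `find_license_files` are hypothetical names.

```python
import re
from pathlib import Path

# Approximation of the informal pattern above (case-insensitive):
#   LICENSE / LICENCE, optionally prefixed by "UN",
#   optionally followed by "-<suffix>" (e.g. LICENSE-MIT),
#   optionally ending in .txt or .md; or anything starting with COPYRIGHT.
LICENSE_NAME = re.compile(
    r"^(?:(?:UN)?LICEN[SC]E(?:-[\w.-]+)?(?:\.(?:txt|md))?|COPYRIGHT.*)$",
    re.IGNORECASE,
)


def find_license_files(root: Path) -> dict[str, list[str]]:
    """For each subdirectory of root (one per dependency), list license-like files."""
    found: dict[str, list[str]] = {}
    for dep_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        found[dep_dir.name] = sorted(
            f.name for f in dep_dir.iterdir()
            if f.is_file() and LICENSE_NAME.match(f.name)
        )
    return found
```

Dependencies whose list comes back empty are the ones where a build would fail and ask for a manually verified license.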

bollwyvl commented 3 years ago

So would conda-incubator/* be the right path? I'm imagining a small (potentially single-file) Python package with a simple in-build CLI like cargo-licenses | dmv -o $SRC_DIR/third-party-licenses. The JSON/CSV file with, at the very least, the couple hundred license URLs/SHAs would then live in the feedstock... but could also contain the actual license texts themselves.
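To make the "URLs/SHAs" idea a little more concrete, here is a Python sketch that assumes the license texts have already been saved under one directory and records a SHA-256 digest per file in a JSON file that could live in the feedstock. The function names and the use of relative POSIX paths as keys are assumptions, not part of any existing tool.

```python
import hashlib
import json
from pathlib import Path


def license_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def write_manifest(root: Path, out: Path) -> None:
    """Write the manifest as indented JSON with a stable key order."""
    out.write_text(json.dumps(license_manifest(root), indent=2, sort_keys=True) + "\n")
```

Keeping the output file outside `root` stops a re-run from hashing the manifest itself.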

sstadick commented 3 years ago

Hello! I've been working on a tool to hopefully mitigate this issue / make it less painful to publish rust tools on conda-forge. It can be found here.

In short, it crawls the package dependencies and searches out the license files that correspond to what is in the Cargo.toml. If a license isn't found or looks suspicious, it writes a warning message. It also provides a "check" flag that takes a previous version of a THIRDPARTY file and compares it against the new one, failing if they are different.

The idea is that the workflow would go as follows:

  1. Run cargo bundle-licenses once and address all warnings by manually finding licenses where needed and copy-pasting them into the generated file. Check that file into version control and include it in your manifest.
  2. Include cargo bundle-licenses --output CI-THIRDPARTY --previous THIRDPARTY --check-previous in your CI. This will carry forward any manually changed entries for you, then do a whole-file check for sameness, so if a version changed it would fail and force you back to step 1.

Currently this tool supports three formats: yaml, json, and toml. See the above repo for an example yaml THIRDPARTY file.

In the view of conda-forge maintainers, would this satisfy the requirement that the licenses and copyrights of the dependencies be distributed with the package?

bollwyvl commented 3 years ago

Looks good! Really anything that moves things forward sounds great to me... I'm wagering if:

... I don't see what complaints there would/could be.

From a KISS perspective, and as I don't really want to hand-edit this file, I'd see JSON as the preferable serialization format... to that end, now that SPDX 2.2.1 is ISO/IEC 5962:2021, I'd really hope we start seeing it adopted more broadly (and provided by upstream packagers) so we can stop needing to re-implement clever stopgaps.

sstadick commented 3 years ago

@bollwyvl, thanks for the feedback!

Here is a PR for adding cargo-bundle-licenses to staged-recipes. To be clear, this would supersede cargo-license. The sole purpose of this tool is to satisfy the requirements of conda-forge packaging and make it less onerous to publish Rust packages here.

I have two PRs dogfooding it right now: https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111; I'll update them to pull in cargo-bundle-licenses via build requirements once (and if) the cargo-bundle-licenses PR is merged.

bollwyvl commented 3 years ago

That's great progress! Good luck! Once again, I'd prioritize the initial staged-recipes PR for the tool itself, and then ensure it meets the needs of at least one known-important, but presently hand-curated, package, as they are the most likely to have been reviewed. Ensuing new packages will then be an easier pitch, as we'll be more confident.

By the by: I can't merge anything, don't really do rust (or go) dev, and am actually super constrained on community time right now anyway, so really I'm just selfishly looking forward to having some tools like this to ease my personal maintenance burden. God- (or -spirit-or-principle-or-animus-or-whatever-) speed!

sstadick commented 3 years ago

@bollwyvl I appreciate the guidance on this!

pkgw commented 3 years ago

Thanks @sstadick! I merged the tool recipe.

sstadick commented 3 years ago

Both https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111 are now using the conda-forge cargo-bundle-licenses package to check that all third-party licenses are present.

sstadick commented 3 years ago

PR adding cargo-bundle-licenses to ripgrep-feedstock: https://github.com/conda-forge/ripgrep-feedstock/pull/17

isuruf commented 3 years ago

Looks good to me. What happens when cargo-bundle-licenses can't find a license/copyright for a package?

sstadick commented 3 years ago

If run without --check-previous, it will just write a warning saying it couldn't find the license and put NOT FOUND for the license text in the THIRDPARTY.yml file. The idea is that a user would then go find it and manually add it, so that the next time you run it with --previous, it will pull the manually found license forward for you if it still can't find it.

If running with --check-previous, as in the PRs above / in CI generally, and something is still NOT FOUND or differs from the --previous license set, the tool will exit with status 1 and fail to get someone's attention. Hopefully this means the THIRDPARTY file will actually stay up to date as deps change, instead of being made once and forgotten.
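The `--previous` / `--check-previous` behavior described above can be modelled in a few lines of Python. This is a toy model for reasoning about the workflow, not cargo-bundle-licenses' actual code: it assumes a THIRDPARTY file flattened to a mapping from a hypothetical `name@version` key to a single license text.

```python
NOT_FOUND = "NOT FOUND"


def carry_forward(current: dict[str, str], previous: dict[str, str]) -> dict[str, str]:
    """Replace NOT FOUND entries in `current` with text from `previous`, if it has any."""
    merged = dict(current)
    for dep, text in current.items():
        if text == NOT_FOUND and previous.get(dep, NOT_FOUND) != NOT_FOUND:
            merged[dep] = previous[dep]
    return merged


def check_previous(current: dict[str, str], previous: dict[str, str]) -> int:
    """Return an exit status: 1 if anything is still NOT FOUND or the merged
    result differs from `previous` (e.g. a dependency version changed), else 0."""
    merged = carry_forward(current, previous)
    if NOT_FOUND in merged.values() or merged != previous:
        return 1
    return 0
```

Because versions are part of the keys in this model, any dependency bump changes the key set and trips the check, which is the "force you back to step 1" behavior.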

isuruf commented 3 years ago

Perfect. Thanks for working on this

sstadick commented 3 years ago

If this is good to go, I'd love to get these two PRs merged: conda-forge/staged-recipes#16110 and conda-forge/staged-recipes#16111.

I'm sure there will be rough edges with cargo-bundle-licenses; I'm more than happy to resolve issues as they come up and help Rust packages get into conda.

kellpossible commented 3 years ago

@sstadick thanks for the great tool! I've also used it in https://github.com/conda-forge/staged-recipes/pull/16252, and I can see it being useful in other projects too.

jakirkham commented 3 years ago

Thanks @sstadick! 😄

It would be great to integrate this strategy into grayskull, which we use to create/update recipes.

sstadick commented 2 years ago

@jakirkham I agree! I think it's worth waiting a bit to see where the rough edges are in the cargo bundle-licenses workflow first. But it would be nice to have a Rust project template.

BastianZim commented 2 years ago

This came up again recently for Go, and I was wondering whether we should recommend the same approach here as for cargo-bundle-licenses.

As mentioned above, go-licenses is the recommended tool to collect these licenses so how about adding this to the build step and then adding the output to the license_file list?

Something like

build:
   number: 0
   script:
     - go-licenses save "github.com/google/trillian/server/trillian_log_server" --save_path="/trillian_log_server"

The only problem is that it produces folders not a single file but we can either zip that afterwards or ask upstream to provide a single output option.

Edit: license_file also supports folders.

What's everybody's opinion?

pkgw commented 2 years ago

If there's a recommended tool, it definitely seems like we should try to integrate it into our best-practices workflows, yeah!

BastianZim commented 2 years ago

Do we have a go feedstock that is controlled by a member of core somewhere? I'd like to test this against a real feedstock before adding it to the docs but I don't have any go ones.

xhochy commented 2 years ago

Feel free to use https://github.com/conda-forge/go-sops-feedstock for this

maresb commented 2 years ago

Regarding go-licenses, I don't know any Go myself, but I'm quite satisfied with the recipe I came up with for the Dasel feedstock. I hope it might be useful as a reference for others working on Go packages.

I'm especially satisfied about how it compiles on linux/osx/win without needing separate build scripts, which is an improvement over other recipes I've seen.

One peculiarity was needing to download the source to a subfolder (src/dasel) in order to avoid the error

$GOPATH/go.mod exists but should not

Another peculiarity was coming up with the particular syntax of

cd src/dasel
...
go-licenses save . --save_path=license-files

which works across platforms.

BastianZim commented 2 years ago

Oh that's great news, thanks! I ran into the same problem when testing this myself so that's awesome.

@conda-forge/go Do you think this is reproducible? Then I'll add this to the docs and we can close this issue.

maresb commented 2 years ago

You might want to hang on for one moment, I'm looking at adding the osx-arm64 migration to Dasel, and I'm getting an error from cross-compilation due to $GOBIN being defined.

Maybe we should make one final push to fix https://github.com/conda-forge/dasel-feedstock/pull/2 before settling on a standard? (BTW, please help, since I don't know Go! :joy:)

BastianZim commented 2 years ago

That is probably the same as https://github.com/conda-forge/go-licenses-feedstock/pull/9? I've only found these so far: https://stackoverflow.com/questions/55532868/how-to-build-install-cross-compiled-nested-packages-quickly and https://github.com/golang/go/issues/11778 (but you probably know those already 😄). But messing that "deeply" with cross-compilation should probably be discussed with someone more in the topic. Maybe it could also be set/done conda-forge-wide, or we could at least add something to the docs?

maresb commented 2 years ago

Thanks @BastianZim! Those links were actually new to me. Good to know for instance that I'm not alone on Conda-Forge, and that gives me a few ideas.

But messing that "deeply" with cross-compilation should probably be discussed with someone more in the topic.

I'm not sure what you mean by "messing", but anything I do here related to Go should definitely be sanity-checked. :smile:

BastianZim commented 2 years ago

Ahh ok great! :)

I was primarily talking about where to place the binaries, as discussed in the Stack Overflow post, and about the conda-forge-wide solution, not what you did already. But I'm no expert here either, so I have no idea how forgiving/strict conda-build is here. 😄

maresb commented 2 years ago

For osx-arm, I couldn't find a cross-compilation command which is also Windows-compatible. I ended up using the # [arm64] selector to switch between go install for normal installation and go build for cross-compilation. But I still think it's relatively elegant.

BastianZim commented 2 years ago

Hmm interesting. How about going with go build for everything? Or is install better?

maresb commented 2 years ago

How about going with go build for everything?

Excellent question! I attempted this, and the problem is that the path $PREFIX/bin/ in the build command needs to be %PREFIX%\bin\ (or similar) on Windows.

In other words, my go install command is multi-platform (linux, osx, windows) because it requires no environment variables, but it is not compatible with cross-compilation.

In contrast, my go build command with the environment variable is compatible with cross-compilation but not multiplatform.

Since cross-compilation currently takes place only on Linux, I can get away with using $PREFIX/bin/.

maresb commented 2 years ago

It should also work with something similar to

    - go build -o "${PREFIX}/bin/" -v -ldflags "-X github.com/tomwright/dasel/internal.Version=v{{ version }}" ./cmd/dasel  # [not win]
    - go build -o "%PREFIX%\bin\" -v -ldflags "-X github.com/tomwright/dasel/internal.Version=v{{ version }}" ./cmd/dasel  # [win]

If one assumes that everything should be built for osx-arm, then probably this form is actually more desirable.

(I was just thinking it was slick how the go install command is cross-platform and works with no selectors, but that unfortunately ignores the current necessity of osx-arm cross-compilation.)
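A possible way around the per-shell `$PREFIX` / `%PREFIX%` syntax is to assemble the same `go build` call in Python and hand it to `subprocess` as an argument list, so no shell variable expansion or quoting is involved. This is an untested sketch, not what the dasel feedstock does: it assumes conda-build's `PREFIX` environment variable and a version supplied by the caller, and `go_build_command`/`build` are hypothetical helpers. It relies on `go build -o` writing into a directory when the path ends in a path separator. Separately, Windows argument parsing may read the `\"` at the end of `"%PREFIX%\bin\"` as an escaped quote rather than a closing one, which the list form sidesteps. Which subdirectory Windows binaries should land in is left as the `bin_subdir` parameter.

```python
import os
import subprocess


def go_build_command(prefix: str, version: str, bin_subdir: str = "bin") -> list[str]:
    """Return the go build argv from the comment above, with a portable -o path."""
    # A trailing separator tells `go build -o` to treat the path as a directory.
    out_dir = os.path.join(prefix, bin_subdir) + os.sep
    return [
        "go", "build",
        "-o", out_dir,
        "-v",
        "-ldflags", f"-X github.com/tomwright/dasel/internal.Version=v{version}",
        "./cmd/dasel",
    ]


def build(version: str) -> None:
    """Run the build inside conda-build, where PREFIX is set (hypothetical entry point)."""
    cmd = go_build_command(os.environ["PREFIX"], version)
    # check=True raises CalledProcessError if go build fails, failing the recipe.
    subprocess.run(cmd, check=True)
```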

maresb commented 2 years ago

It should also work...

I was just playing around with this in https://github.com/conda-forge/dasel-feedstock/pull/3 and failing. Debugging Windows via the CI is really slow and painful, so I give up. Here's the info in case anyone else wants to try. (I suspect this would be easy for anyone with a better understanding of conda-build and a Windows computer.)

Where I'm stuck is go build on Windows. I'm not sure what to set as the output directory, and conda-build isn't finding my binaries, so I always get number of files: 0.

isuruf commented 2 years ago

Haskell/Cabal is another programming language/package manager that runs into this issue.

cc @ocefpaf, @msarahan

ocefpaf commented 2 years ago

More info on this. Isuru mentioned that cabal-db can help get a list of the licenses for the dependencies in a Haskell project. However, cabal-db is pretty much impossible to install. I tried multiple cabal versions on different systems. I guess the project is abandoned. Are there other alternatives?

ocefpaf commented 2 years ago

cabal-plan seems more promising: https://hackage.haskell.org/package/cabal-plan#description