Open isuruf opened 4 years ago
What I'm doing in particular is using the JSON output produced by cargo-license and then grabbing the repository URLs across GitHub, Bitbucket and GitLab to call their respective APIs to locate and download all the licenses. However, some libraries still need a manual license download.
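As a rough illustration of the first half of that approach, here is a minimal Python sketch that groups repository URLs by hosting service. It assumes the JSON is a list of objects with "name" and "repository" keys (my understanding of what cargo-license emits, not verified against a particular version); the function name and the "manual" bucket are hypothetical, and no hosting API is actually called.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse

# Hosts named in the comment above, mapped to a short label.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}


def group_repos_by_host(cargo_license_json):
    """Group (crate name, repository URL) pairs by hosting service.

    Assumption: the input is a JSON list of objects with "name" and
    "repository" keys. Crates with a missing or unrecognized repository
    land in the "manual" bucket, mirroring the manual-download fallback.
    """
    grouped = defaultdict(list)
    for entry in json.loads(cargo_license_json):
        url = entry.get("repository") or ""
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        grouped[KNOWN_HOSTS.get(host, "manual")].append((entry["name"], url))
    return dict(grouped)
```

Each bucket could then be handed to whatever code talks to that service's API; the "manual" bucket is the list to chase by hand.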
Doesn't the same concern apply to go packages?
To not re-invent the wheel here, how are other packaging eco systems solving that e.g. linux distributions like debian or homebrew?
Yes, the same concern applies to Go packages. See also https://github.com/google/go-licenses
I've no idea how others fix this.
I am not sure how you want to address that, but it does not seem straightforward. We could use a script that goes over all the dependencies, parses their licenses, and lists all the licenses per dependency in the conda package?
Also, at what level should this script be run: conda or conda-forge?
@hadim, what @andfoy did for Rust was to use a script to download licenses and put them in the recipe (manually adding licenses for packages where the script failed). He also added a check in build.sh to verify that each dependency had a license file in the recipe. The same can be done for Go.
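The kind of check described here can be sketched in a few lines of Python. This is not the script used in the feedstock (that check lives in build.sh); it assumes a hypothetical layout with one subfolder per dependency under the license directory, and the function name is made up for illustration.

```python
from pathlib import Path


def missing_licenses(dependencies, license_dir):
    """Return the dependencies that have no license file in the recipe.

    Assumption (hypothetical layout): license_dir/<dependency>/ holds at
    least one regular file for every dependency that is covered.
    """
    root = Path(license_dir)
    missing = []
    for name in dependencies:
        dep_dir = root / name
        has_file = dep_dir.is_dir() and any(p.is_file() for p in dep_dir.iterdir())
        if not has_file:
            missing.append(name)
    return missing
```

A build script could call this with the dependency list and exit non-zero when the returned list is non-empty.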
It makes sense.
That being said I probably don't have the bandwidth at the moment to do that for https://github.com/conda-forge/staged-recipes/pull/11799
For go, it's simple. See https://github.com/google/go-licenses#complying-with-license-terms
Quick thought, this also applies to C++ packages when you link statically with your dependencies.
Should this be extended to header only dependencies as well? For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well? Because that's as good as statically linking parts of them.
Perhaps there needs to be a licence_exports field in the conda build metadata.
Should this be extended to header only dependencies as well?
Depends on the license.
For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well?
pybind11: yes. boost: no.
Thanks for the guidance here on this topic: texlab-feedstock is now using the same approach as pysyntect-feedstock, and "only" required manually hunting down 20 licenses (of 200+). Perhaps we should package cargo-license... it seems to cost a couple of minutes per build.
As this has come up again for @conda-forge/cryptography:
I wonder if we should start curating a community package, e.g. conda-forge-rust-licenses and conda-forge-go-licenses (or just lump them together under conda-forge-license-library), which has some automation to at least allow centralizing the list of known/used <thing>/<version>/(UN)LICEN(S|CE(-.*)(.(txt|md))? (oh, and don't forget COPYRIGHT.*). Then packages can demand said package during builds, copying the assets from a well-known location to wherever their license_file points... now that we can use folders, that's much easier. If a new crate/mod shows up, the build would fail, but might suggest...
Some wild crates and mods approach!
- <crate>@<version> <url>
- <mod>@<version> <url>
From inspection, I've found the below licenses. Please visit the upstream repos and verify, then
make a pull request to https://github.com/conda-forge/conda-forge-license-library adding the lines:
### recipe/licenses/cargo.txt
<repo>@<tag>/LICENSE-MIT
<repo>@<tag>/LICENSE-APACHE
### recipe/licenses/go-mod.txt
<repo>@<tag>/LICENSE-ZLIB-WITH-FREAKY-SPEC
This would in turn update the recipe (once) so we actually have the licenses' sha256sums.
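To make the filename matching in this proposal concrete, here is a small Python sketch. The regular expression is only one possible reading of the informal pattern given earlier in this comment ((UN)LICENSE/LICENCE with an optional suffix and extension, plus COPYRIGHT.*); the case-insensitive matching and the function name are my own assumptions.

```python
import re

# One possible interpretation of the informal pattern from the comment.
# Assumption: matching is case-insensitive and applies to the bare filename.
LICENSE_NAME = re.compile(
    r"(?:(?:UN)?LICEN[SC]E(?:-.*)?(?:\.(?:txt|md))?|COPYRIGHT.*)",
    re.IGNORECASE,
)


def looks_like_license(filename):
    """Return True if the whole filename matches the license pattern."""
    return LICENSE_NAME.fullmatch(filename) is not None
```

With this reading, LICENSE-MIT, UNLICENSE, LICENCE.md and COPYRIGHT.txt are accepted, while README.md is not.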
So would a conda-incubator/* be the right path? I'm imagining a small (potentially single-file) Python package with a simple in-build CLI like cargo-licenses | dmv -o $SRC_DIR/third-party-licenses. The JSON/CSV file with, at the very least, the couple hundred license URLs/SHAs would then live in the feedstock... but could contain the actual license texts themselves.
Hello! I've been working on a tool to hopefully mitigate this issue / make it less painful to publish rust tools on conda-forge. It can be found here.
In short, it crawls the package dependencies and searches out the license files that correspond to what is in the Cargo.toml. If a license isn't found or looks suspicious, it will write a warning message. It also provides a "check" flag that takes a previous version of a THIRDPARTY file and compares that against the new one, failing if they are different.
The idea is that the workflow would go as follows:
1. Run cargo bundle-licenses once, address all warnings by manually finding licenses where needed and copy-pasting them into the generated file. Check that file into version control and include it in your manifest.
2. Run cargo bundle-licenses --output CI-THIRDPARTY --previous THIRDPARTY --check-previous in your CI. This will carry forward any manually changed entries for you, then do a whole-file check for sameness, so if a version changed it would fail and force you back to step 1.
Currently this tool supports three formats: yaml, json, and toml. See the above repo for an example yaml THIRDPARTY file.
In the view of conda-forge maintainers, would this satisfy the requirement that "the licenses and copyrights of the dependencies need to be distributed with the package"?
Looks good! Really anything that moves things forward sounds great to me... I'm wagering if:
- cargo-licenses, if not superseded) is packaged (dogfooding itself) through staged-recipes
- requirements/build, test/requires, and call it, simply, in build.sh|bld.bat
- ripgrep
- staged-recipe PRs/a knowledge base text chunk
... I don't see what complaints there would/could be.
From a KISS perspective, and as I don't really want to hand-edit this file, I'd see JSON as the preferable serialization format... to that end, now that SPDX 2.2.1 is ISO/IEC 5962:2021, I'd really hope we start seeing it adopted more broadly (and provided by upstream packagers) and can stop needing to re-implement clever stopgaps.
@bollwyvl, thanks for the feedback!
Here is a PR for adding cargo-bundle-licenses to staged-recipes. To be clear, this would supersede cargo-license. The sole purpose of this tool is to satisfy the requirements of conda-forge packaging and make it less onerous to publish Rust packages here.
I have two PRs dogfooding it right now: https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111. I'll update them to pull in cargo-bundle-licenses via build requirements once / if the cargo-bundle-licenses PR can be merged.
That's great progress! Good luck! Once again, I'd prioritize the initial staged-recipes PR for the tool itself, and then ensure it meets the needs of at least one known-important, but presently hand-curated, package, as they are the most likely to have been reviewed. Ensuing new packages will then be an easier pitch, as we'll be more confident.
By the by: I can't merge anything, don't really do Rust (or Go) dev, and am actually super constrained on community time right now anyway, so really I'm just selfishly looking forward to having some tools like this to ease my personal maintenance burden. God- (or -spirit-or-principle-or-animus-or-whatever-) speed!
@bollwyvl I appreciate the guidance on this!
Thanks @sstadick! I merged the tool recipe.
Both https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111 are now using the conda-forge cargo-bundle-licenses package to check that all third-party licenses are present.
PR adding cargo-bundle-licenses to ripgrep-feedstock: https://github.com/conda-forge/ripgrep-feedstock/pull/17
Looks good to me. What happens when cargo-bundle-licenses can't find a license/copyright for a package?
If run without --check-previous, it will just write a warning saying it couldn't find the license, and then in the THIRDPARTY.yml file it will put NOT FOUND for the license text. The idea is that a user would then go find it and manually add it, so that the next time you run it with --previous it will pull the manually found license forward for you if it still can't find it.
If running with --check-previous, as in the PRs above / in CI generally, if something is still NOT FOUND or different from the --previous license set, the tool will exit with status 1, failing the build to get someone's attention. Hopefully this means the THIRDPARTY file will actually stay up to date as deps change instead of being made once and forgotten.
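The carry-forward and check behaviour described in this comment can be modelled roughly as below. This is a simplified Python sketch of the described logic, not the tool's actual implementation: the dictionaries keyed by "name@version" and both function names are assumptions made for illustration, and the real THIRDPARTY file has more structure than a flat mapping.

```python
NOT_FOUND = "NOT FOUND"


def carry_forward(current, previous):
    """Fill NOT FOUND entries in `current` from `previous` where available.

    Both arguments map a hypothetical "name@version" key to license text.
    """
    merged = {}
    for dep, text in current.items():
        if text == NOT_FOUND:
            # Reuse a manually added license from the previous file, if any.
            text = previous.get(dep, NOT_FOUND)
        merged[dep] = text
    return merged


def check_previous(current, previous):
    """Return 0 if the merged result is complete and identical to `previous`,
    otherwise 1 (mirroring the exit status described in the comment)."""
    merged = carry_forward(current, previous)
    unresolved = sorted(dep for dep, text in merged.items() if text == NOT_FOUND)
    if unresolved or merged != previous:
        return 1
    return 0
```

Because the keys include the version, a dependency bump changes the key set and the check fails, which is the "force you back to step 1" behaviour.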
Perfect. Thanks for working on this
If this is good to go, I'd love to get these two PRs merged: conda-forge/staged-recipes#16110 and conda-forge/staged-recipes#16111.
I'm sure there will be rough edges with cargo-bundle-licenses; I'm more than happy to resolve issues as they come up / help Rust packages get into conda.
@sstadick thanks for the great tool! I've also used it in https://github.com/conda-forge/staged-recipes/pull/16252. I can see it being useful in other projects too.
Thanks @sstadick! 😄
It would be great to integrate this strategy into grayskull, which we use to create/update recipes.
@jakirkham I agree! I think it's worth waiting a bit to see where the rough edges are in the cargo bundle-licenses workflow first. But it would be nice to have a Rust project template.
This came up again recently for Go, and I was wondering if we shouldn't recommend the same approach here as for cargo-bundle-licenses.
As mentioned above, go-licenses is the recommended tool to collect these licenses, so how about adding this to the build step and then adding the output to the license_file list?
Something like:
build:
  number: 0
  script:
    - go-licenses save "github.com/google/trillian/server/trillian_log_server" --save_path="/trillian_log_server"
The only problem is that it produces folders, not a single file, but we can either zip that afterwards or ask upstream to provide a single-output option.
Edit: license_file also supports folders.
What's everybody's opinion?
If there's a recommended tool, it definitely seems like we should try to integrate it into our best-practices workflows, yeah!
Do we have a Go feedstock that is controlled by a member of core somewhere? I'd like to test this against a real feedstock before adding it to the docs, but I don't have any Go ones.
Feel free to use https://github.com/conda-forge/go-sops-feedstock for this
Regarding go-licenses, I don't know any Go myself, but I'm quite satisfied with the recipe I came up with for the Dasel feedstock. I hope it might be useful as a reference for others working on Go packages.
I'm especially satisfied with how it compiles on linux/osx/win without needing separate build scripts, which is an improvement over other recipes I've seen.
One peculiarity was needing to download the source to a subfolder (src/dasel) in order to avoid the error
$GOPATH/go.mod exists but should not
Another peculiarity was coming up with the particular syntax of
cd src/dasel
...
go-licenses save . --save_path=license-files
which works across platforms.
Oh that's great news, thanks! I ran into the same problem when testing this myself so that's awesome.
@conda-forge/go Do you think this is reproducible? Then I'll add this to the docs and we can close this issue.
You might want to hang on for one moment: I'm looking at adding the osx-arm64 migration to Dasel, and I'm getting an error from cross-compilation due to $GOBIN being defined.
Maybe we should make one final push to fix https://github.com/conda-forge/dasel-feedstock/pull/2 before settling on a standard? (BTW, please help, since I don't know Go! :joy:)
That is probably the same as https://github.com/conda-forge/go-licenses-feedstock/pull/9? I only found this so far: https://stackoverflow.com/questions/55532868/how-to-build-install-cross-compiled-nested-packages-quickly and https://github.com/golang/go/issues/11778 (But you know those probably already😄) But messing that "deeply" with cross-compilation should probably be discussed with someone more in the topic. Maybe it can also be set/done conda-forge wide, or we can at least add something to the docs?
Thanks @BastianZim! Those links were actually new to me. Good to know for instance that I'm not alone on Conda-Forge, and that gives me a few ideas.
But messing that "deeply" with cross-compilation should probably be discussed with someone more in the topic.
I'm not sure what you mean by "messing", but anything I do here related to Go should definitely be sanity-checked. :smile:
Ahh ok great! :)
I was primarily talking about where to place the binaries as discussed in the stack overflow post and for the conda-forge wide solution, not what you did already. But I'm no expert here either so no idea how forgiving/strict conda-build is here. 😄
For osx-arm, I couldn't find a cross-compilation command which is also Windows-compatible. I ended up using the # [arm64] selector to switch between go install for normal installation and go build for cross-compilation. But I still think it's relatively elegant.
Hmm interesting. How about going with go build for everything? Or is install better?
How about going with go build for everything?
Excellent question! I attempted this, and the problem is that the environment variable $PREFIX/bin/ for the build command needs to be %PREFIX%\bin\ (or similar) on Windows.
In other words, my go install command is multi-platform (linux, osx, windows) because it requires no envvars, but is not compatible with cross-compilation.
In contrast, my go build command with the environment variable is compatible with cross-compilation but not multi-platform.
Since cross-compilation currently takes place only on Linux, I can get away with using $PREFIX/bin/.
It should also work with something similar to
- go build -o "${PREFIX}/bin/" -v -ldflags "-X github.com/tomwright/dasel/internal.Version=v{{ version }}" ./cmd/dasel # [not win]
- go build -o "%PREFIX%\bin\" -v -ldflags "-X github.com/tomwright/dasel/internal.Version=v{{ version }}" ./cmd/dasel # [win]
If one assumes that everything should be built for osx-arm, then probably this form is actually more desirable.
(I was just thinking it was slick how the go install command is cross-platform and works with no selectors, but that unfortunately ignores the current necessity of osx-arm cross-compilation.)
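For what it's worth, the platform difference discussed here boils down to how the output directory is spelled. The Python sketch below only builds the argument list for the go build invocation quoted above, choosing the path separator per platform; the helper name and its parameters are hypothetical, and whether %PREFIX%\bin\ is the right destination on Windows is exactly the open question in this thread.

```python
from pathlib import PurePosixPath, PureWindowsPath


def go_build_args(prefix, version, windows, package="./cmd/dasel"):
    """Build the argument list for the `go build` call quoted above.

    Hypothetical helper: `prefix` stands in for $PREFIX / %PREFIX%, and the
    output directory mirrors the "<prefix>/bin/" form used in the thread.
    """
    if windows:
        out_dir = str(PureWindowsPath(prefix) / "bin") + "\\"
    else:
        out_dir = str(PurePosixPath(prefix) / "bin") + "/"
    ldflags = f"-X github.com/tomwright/dasel/internal.Version=v{version}"
    return ["go", "build", "-o", out_dir, "-v", "-ldflags", ldflags, package]
```

Passing such a list to subprocess.run (rather than a single shell string) leaves the quoting of each argument to Python instead of the shell.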
I was just playing around with this in https://github.com/conda-forge/dasel-feedstock/pull/3 and failing. Debugging Windows via the CI is really slow and painful, so I give up. Here's the info in case anyone else wants to try. (I suspect this would be easy for anyone with a better understanding of conda-build and a Windows computer.)
Where I'm stuck is with go build on Windows. I'm not so sure what to set as the output directory, and conda-build isn't finding my binaries, so I always get number of files: 0.
Haskell/Cabal is another programming language/package manager that runs into this issue.
cc @ocefpaf, @msarahan
More info on this. Isuru mentioned that cabal-db can help get a list of the licenses for the dependencies in a Haskell project. However, cabal-db is pretty much impossible to install. I tried multiple cabal versions on different systems. I guess the project is abandoned. Are there other alternatives?
cabal-plan seems more promising: https://hackage.haskell.org/package/cabal-plan#description
A typical Rust package uses dozens of packages which have different licenses and requirements. A Rust package and its dependencies are usually compiled into one library or executable. For example, https://github.com/conda-forge/staged-recipes/pull/11315 has a Rust package with 91 dependencies with various MIT/BSD-3-Clause/Apache-2.0 licenses and maybe others.
This implies that the licenses and copyrights of the dependencies need to be distributed with the package. There are some tools to help do this like https://github.com/maghoff/cargo-license-hound, https://github.com/onur/cargo-license.
I'm opening this issue so that @conda-forge/staged-recipes and @conda-forge/core know about this when reviewing Rust recipes.
cc @andfoy, @mingwandroid