conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org
BSD 3-Clause "New" or "Revised" License

GPL/non-GPL feature #209

Open jakirkham opened 8 years ago

jakirkham commented 8 years ago

As some libraries are GPL'd and are in some cases optional requirements of packages, it would be nice if we had a mechanism to control whether such packages do or do not get installed. Relatedly, we would need a way of controlling whether a package gets built with a GPL dependency or not. The simplest way that comes to mind ATM would be creating a gpl feature and a corresponding package that needs to be installed to enable it. This might be too sweeping, and it may require per-package features. If it does start going in the latter direction, perhaps solving issue ( https://github.com/conda/conda/issues/3299 ) will give us something for this too.

hmaarrfk commented 6 years ago

Identifying if a package is GPL or not is quite tricky. However, if we do find that some compiled binaries are GPL, we must take action towards abiding by the GPL copyright laws.

Basically, it boils down to: "is the code running in the same memory space as other GPL code".

However, if your program is compiled with the readline library, for example (as is the case for conda-forge's distribution of sqlite3 and python), certain binaries in that package include material portions of code protected by the GPL.

You can use ldd to check for yourself. If a library is linked to GPL code, it is GPL. Not too hard in those cases.
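A minimal Python sketch of the ldd check described above. It assumes a Linux system with `ldd` on the PATH; the names `GPL_LIBRARY_PREFIXES`, `parse_ldd_output`, `gpl_linked_libraries`, and `check_binary` are hypothetical, and the prefix list is illustrative and incomplete. Linkage is only a hint for a human audit, not a legal determination.

```python
import re
import subprocess

# Hypothetical, incomplete list of GPL-licensed shared libraries to look for.
GPL_LIBRARY_PREFIXES = ("libreadline", "libgdbm")


def parse_ldd_output(text):
    """Return the library file names listed in the text printed by `ldd`."""
    names = []
    for line in text.splitlines():
        # Typical lines: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        # or "linux-vdso.so.1 (0x...)".
        match = re.match(r"\s*(\S+)\s+(?:=>|\()", line)
        if match:
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names


def gpl_linked_libraries(text, prefixes=GPL_LIBRARY_PREFIXES):
    """Filter `ldd` output down to libraries whose name matches a known prefix."""
    return [name for name in parse_ldd_output(text) if name.startswith(prefixes)]


def check_binary(path):
    """Run `ldd` on a binary and report the matching libraries it links to."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return gpl_linked_libraries(result.stdout)
```

Calling `check_binary` on a binary inside an environment would list any matching libraries it links against dynamically; statically linked code would not show up this way.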

If you find that a library is linked to GPL code, then by definition (GPL-v3, clause 3), that particular instance of the compiled library is now protected by GPL. And as such, you must abide by clause 3, which means you must:

a) Give prominent notice with each copy of the object code that the Library is used in it and that the Library and its use are covered by this License.

b) Accompany the object code with a copy of the GNU GPL and this license document.

In the case of python and sqlite3 on conda-forge, prominent notice is not given, nor do the tarballs include a copy of the GPL license. readline is a particularly important case to follow, as it has already caused CommonLisp to change its license: https://en.wikipedia.org/wiki/GNU_Readline#Choice_of_the_GPL_as_GNU_Readline's_license

It is important to note that the "Source" and the "Binary distribution" can have different licenses. The source may be PSF/Public Domain, but conda-forge's distribution of certain binaries in those packages definitely isn't.

If conda-forge wants to distribute Python and Sqlite3 as GPL, that might be a possibility. Personally, I would think it is a mistake.

I would much prefer splitting the packages, or allowing users to download GPL packages that give them the features they want, while compiling, when possible, with non-GPL code allowing developers freedom of choice in the Licenses they want to choose.

hmaarrfk commented 6 years ago

I know I raised this flag when many people are probably on vacation, but I feel like this is quite a serious issue. Blackduck makes a whole business of it. We can look at their infographic stats and just divide them by 10, and they still look bad. We can do better with conda-forge!

Compared to other build systems I have seen, conda-forge is really built on automation, with the goal of requiring little intervention from package maintainers for updates.

I suggest that (especially for GPL packages) we split them into a few packages that fall into at least two categories:

  1. Packages that provide executables with no header or development files.
  2. Packages that provide development files for future combination.

CI can then trace the dependencies of what gets pulled in. Is it a GPL executable, or a GPL dev library? If it is a dev library, then a human audit might be necessary.

Finally, it would be interesting if for example the recipe could include:

  1. Non-GPL recipe for reference
  2. A GPL recipe that gives the desired output.

The diff (even at the binary level) of the results would show exactly which files depend on the GPL.
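The diff idea above could be sketched roughly as follows in Python. It assumes both recipe outputs have already been unpacked into two directories; `file_digests` and `changed_files` are hypothetical helper names. Builds that are not reproducible (embedded timestamps, build paths) would produce false positives, so the result is a starting point for review rather than a verdict.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(non_gpl_root, gpl_root):
    """List files in the GPL build that are new or differ from the non-GPL build.

    Files that exist only in the non-GPL build are not reported.
    """
    base = file_digests(non_gpl_root)
    gpl = file_digests(gpl_root)
    return sorted(name for name, digest in gpl.items() if base.get(name) != digest)
```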

For the case of python, it would be clear that only the readline library is GPL. That could then be packaged separately, allowing libpython (and other libraries) to exist as PSF.

For the case of sqlite3, the exe file in /bin/ would be flagged. libsqlite3 probably won't.

Anyway, just some ideas. All would take time to implement.

hmaarrfk commented 6 years ago

Anthony Scopatz @scopatz 11:01
@hmaarrfk - Thanks for pushing on the GPL stuff! Don't get me wrong, I am all in favor of having a GPL-free part of the stack. I think the right way to do this (that doesn't break downstream packages) is to use the variants (as @jjhelmus pointed out, CC @CJ-Wright) or have a nogpl- namespace.

I think abiding by GPL is important for conda-forge to grow. The last thing I would want is for the project to be crippled by letter from a lawyer.

Core has been working on obtaining legal representation (and I think we are pretty close), so this shouldn't be a concern :)

Christopher J. Wright @CJ-Wright 11:03
:thumbsup: @hmaarrfk there also exists a complete graph of conda-forge at https://github.com/regro/cf-graph including all the information in the feedstock recipe so you might be able to do some things with that to determine the extent of GPL in the stack

Anthony Scopatz @scopatz 11:05
Also, we do abide by the GPL, and as @CJ-Wright just said, it is a pretty simple matter for users to walk their dependency tree to see if there is non-interpreted GPL code in their stack. This could be made easier by having a nogpl- suite of packages too but that would be a convenience, and I believe it would cause a lot of problems if we just started ripping out GPL code from the bottom up, simply for the sake of ripping out GPL code.

Again, having a guarantee that you could get a nogpl stack in conda-forge would be a great feature to have! I'd love it if you pushed forward with this, probably via the build variants mechanism.

Christopher J. Wright @CJ-Wright 11:09
@hmaarrfk I'd be happy to review a cf-scripts migrator if you want/are ready to run this at scale

jakirkham @jakirkham 12:59
It would be great if we could move these comments into issue ( conda-forge/conda-forge.github.io#209 ) so it is easier to track the discussion going forward 🙂

Mark Harfouche @hmaarrfk 13:30
@scopatz - I'm not trying to be confrontational, it is just hard to gauge people's reactions online. Thanks for discussing and reviewing the PRs. You have definitely made me rethink how "clearcut" the law is
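Walking a dependency tree, as suggested in the chat above, might look like the following Python sketch. It assumes the dependency edges and license strings have already been extracted into plain dictionaries (this is not the cf-graph data format), and `looks_gpl` is a crude, hypothetical heuristic that flags GPL and AGPL license strings while letting LGPL through.

```python
import re
from collections import deque


def looks_gpl(license_text):
    """Crude heuristic: True if any token of the license string starts with GPL or AGPL."""
    tokens = re.split(r"[^A-Za-z0-9.+-]+", license_text or "")
    return any(token.upper().startswith(("GPL", "AGPL")) for token in tokens)


def gpl_dependencies(package, depends, licenses):
    """Breadth-first walk of the transitive dependencies of `package`.

    `depends` maps a package name to its direct dependencies and `licenses`
    maps a package name to its license string; both are assumed inputs.
    """
    seen = {package}
    queue = deque([package])
    flagged = []
    while queue:
        name = queue.popleft()
        if name != package and looks_gpl(licenses.get(name)):
            flagged.append(name)
        for dep in depends.get(name, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(flagged)
```

The `seen` set keeps the walk from looping on circular dependencies. The license string only describes what a recipe declares, so a flagged package still needs a human look at how it is actually linked.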

jakirkham commented 5 years ago

A related point worth adding here for reference (and discussion): what is the license field communicating to users (or intended to communicate)? Is it the software's license or the final binary package's license?

cc @ocefpaf (as it is your comment and I don't want to misinterpret it)

pseudotensor commented 3 years ago

Agree, same as nomkl. Need nogpl etc. in order to avoid leaks of bad packages that cannot be commercially used.

Example is simple matplotlib package, which installs qt pyqt pyqt5-sip etc. that are GPL, but are actually not needed for matplotlib to function without doing qt things.

Another example is python-slugify. This installs both text-unidecode and unidecode, but the package only needs one of them. unidecode is GPL, but text-unidecode is Artistic. conda bundles both erroneously.

For now one has to manually post-hoc hack the package lists.
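As one way to find which installed packages would need such manual handling, the Python sketch below scans an environment's `conda-meta` directory. It assumes each JSON record there carries `name` and `license` keys; `licenses_mentioning_gpl` is a hypothetical helper. It deliberately over-reports: any license string mentioning GPL (including LGPL and AGPL) is listed for manual review.

```python
import json
from pathlib import Path


def licenses_mentioning_gpl(prefix):
    """Return {package name: license} for installed records whose license mentions GPL.

    `prefix` is the root directory of a conda environment. Records without
    a license field are skipped, so this is not a completeness guarantee.
    """
    flagged = {}
    for record_path in sorted(Path(prefix, "conda-meta").glob("*.json")):
        record = json.loads(record_path.read_text(encoding="utf-8"))
        license_text = record.get("license") or ""
        if "GPL" in license_text.upper():
            flagged[record.get("name", record_path.stem)] = license_text
    return flagged
```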

marcelotrevisani commented 3 years ago

Example is simple matplotlib package, which installs qt pyqt pyqt5-sip etc. that are GPL, but are actually not needed for matplotlib to function without doing qt things.

We do have matplotlib-base which does not rely on qt

Another example is python-slugify. This installs both text-unidecode and unidecode, but the package only needs one of them. unidecode is GPL, but text-unidecode is Artistic. conda bundles both erroneously.

You can create a PR there to build this package in a separate branch without those dependencies, or maybe just add that as run_constrained.

For now one has to manually post-hoc hack the package lists.

We are open to improvements and PRs. However, I would not develop this myself, for the simple reason that it is more a private-company concern than a community concern.

pseudotensor commented 3 years ago

For now one has to manually post-hoc hack the package lists.

We are open to improvements and PRs. However, I would not develop this myself, for the simple reason that it is more a private-company concern than a community concern.

This affects the usefulness and lifetime adoption of conda in general. E.g. NVIDIA rapids is lately conda only, and building from source is highly non-trivial for all architectures. So if NVIDIA wishes to see adoption of conda and rapids, they would benefit from conda being usable more broadly.

If they knew no company could easily adopt rapids because of conda mixing in GPL, they might not have made that jump.

datametrician commented 3 years ago

This is simply not true: numerous companies have adopted RAPIDS, and many startups depend on RAPIDS. Most companies have SW audit processes to make sure they are not using GPL code in a non-compliant way. I wouldn’t speak on behalf of all companies, because companies with VERY strict legal SW audit processes depend on RAPIDS AND conda.

pseudotensor commented 3 years ago

@datametrician What is not true exactly?

You are only strengthening my argument, not going against it. Given strict compliance requirements, there is no easy way in conda-forge to specify one does not want GPL. Instead, one ends up wasting time hacking conda or building things separately, which undoes the usefulness of conda package management.

The more one uses conda, the more likely GPL packages get pulled in, hence the lack of controls becomes more challenging.

isuruf commented 3 years ago

As @marcelotrevisani said, we are open to improvements. If you can't work on it, have a look at https://conda-forge.org/docs/contracting/00_intro.html

datametrician commented 3 years ago

What's not true is "no company"; I think that's a stretch. On the other side, RAPIDS is not just a single library, it's a bridge of many libraries. For instance, many use our subword tokenizer with PyTorch for NLP. RAPIDS has to always work with dozens of different libraries, and there's no way to guarantee that with pip. We can at least test the interaction of RAPIDS with 100s of libraries with conda. Personally, I would rather trade users for stability and predictable integration with the rest of the GPU ecosystem (but that's just me).

pseudotensor commented 3 years ago

@datametrician I'm not sure what you are arguing for. I would think you'd want it to be easier, not harder, to use rapids. Easier would mean resolving the issue at hand, allowing one to avoid GPL packages. I am not sure why this is not obviously true.

datametrician commented 3 years ago

I'm arguing that our adoption of conda alleviates many other issues, we wouldn't want to move from conda, and companies are adopting RAPIDS. Having a way to isolate GPL libraries would be nice, but I'm arguing against blanket statements. Also, RAPIDS is GPL free, so your argument doesn't apply to RAPIDS, but other conda libraries. Tossing us in seems like a red herring.

pseudotensor commented 3 years ago

@datametrician This is a question of conda-forge as a package management system (see the issue at hand), not a question of what nvidia should be doing.

There is no "red herring". NVIDIA rapids is just one random example of something a company would want to use. If one only uses rapids, then sure there is no issue. But that is a fairly academic case. As one continues to use conda more extensively, one bumps up against this issue. conda-forge is an ecosystem of packages, so the issue at hand is how to use such an ecosystem in a case where one wants to use it in a company. If nvidia cares about companies, they should care about this issue. I'm not sure you are getting that point.

An example is opencv, which is a common ML package. It pulls in ffmpeg, but the conda-forge ffmpeg is GPL due to pulling in x264 and nettle. If conda-forge had some focus on GPL like they do on MKL, then the conda-forge ecosystem would be easier to use. There are many such examples, making the default use of conda-forge not possible in a company. As I stated at the outset, one has to resort to hacks.

datametrician commented 3 years ago

If they knew no company could easily adopt rapids because of conda mixing in GPL, they might not have made that jump.

My only point is you can't say this, nor should you use NVIDIA to make your argument, because you don't know what we knew when making the decision or why we made it. Again, we made the decision for interoperability reasons.

I agree with you conda could do more to improve this. But that should be the basis of your argument. Don't pull us in :)

pseudotensor commented 3 years ago

If they knew no company could easily adopt rapids because of conda mixing in GPL, they might not have made that jump.

My only point is you can't say this, nor should you use NVIDIA to make your argument, because you don't know what we knew when making the decision or why we made it. Again, we made the decision for interoperability reasons.

"easy" is an opinion term. I'm confident it is less easy on average for most due to the issue at hand.

I agree with you conda could do more to improve this. But that should be the basis of your argument. Don't pull us in :)

I can pull you in because it is entirely relevant, as I stated. You don't have to focus only on an opinion statement that you disagree with.

datametrician commented 3 years ago

We will just have to agree to disagree. The facts of your argument are strong enough to stand on their own. Stating an opinion that NVIDIA naively made a decision of this gravity is... well... an odd thing for a partner to do.

pseudotensor commented 3 years ago

We will just have to agree to disagree. The facts of your argument are strong enough to stand on their own. Stating an opinion that NVIDIA naively made a decision of this gravity is... well... an odd thing for a partner to do.

Now I see extra words put in my mouth I didn't say, which is odd as well. It is clearly in nvidia's benefit for conda to be easier for companies to use.

datametrician commented 3 years ago

I strongly agree conda being easier would be beneficial. This is why NVIDIA funds conda-forge and has people contributing to it. It would be great if others did the same.

pseudotensor commented 3 years ago

I strongly agree conda being easier would be beneficial. This is why NVIDIA funds conda-forge and has people contributing to it. It would be great if others did the same.

Great, you could have stated this as your first and only statement and we would have been done already :)

h-vetinari commented 2 years ago

John wrote up a proposal that would be able to deal with this: https://github.com/conda-forge/conda-forge.github.io/issues/1608

hmaarrfk commented 2 years ago

Would it make sense to start building out gpl and gplfree variants, demarcated by the build string, for select packages?

"Big" proposals are "nice", but I don't think the underlying issue for those affected has necessarily gone away, and they might benefit from moving their projects forward.

hmaarrfk commented 2 years ago

One concern that came up in https://github.com/conda-forge/libtiff-feedstock/pull/75 is the fact that the ABI may not be stable across these variants.