LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

Apache 2.0 vs GPLv3 #1428

Closed LuisHPorras closed 1 year ago

LuisHPorras commented 1 year ago

Hello everyone, I wanted to recap the conversations I've seen on the Discord server about the project's licensing, and I found no explanation of why this license was chosen. I'd like to start a more accessible and structured conversation about it, since I think this is a very important decision for every software project. I guess it could be added to the FAQ later.

Here is some info on the main difference between both licenses (if my information is incorrect please correct me):

On the one hand, as stated in the Apache 2.0 license:

  2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

So anyone who creates a derivative work (for example, by adding a contribution) is able to sublicense the whole code; it is therefore possible to add a contribution and then release the result as proprietary software.

On the other hand, as stated in the GPLv3 license:

Sublicensing is not allowed; section 10 makes it unnecessary. [...]

  10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. [...]

Thus, the GPL requires every contributor to use the same license. The key point here is to avoid digital extractivism, where big tech companies benefit from the community without contributing back to it. It also means that once a piece of software is released, it can't be superseded by another one that is based on it; thus the project becomes a digital commons.

It would be great if the team could share why this decision was made and the main reasons behind it, and also whether the question is still open.

Thanks!

JohannesGaessler commented 1 year ago

For a chatbot in particular, if anything, the Affero General Public License (AGPL) should be used instead of the plain GPL. It adds a clause stating that the copyleft also applies when access to the software is provided over a network, as would probably be the case if some company made a custom version.

olliestanley commented 1 year ago

This has been discussed in #218.

From Yannic: "We want OA to be both used and included into apps, open-source and proprietary. All major ecosystems like Tensorflow & Pytorch are permissively licensed exactly because of that. I agree, we might miss out on someone's improvement to the system because they're not forced to share it, but in turn we will massively benefit from unrestricted adoption. As soon as you enter the realm of restrictions via licensing, a huge chunk of potential adopters drops away, many of which would voluntarily contribute to the project (even though they are not forced to)."

Will leave this open in case there are more views, but it seems unlikely that any change will be made now.

ParisNeo commented 1 year ago

I think Apache 2.0 is the right license here. People should be able to use this tool by adding private stuff, monetize it, or do whatever they want with it.

That's the main goal.

If we use GPL, we lose a huge chunk of potential users.

I would even go further and use the MIT license if I could.

LuisHPorras commented 1 year ago

Here is my opinion:

  1. For sure this is a socio-political choice that directly impacts the ethics of the project and therefore the generated community around it.

  2. Free software licenses don't restrict companies from benefiting from the code; they only ask for the same treatment the code was given.

  3. Yes, it is a red flag for capitalist companies, and yes, it's a huge green flag for solidarity economy companies. To share some data: in Spain this sector represents 10% of GDP, and it's estimated to be 8% in Europe (source, in Spanish: Cepes). Therefore, the project could benefit the solidarity economy, and the other way around.

  4. The fact that derived work can be licensed as proprietary makes the technology less secure. Given that AI is a very fragile technology due to biases that reproduce social injustices such as racism and sexism, the project would be giving away all its gathered knowledge without the ability to audit the derived work. Therefore the project could be contributing to harmful solutions without the legal tools to fight back.

  5. I've seen that Mastodon was mentioned in the other issue, so I will add to that comment by mentioning the WordPress example, which is still the main reference for small to medium scale websites. The project has not only achieved massive adoption, it has enabled thousands of businesses to thrive while avoiding centralization of the benefits. I really think that moving to a free software license could boost the project and help it engage with other projects that share its ethics.

  6. Last but not least, if the team really aims for massive adoption, I would keep something in mind: if the project can be copied, stripped of its ethics, and used with other intentions, then how long is it going to stay alive? The first moment the project doesn't follow the roadmap of whatever big tech company initially supported it financially, it is going to sink, while the company carries on with its product.

So I would ask you guys: massive adoption at what price?

balisujohn commented 1 year ago

My opinion: I support a permissive free and open source license like Apache 2.0/MIT.

ParisNeo commented 1 year ago

The problem with GPL is that it is contagious. I have had a baaad experience with it in my job. I had to completely get rid of a module that was GPL because it contaminates my code and my boss doesn't like it. So I threw the library away and basically wasted days redoing the work from scratch.

I have talked with legal specialists. If the code gets contaminated, it gets contaminated, and the customer has the right to ask for the full source code including our added value code.

So where I work, GPL is now essentially forbidden. It is actually a huge barrier to adoption. Because, let's face it, this thing should be used by small businesses and companies. Google and Microsoft don't need this; they have unmatched tools. This is mostly done for small companies. If we contaminate their code, they'll look elsewhere.

So, sure, I am an advocate of free software, but if you want wide adoption, don't use a contagious license.

With Apache or MIT, the code is still open, and people should share the source code of the main software, but they can close the additions they make.

Imagine that tomorrow we use LoRA to add a fine-tuning of the model. People can still share the original model, but keep the extra fine-tuning closed because it is part of their added value, and sometimes they need to protect it if they plan on selling it. Many companies work this way.

We do this work to help small businesses and the general public benefit from the openness and be able to see how the thing works, upgrade it, share the upgrades with others, etc. But also to help them create new jobs and opportunities.

balisujohn commented 1 year ago

My only really strong opinion is that an OSI-compliant license is used:

https://opensource.org/osd https://opensource.org/licenses

Aspie96 commented 1 year ago

The title is misleading.

Both the Apache 2.0 license and the GPLv3 license are open source licenses and free software licenses.

The Apache 2.0 license also happens to be a permissive, non-copyleft license (as are many other free software licenses).

The GPLv3 also happens to be a copyleft license (as some other open source licenses are).

Both licenses are approved by both the Open Source Initiative and the Free Software Foundation.

The Open Source Movement and the Free Software Movement are different ideas, but they describe virtually the exact same class of software, with the few differences lying at the very edge of the class.

Sources:

It is the position of the FSF that the Apache 2.0 license "is a Free Software License", as well as that "Among all programs that are open source, only a minuscule fraction are not free" (emphasis mine).

LuisHPorras commented 1 year ago

I changed it. Thanks for the correction

Aspie96 commented 1 year ago

Thank you.

For future reference: the word you were looking for is probably "copyleft" as opposed to "free software"/"open source". Free software non-copyleft licenses (such as Apache 2.0 or MIT) are sometimes referred to as "permissive" (the term is slightly debated, since all licenses are permissions).

LuisHPorras commented 1 year ago

For me the correct term is free software. I was not aware that the FSF had decided to allow open source to be called free software; that is giving up the fight. There is no other difference between the two concepts, and yet they are very different things. There is no point in calling software "libre" if it can be used to once again make users slaves of companies. The key point of copyleft is that it really frees the software. No copyleft, no free software.

Aspie96 commented 1 year ago

I was not aware that the FSF had decided to allow open source to be called free software,

To my knowledge, this has been the case for as long as the FSF has been a thing, since those licenses comply with the FSD.

That's true down to the oldest definition of the term given by the FSF that I could find, which dates back to their old website in '96 under the MIT domain.

When the Open Source Initiative was born, they very openly had the intention of promoting the same thing in a different way and under a different name (and, Stallman thinks, different motives). They described this in the announcement of the birth of the Open Source Initiative.

And indeed, one of the very first approved licenses by OSI was the GPL.

Both "free software" and "open source" are even used in legal documents (including actual laws), and to my knowledge they always describe basically the same thing. And while some exceptions and disagreements on definition do exist, neither the Apache 2.0 nor the GPLv3 is controversial on that matter: both are free software and open source.

You are defining "free software" in your very own particular individual way, which doesn't actually reflect its standard meaning.

But there is already a word which clearly describes that class of software.

LuisHPorras commented 1 year ago

Definitely, you are totally right.

I'm using my own definition, in which free software and open source software are not interchangeable. I thought that was the whole point of using different terms. But as you showed, the distinction I was using, which is accepted in social movements (at least those in which I participate), is not officially supported by the FSF. FLOSS, then, is not useful for communities, from my point of view. We might need to push for a better definition of free software or drop the useless term. The point still stands: only copyleft licenses protect communities from capitalism, transforming software into a digital commons instead of intellectual property that can be used for individual profit.

Thanks a lot for your contributions!