Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Model License #1

Closed: brunosan closed this issue 10 months ago

brunosan commented 1 year ago

Clay will be open. This issue documents a specific proposal of what that means. Everything is up for debate; where I'm unsure about specifics, my default option is given in brackets.

Explicitly:

Internal vs external:

We might work fully in the open, or we might keep an internal repo and make frequent "releases" into an open repo. I'm also OK with squashing history, but we should aim to minimize "release work". We will most likely work directly in the open. Since Clay as a project hasn't been announced yet, there is no urgency to decide.

Specific licenses:

Commercial restrictions?

It's tempting to look for ways to help sustain our non-profit through our product. I'm not 100% convinced we should, since Goal 1 is to become the standard base model. Other open projects with operational goals find other mechanisms (e.g. Red Hat, Python, Linux, OSM, Overture, OpenAI ...); those mechanisms include memberships, consulting, philanthropy, fee-for-feature, SLA endpoints, support, ...

Any friction on commercial use will weigh heavily on adoption. I'm not sure we should add commercial clauses (even soft ones like LLaMA's).

My proposal is to come out as: "Commercial use is allowed and explicitly encouraged under our open licenses".

Once we become the standard, it should not be hard to gain revenue from memberships, SLAs, support, ... If need be, we can add commercial licensing to e.g. Clay v10.

What's your feeling on this?

weiji14 commented 11 months ago

I'm not a lawyer, but specifically on the model weights, you might want to be careful about treating them as just 'data', and look at proper licenses that spell out how AI artifacts can be used in downstream applications. Hugging Face put out this blog post on using Open Responsible AI Licenses (OpenRAIL): https://huggingface.co/blog/open_rail. These OpenRAIL licenses, OpenRAIL-M (for model weights) and OpenRAIL-S (for source code), might be worth a look. More details at https://www.licenses.ai/blog/2023/3/3/ai-pubs-rail-licenses

brunosan commented 11 months ago

Thanks, @weiji14. I really appreciate the input here. In principle, OpenRAIL sounds like exactly what we want.

I do worry that using something non-standard increases friction for adoption, and I don't know how widely accepted these licenses are, or might become, in the short term.

We do not need to choose a specific license until we release, or at least until we announce Clay as an organization. We do know the intent (stated above). Let's keep brainstorming here.

[I've also edited to reflect that we are likely to work in the open so we don't really need to do much "release work"]

weiji14 commented 10 months ago

Bumping this topic again since we've been working on embedding generation recently, and it would be good to clarify what license the embeddings should be released under.

> I do worry when using something not standard as it increases friction of adoption, and I do not know how widely accepted these licenses are or might be in the short term.

There are now 20,000+ models on Hugging Face with an OpenRAIL license (see https://huggingface.co/models?license=license:openrail), compared with about 50,000+ under Apache-2.0, so actually not too bad!

Another good resource worth looking at is The Foundation Model Transparency Index - https://crfm.stanford.edu/fmti. There are about 100 indicators, but some relevant ones include:

brunosan commented 10 months ago

Thanks for the ping, @weiji14. I've started a PR for the model license (#63). Thanks also for the transparency index framework; topping that ranking seems within scope. See #64 (no PR yet, but I'm working on a branch if you want to contribute).

weiji14 commented 10 months ago

Closing, as this was done in #63. We still haven't created a license for documentation, but will use CC-BY once that is set up.