389ds / 389-ds-base

The enterprise-class Open Source LDAP server for Linux
https://www.port389.org/

legal - clarify our policy on AI generated code #5763

Open Firstyear opened 1 year ago

Firstyear commented 1 year ago

Recently, in another project, a contributor mentioned they were trying GitHub Copilot. After discussing with SUSE legal and their policy ( https://opensource.suse.com/legal/policy ), the assessment is that, until it has been tested in court, we can't distribute code which was AI generated. I assume Red Hat may have a similar policy.

I think it is likely we also need to have Red Hat legal follow up with their assessment, and for the time being we need to not accept AI generated code. In other projects we have handled this with a checklist item on the very first line of the PR template.
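For context, a minimal sketch of what such a checklist item might look like, assuming a hypothetical `.github/pull_request_template.md` (the file path and wording here are illustrative, not taken from any existing project's template):

```markdown
<!-- Hypothetical .github/pull_request_template.md sketch -->
- [ ] I confirm that no part of this pull request was generated by an AI
      assistant (e.g. GitHub Copilot, ChatGPT).

## Description
<!-- Describe what the change does and why. -->
```

GitHub renders `- [ ]` items as interactive checkboxes in the PR description, so reviewers can see at a glance whether the contributor has ticked the confirmation.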

mreynolds389 commented 1 year ago

@Firstyear so Red Hat does not have any official restrictions on using AI generated code. There are other caveats, but nothing saying you can't use it.

That being said, at this time I am fine with not allowing it in our upstream project. We should add some type of disclaimer to the contribution page...

droideck commented 1 year ago

I agree we should add a notice on the contribution page, since we are GPLv3, to remind people that their AI-generated code could easily be "copied" from code under a more restrictive license - though it heavily depends on the tool they used to generate the code.

But I'm curious: Copilot has existed for over a year now (and around two years with invites, IIRC), yet there are still no court decisions for SUSE legal to accept, and in the meantime it has become smarter, with some settings aimed at legal issues.

Also, how would you determine whether code was AI generated? Take https://github.com/389ds/389-ds-base/pull/5764 for example: it could be 100% human-written, but it could just as easily be 80% AI generated with Copilot or ChatGPT, or 50%, or 10%. Current tools are very good with JS.

Out of curiosity - what about the case where you "feed" the tool the 389 DS code base and generate code from it? (It may contain 5-10% other code, but that won't be distinguishable at all.) For example, Anthropic's Claude can digest 75k words in seconds, and in a year, or even months, it could be even more. Already, some bots can scan your code base and then add comments to the project issues with suggested fixes.

So yes, a disclaimer is a good step, but IMHO the person who submits the PR should be mindful of this, just as with ordinary copyright.

mreynolds389 commented 1 year ago

> I agree we should add a notice on the contribution page, since we are GPLv3, to remind people that their AI-generated code could easily be "copied" from code under a more restrictive license - though it heavily depends on the tool they used to generate the code.

> But I'm curious: Copilot has existed for over a year now (and around two years with invites, IIRC), yet there are still no court decisions for SUSE legal to accept, and in the meantime it has become smarter, with some settings aimed at legal issues.

> Also, how would you determine whether code was AI generated? Take #5764 for example: it could be 100% human-written, but it could just as easily be 80% AI generated with Copilot or ChatGPT, or 50%, or 10%. Current tools are very good with JS.

Of course a contributor could "lie" about the origins of their code, but I believe untouched AI generated code comes with its own copyright. For example, if we see a "Copilot" copyright we know it's GitHub AI generated code, etc.

So this is all on the honor system, but yes, we need to let potential contributors (the small handful that are out there) know that we are not accepting this code at this time. We should update the wiki, and probably send a warning to the 389-devel/389-users lists as well.

> Out of curiosity - what about the case where you "feed" the tool the 389 DS code base and generate code from it? (It may contain 5-10% other code, but that won't be distinguishable at all.) For example, Anthropic's Claude can digest 75k words in seconds, and in a year, or even months, it could be even more. Already, some bots can scan your code base and then add comments to the project issues with suggested fixes.

Code changes based on AI suggestions are slightly different, I suppose. I think the main point here is not to blindly accept 100% AI generated code. While Red Hat is technically okay with this, I agree with SUSE that we should proceed with extreme caution at this time.