INTERSECT-training / software-licensing

https://intersect-training.org/software-licensing/

Add discussion of licensing concerns with LLMs #7

Open troycomi opened 2 months ago

troycomi commented 2 months ago

Based on a discussion with David, I came up with three phases in which licensing has important consequences when using LLMs (e.g. Copilot) in software development:

Since many of the details haven't been tested in court, this will probably be full of open-ended questions, but it's worth including since the topic is likely to come up anyway.

bernhold commented 2 months ago

This seems like a good set of points to discuss.

Related to the first one, there's also the question of how you can even know what code the model was trained on, how that code was originally licensed, and whether the model's developers sought or obtained permission to use it for training.

And related to the third point, it's not just the attribution, but also which licenses can be applied to the generated code. If there was copyleft content in the training data, does that mean anything the LLM generates must be considered to fall under the same license? As you say, none of this has been decided in court yet, but it is certainly an argument that copyleft proponents are likely to make. And then, returning to the question above, can you be sure the training set did not include copyleft code? Maybe LLM-generated code is un-licensable? I think that so far the USPTO and the U.S. Copyright Office have said that purely AI-generated material is not patentable or copyrightable, respectively, and courts have so far agreed (just from the cases I've heard about in the news, not a systematic examination).