bigcode-project / opt-out-v2

Repository for opt-out requests.

GPL code included despite what the documentation claims #65

Open lunar-debian opened 8 months ago

lunar-debian commented 8 months ago

In the "Other known limitations" section of the documentation, we can read:

> To the best of our knowledge, all files contained in the dataset are licensed with one of the permissive licenses (see list in Licensing information) or no license. The accuracy of license attribution is limited by the accuracy of GHArchive and ScanCode Toolkit. Any mistakes should be reported to BigCode Project for review and follow-up as needed.

A search for “under the terms of the GNU General Public License” on StarCoder2 Search returns many entries.

My own GPL code is included; see swh:1:cnt:80ff01eaf4187cf470084f4b0b1fec223691549a as an example. I found this file by searching for “diffoscope” in the search engine mentioned above. There is no point in submitting a GitHub username for opt-out, as the code has been embedded in a repository that is not under my control.
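The identifier above is a Software Heritage persistent identifier (SWHID). For file contents (`cnt`), the hash part is the same SHA-1 that Git computes for a blob, so anyone holding a local copy can check whether it is byte-for-byte the same file. A minimal Python sketch, assuming the file is read as raw bytes; the helper names `content_swhid` and `matches_swhid` are hypothetical:

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a content object (raw file bytes).

    Content SWHIDs use the Git blob hash: SHA-1 over the header
    b"blob <length>\\0" followed by the file bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def matches_swhid(path: str, swhid: str) -> bool:
    # Read in binary mode: any change to line endings or encoding changes the hash.
    with open(path, "rb") as f:
        return content_swhid(f.read()) == swhid
```

Because the hash covers the exact bytes, a copy that was reformatted or had its header edited will not match, even if the code is otherwise identical.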

Please either state that you include code under the GPL (and other non-permissive licenses) or stop including such code in the dataset.

lunar-debian commented 8 months ago

This can also be confirmed with the StarCoder2 Membership Test, which is now online. (Yesterday the link pointed to a different service, which was down.)

The link above searches for:

```
# diffoscope is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
```

Which… well… should be easy to detect as GPL. It is an exact copy of the notice the FSF recommends adding to source files to apply the GPLv3.
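To illustrate how little is needed to flag this notice, here is a minimal Python sketch that strips common comment prefixes, collapses whitespace, and looks for the standard FSF GPL wording. It is an assumption-laden illustration, not ScanCode Toolkit's algorithm (ScanCode matches against a large rule base); the function names `normalize` and `detect_gpl_version` are hypothetical, and only the plain GPL wording is recognized (not LGPL or AGPL):

```python
import re

# Key phrases of the FSF's recommended GPL notice, matched after normalization.
GPL_NOTICE = re.compile(
    r"under the terms of the GNU General Public License"
    r".*?version (\d+(?:\.\d+)?) of the License",
    re.IGNORECASE,
)

# Leading line-comment markers to strip: "#", "//", "*", "/*", ";".
COMMENT_PREFIX = re.compile(r"^\s*(?:#|//|/?\*+|;+)\s?")


def normalize(text: str) -> str:
    # Remove comment markers from each line, then collapse all whitespace
    # so that a notice wrapped across several lines becomes one sentence.
    lines = (COMMENT_PREFIX.sub("", line) for line in text.splitlines())
    return " ".join(" ".join(lines).split())


def detect_gpl_version(text: str):
    # Return the GPL version named in the notice (e.g. "3"), or None.
    match = GPL_NOTICE.search(normalize(text))
    return match.group(1) if match else None
```

Calling `detect_gpl_version` on the four diffoscope lines quoted above returns `"3"`.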