dmgerman / ninka

a license identification tool for Source Code
http://ninka.turingmachine.org
GNU General Public License v2.0

does not recognize simple/standard AGPL license headers #11

Open zacchiro opened 9 years ago

zacchiro commented 9 years ago

I've been playing with ninka on a couple of AGPL'd applications. For testing purposes, here are the apps I've used with the corresponding tarballs:

  1. GNU mediagoblin (Git snapshot): http://upsilon.cc/~zack/stuff/mediagoblin-snapshot.tar.gz
  2. Debsources (ditto): http://upsilon.cc/~zack/stuff/debsources-snapshot.tar.gz

As a baseline test, I've also used the following archive (which contains code licensed under a mixture of licenses):

  3. python-debian (0.1.25): http://upsilon.cc/~zack/stuff/python-debian.tar.gz

I've used the new excel & sqlite wrappers in my tests.

On archive (3), ninka seems to work as expected, recognizing various licenses. On archive (1) and (2), ninka does not recognize any single AGPL'd file as such, even though the headers in them are fairly explicit and standard, e.g.:

# Debsources is free software: you can redistribute it and/or modify it under
# the terms of the GNU Affero General Public License as published by the Free
# Software Foundation, either version 3 of the License, or (at your option) any
# later version.

It seems that ninka does some AGPL "stuff", as reported in the token dump, but fails to conclude that the file is licensed under AGPL.

The problem seems to be specific to AGPL. To verify that, I ran the following experiment: I removed (with "sed -i") all occurrences of the string "Affero " in a local copy of the Debsources archive, and reran ninka on the resulting archive. ninka was then immediately able to conclude that most files are licensed under GPL3.
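For anyone who wants to repeat the experiment without sed, here is a minimal Python sketch of the same substitution. The function name `strip_affero` is hypothetical and not part of ninka; the sketch assumes it is pointed at a throwaway copy of the archive, since it rewrites files in place.

```python
from pathlib import Path


def strip_affero(root: str) -> int:
    """Remove every occurrence of b"Affero " from the files under root.

    Works on raw bytes, so no text encoding is assumed.
    Returns the number of files that were modified.
    """
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        new_data = data.replace(b"Affero ", b"")
        if new_data != data:
            # Rewrite in place, like "sed -i" would.
            path.write_bytes(new_data)
            changed += 1
    return changed
```

Running ninka before and after calling this on a copy of the tree should show whether the word "Affero" alone changes the outcome.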

So maybe there is a simple AGPL regexp to be tweaked somewhere?

Many thanks for ninka! Cheers.

dmgerman commented 9 years ago

Well, this is a good question. The license statement you used is not "standard". It uses a colon instead of a semicolon. Change

Debsources is free software:

to

Debsources is free software;

and the license would be recognized.

--dmg

zacchiro commented 9 years ago

It looks like the GPL changed from a semicolon to a colon from version 2 to version 3. Compare GPL3, which reads:

This program is free software: you can redistribute it and/or modify

with GPL2:

This program is free software; you can redistribute it and/or

So I suggest that you support both forms, because every project that follows the GPL3 recommendation to the letter, copy-pasting from it, will have the same verbatim text as Debsources and Mediagoblin.
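To illustrate what supporting both forms could look like, here is a small Python sketch. It is not ninka's actual rule set, which is not shown in this thread; the names `FREE_SOFTWARE_RE` and `has_free_software_clause` are hypothetical, and the comment-stripping step only handles "#"-style comments as in the Debsources header.

```python
import re

# Hypothetical pattern: the opening clause of the notice, accepting either
# ';' (GPL2 wording) or ':' (GPL3 wording) after "free software".
FREE_SOFTWARE_RE = re.compile(
    r"is free software[;:] you can redistribute it and/or",
    re.IGNORECASE,
)


def has_free_software_clause(text: str) -> bool:
    """Return True if text contains the clause with either punctuation mark."""
    # Drop '#' comment markers and collapse whitespace so that a notice
    # wrapped over several comment lines still matches.
    normalized = " ".join(text.replace("#", " ").split())
    return FREE_SOFTWARE_RE.search(normalized) is not None
```

The character class `[;:]` is the only difference from a pattern that matches the GPL2 wording alone, so the change to an existing rule could be equally small.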

As an aside, I'm not entirely convinced this is the only cause of the bug. Otherwise, why would removing the word "Affero" (without touching the colon/semicolon) be enough for the files to be recognized as GPL3?

Many thanks for your feedback! Cheers.