dod38fr / config-model

Perl module to create configuration editor with semantic validation

How are dpkg copyright entries merged? #24

Closed rpavlik closed 4 years ago

rpavlik commented 4 years ago

The context is me trying to run cme update dpkg-copyright on this meshlab package: https://salsa.debian.org/rpavlik-guest/meshlab. I've manually merged a bunch of these entries, since (after the fix-ups) most of the source is 2003-2020 by "Visual Computing Lab, ISTI - Italian National Research Council". Whenever I run cme to update these, it tends to split those entries out into lots of individual ones with just a few files in each.

I have seen it merge entries on other packages, however, hence my curiosity/issue.

(Please let me know if this is the wrong repo to hold this discussion.)

dod38fr commented 4 years ago

The merge mechanism is:

In your case, the copyrights from the vcglib directory often have different copyright years, so they are not merged together.

To make the situation more confusing, these copyrights are not correctly parsed by licensecheck. For instance, the copyright in vcglib/wrap/system/memory_info.h:

* Copyright(C) 2004-2016                                           \/)\/    *
* Visual Computing Lab                                            /\/|      *
* ISTI - Italian National Research Council                           |      *

is parsed as:

 licensecheck -m --copyright  vcglib/wrap/system/memory_info.h
vcglib/wrap/system/memory_info.h        GPL (v2 or later)       (C) 2004-2016 /)/ *

which leads to this entry in debian/copyright:

Files: vcglib/wrap/system/memory_info.h
Copyright: (C) 2004-2016, /)
License: GPL-2+

Hope this helps

rpavlik commented 4 years ago

Yeah, I have a fair amount of stuff in the fix.scanned.copyright to deal with that format, as well as to normalize the formatting of the copyright holder's name (turns out ASCII art in your copyright notice doesn't always remain intact over 17 years of development...).

That makes sense if it's only merging things that already have the same years - I expected it would merge "2002" and "2002-2004" from the same entity into a single "2002-2004". That explains the behavior I'm seeing, thanks for your help!

dod38fr commented 4 years ago

I expected it would merge "2002" and "2002-2004" from the same entity into a single

cme should merge "2002" and "2002-2004" years if they are found in one file. No such merge occurs across files.
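
To illustrate the kind of per-file year merge described above, here is a minimal Python sketch. It is not cme's actual code (cme is written in Perl); the function name `merge_years` is hypothetical, and treating adjacent years such as "2003" and "2004-2016" as one continuous range is an assumption made for the example.

```python
import re


def merge_years(specs):
    """Merge year specs such as "2002" and "2002-2004" for ONE holder in ONE file.

    Hypothetical helper for illustration only; not part of cme or licensecheck.
    Assumption: overlapping or adjacent ranges collapse into a single range.
    """
    ranges = []
    for spec in specs:
        # A spec may hold several comma-separated years or ranges.
        for part in re.split(r"\s*,\s*", spec.strip()):
            m = re.fullmatch(r"(\d{4})(?:\s*-\s*(\d{4}))?", part)
            if not m:
                raise ValueError(f"unrecognized year spec: {part!r}")
            start = int(m.group(1))
            end = int(m.group(2) or start)
            ranges.append((start, end))

    ranges.sort()
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    return ", ".join(str(s) if s == e else f"{s}-{e}" for s, e in merged)


print(merge_years(["2002", "2002-2004"]))
```

With these inputs the sketch yields a single "2002-2004" entry. Two files whose year specs differ would still produce two distinct strings, which is why their entries are not grouped together.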

That said, merging is done before applying the instructions from fix.scanned.copyright.

Usually, I would suggest overriding the data coming from badly parsed files in fill.copyright.blanks.yml, but this is not practical given the number of files that need to be overridden.

Maybe you could talk with the licensecheck author. He may be able to tweak licensecheck to better cope with vcglib files.

Hope this helps

rpavlik commented 4 years ago

thanks!