chaoss / grimoirelab

GrimoireLab: platform for software development analytics and insights
https://chaoss.github.io/grimoirelab/
GNU General Public License v3.0

Automate the process of updating the license #301

Open vchrombie opened 4 years ago

vchrombie commented 4 years ago

As of now, we update the license manually, or at least the year in it.

Example: https://github.com/chaoss/grimoirelab-perceval/commit/076953e95735401b4d9266562f9ae406a30751a0

We can automate this, since it applies to all the components of GrimoireLab. Maybe a small script that updates the year, which can be run once a year.

This can be extended to updating the Authors in the license too.

Inspired by https://github.com/Bitergia/prosoul/issues/13#issuecomment-591940810. :slightly_smiling_face:

vchrombie commented 4 years ago

This issue is open for discussion. Once we decide which things we need to automate, we can proceed with the implementation.

I would like to help with the implementation part too. :slightly_smiling_face:

valeriocos commented 4 years ago

Thank you for opening this issue @vchrombie!

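Please could you propose:

  • the metadata information to be updated within a file

  • a possible implementation/approach to update the file metadata information

We can use your proposal as a baseline for discussions.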

vchrombie commented 4 years ago

Hi @valeriocos

  • the metadata information to be updated within a file

  • a possible implementation/approach to update the file metadata information

I have got two implementations for now.

  1. One practical approach is to replace the particular line, that is, replace old_text with new_text. You can refer to the gist. It is a slightly weird approach, but it works fine. You can see the changes here: https://github.com/chaoss/grimoirelab-perceval/compare/master...vchrombie:license-automation.

  2. There is an existing project specifically for this purpose, johann-petrak/licenseheaders. It works really well if you only consider the years.

    python3 licenseheaders.py -y 2015-2020 -d perceval/backends

    This would change the copyright year range in all the code files in the backends folder. There is no support for the Authors as of now, but I think we can fork the project and adapt it as required for the chaoss organization.
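A minimal sketch of the text-replacement idea from approach 1, applied to the year only. The header wording matched by the pattern and the function name `update_copyright_year` are assumptions for illustration; this is not the code from the gist.

```python
import re

# Matches e.g. "Copyright (C) 2015-2019" or "Copyright (c) 2019".
# The exact header wording is an assumption; adjust it to the real files.
COPYRIGHT_RE = re.compile(r"(Copyright \([cC]\) )(\d{4})(?:-(\d{4}))?")


def update_copyright_year(text, year):
    """Return `text` with every copyright year range ending in `year`."""
    def repl(match):
        prefix, start = match.group(1), match.group(2)
        # A start year at or after `year` collapses to a single year.
        if int(start) >= year:
            return f"{prefix}{start}"
        return f"{prefix}{start}-{year}"

    return COPYRIGHT_RE.sub(repl, text)
```

Keeping the function pure (string in, string out) makes it easy to run over many files and to check the result before writing anything back.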

valeriocos commented 4 years ago

I am not sure about the Author field, as I have a doubt. How do you define an author here? Is it only the person who originally created the file, or any contributor to that file? perceval/__init__.py#L18

I would go for a simple and common definition: an author is anyone who at some point has edited/authored the file. For instance, in the case of perceval/__init__.py, we should have 3 authors (as pointed out by the GitHub UI).

I have got two implementations for now.

Why is solution 1 weird? If I understand the approach correctly, it looks for some text in a given file and replaces it, right?

Solution 2. seems to be too much for what we are looking for (that tool uses templates and focuses on licences), however we can have a look at it as a source of ideas.

There is no support for the Authors as of now

Maybe we could use the Git backend of Perceval to get the commits of a repository and then extract the authors of the commits in a given file. Another option could be to use the GitHub commits API to get the same information. WDYT?
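A rough sketch of the author-extraction idea, following the definition above (an author is anyone who has a commit touching the file). It assumes commit items shaped like the ones produced by Perceval's Git backend, with the author under `data["Author"]` and the touched paths under `data["files"][i]["file"]`; the function name `authors_by_file` is hypothetical.

```python
from collections import defaultdict


def authors_by_file(commits):
    """Map each file path to its distinct commit authors, in first-seen order."""
    authors = defaultdict(list)
    for commit in commits:
        data = commit["data"]
        author = data["Author"]
        # Some commits may carry no file list; treat them as touching nothing.
        for entry in data.get("files", []):
            path = entry["file"]
            if author not in authors[path]:
                authors[path].append(author)
    return dict(authors)
```

With Perceval installed, the `commits` iterable could come from the `fetch()` method of its Git backend. Renames and authors who appear under several names or emails are not handled in this sketch.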

vchrombie commented 4 years ago

I would go for a simple and common definition: an author is anyone who at some point has edited/authored the file. For instance, in the case of perceval/__init__.py, we should have 3 authors (as pointed out by the GitHub UI).

Okay.

Why is solution 1 weird? If I understand the approach correctly, it looks for some text in a given file and replaces it, right?

The approach involves creating a temporary file, writing the contents of the source file into it while substituting the string, and finally replacing the source file with the temporary file. It just felt weird to me because it is a lot of steps, nothing else. :sweat_smile:
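The temporary-file steps described above can be sketched as follows. This is an illustrative version using the standard library, not the code from the gist; the function name `replace_in_file` is hypothetical.

```python
import os
import tempfile


def replace_in_file(path, old_text, new_text):
    """Replace old_text with new_text in `path`; return True if the file changed."""
    with open(path, encoding="utf-8") as src:
        content = src.read()
    updated = content.replace(old_text, new_text)
    if updated == content:
        return False

    # Write the new content next to the original, then swap it in,
    # so a failure midway never leaves a half-written source file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(updated)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

The extra steps buy safety: the original file is only replaced once the new content is fully written. One caveat is that the replaced file takes the temporary file's permissions, so the original mode would need to be copied over if it matters.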

Solution 2. seems to be too much for what we are looking for (that tool uses templates and focuses on licences), however we can have a look at it as a source of ideas.

Yes, exactly.

Maybe we could use the Git backend of Perceval to get the commits of a repository and then extract the authors of the commits in a given file. Another option could be to use the GitHub commits API to get the same information. WDYT?

This seems to be a perfect idea. :smiley:

valeriocos commented 4 years ago

Thank you for the quick reply!

The approach involves creating a temporary file, writing the contents of the source file into it while substituting the string, and finally replacing the source file with the temporary file. It just felt weird to me because it is a lot of steps, nothing else. :sweat_smile:

I see :) it's a PoC, it can be improved in the next iteration

vchrombie commented 4 years ago

Hi @valeriocos

Can you please check this repository, vchrombie/grimoirelab-scripts, when you have time? I have written a script which fetches the authors' names and combines them with the copyright information using a template.

This is the result when I ran the script on backend.py.

# Copyright (c) 2015-2020 Bitergia
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Authors:
#     animesh <animuz111@gmail.com>
#     Valerio Cosentino <valcos@bitergia.com>
#     JJMerchante <jj.merchante@gmail.com>
#     Santiago Dueñas <sduenas@bitergia.com>
#     Harshal Mittal <harshalmittal4@gmail.com>
#     Jesus M. Gonzalez-Barahona <jgb@gsyc.es>

As of now, most of the values are hard-coded. But it can be improved by iterating through the files, maybe using the os module.
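Iterating through the files with the os module could look roughly like this. The function name `iter_source_files` and the choice to skip hidden directories (such as `.git`) are assumptions for illustration, not part of the script in the repository.

```python
import os


def iter_source_files(root, suffix=".py"):
    """Yield paths of files under `root` that end with `suffix`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

Each yielded path can then be passed to whatever function rewrites the header, which removes the need to hard-code file names.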

Also, the next step here is to remove the initial content (the existing copyright information in the file) and add the new, generated content. This should not matter much for the diff, as git shows only the additions and deletions, not how they were made.
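A possible sketch of that remove-and-add step, assuming the existing header is the first contiguous block of `#` comment lines, optionally preceded by a shebang or an encoding line. The function name `replace_header` is hypothetical.

```python
import re

# Lines to keep at the top of the file: a shebang or a PEP 263 encoding line.
_PRESERVE_RE = re.compile(r"#!|#.*coding[:=]")


def replace_header(source, new_header):
    """Replace the leading block of '#' comment lines in `source` with `new_header`."""
    lines = source.splitlines(keepends=True)
    start = 0
    # Only the first two lines can hold a shebang or an encoding declaration.
    while start < min(2, len(lines)) and _PRESERVE_RE.match(lines[start]):
        start += 1
    end = start
    # The old header ends at the first line that is not a comment.
    while end < len(lines) and lines[end].startswith("#"):
        end += 1
    if not new_header.endswith("\n"):
        new_header += "\n"
    return "".join(lines[:start]) + new_header + "".join(lines[end:])
```

If a file has no comment block at the top, the new header is simply inserted at the start (after any preserved line), so the same call covers files that lack a header.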

vchrombie commented 4 years ago

Hi @valeriocos

A small update.

I completed the script. It works now, though it can still be improved. I also tried updating the source code files and sent a draft PR to test the script: https://github.com/chaoss/grimoirelab-perceval/pull/623.