Extend option for removing headers

cloudposse-archives / copyright-header

GNU General Public License v3.0


Extend option for removing headers #13

Open tmuras opened 11 years ago

tmuras commented 11 years ago

It would be nice if the tool could clean up any header (so that the new one can then be added). Cleaning up could be defined as: remove all lines from the top of the file (or after <?php in the case of PHP, etc.) until the first non-comment line.
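The cleanup rule proposed here can be sketched in a few lines of Python. This is only an illustration of the idea, not code from copyright-header: the function name `strip_leading_comments` and the default prefixes are hypothetical, only line comments are handled, and a `#!` shebang would be treated as a comment and removed.

```python
def strip_leading_comments(text, line_prefixes=("#", "//")):
    """Drop comment and blank lines from the top of `text` until the first code line.

    Hypothetical helper for illustration; not part of copyright-header.
    """
    lines = text.splitlines(keepends=True)
    kept = []
    i = 0
    # Keep a PHP open tag in place and start cleaning after it.
    if lines and lines[0].strip() == "<?php":
        kept.append(lines[0])
        i = 1
    # Skip lines that are blank or start with a line-comment prefix.
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped and not stripped.startswith(line_prefixes):
            break
        i += 1
    return "".join(kept + lines[i:])
```

Comments that appear after the first code line are left alone, which is why running it under version control and reviewing the diff is still advisable.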

osterman commented 11 years ago

Thanks for the input! This is a tricky problem that I tried to solve as reliably as possible. The solution you propose did cross my mind at some point, but it's error-prone. There are just too many ways that a general solution like this would get tripped up. For example, right after the license may come a description of the source code, which should not get removed.

The solution I arrived at was to allow you to create a new license file that matches the header already in the files (less any comment declarations or leading whitespace). Then use the --remove-path argument together with the --license-file argument, passing the location of the new license file you created. With these arguments, it should recursively remove all existing headers of that kind.
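The matching idea described above (compare the license text against file lines once comment declarations and leading whitespace are stripped) might look roughly like the following. This is a sketch under assumptions, not the tool's actual implementation: `normalize`, `remove_known_header`, and the list of decorations are all hypothetical names chosen for illustration.

```python
def normalize(line, decorations=("#", "//", "/*", "*/", "*")):
    """Strip surrounding whitespace and one leading comment marker from a line."""
    s = line.strip()
    for mark in decorations:
        if s.startswith(mark):
            s = s[len(mark):]
            break
    return s.strip()


def remove_known_header(source, license_text):
    """Remove the first run of lines matching license_text, ignoring decorations.

    Hypothetical helper; returns `source` unchanged when nothing matches.
    """
    wanted = [l.strip() for l in license_text.strip().splitlines()]
    lines = source.splitlines(keepends=True)
    normalized = [normalize(l) for l in lines]
    for start in range(len(lines) - len(wanted) + 1):
        if normalized[start:start + len(wanted)] == wanted:
            return "".join(lines[:start] + lines[start + len(wanted):])
    return source  # no match: leave the file untouched
```

Because only lines that match the supplied license text are dropped, a description comment that follows the license survives, which is the case the blanket approach would get wrong.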

Let me know if this solution does not work for you.

-Erik

tmuras commented 11 years ago

Hi Erik,

Thanks for the prompt response.

I would still implement removal of all comments. My use case is to clean up the files, and then add the new copyright header. Each file had a different header and it was a mess. I don't think you have to be perfect here. I would not trust any tool removing lines from my source code anyway. The way to go is to commit your code to VCS, then run your tool and review the result. I would be happy enough if it works as expected 90% of the time.

I was thinking about creating a tool similar to yours as part of my project, moosh. Now that I have found your utility, I think I would rather integrate with your tool. Thanks for the good work so far.

Tomek

osterman commented 11 years ago

One idea I had was to do an automatic analysis on the first N lines of commented source code across all files, after skipping the funky stuff like #! and PEP encoding lines. The idea would be to identify the common lines which are commented across X% of the files. Then make it possible to do automatic removal of those lines.
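A rough sketch of that analysis, assuming the file contents are already loaded as strings. The function name `common_header_lines`, its parameters (`n`, `threshold`, `prefixes`), and the encoding-line pattern are assumptions made for illustration, not anything the tool provides.

```python
import re
from collections import Counter

# Approximation of a PEP 263 encoding declaration, e.g. "# -*- coding: utf-8 -*-".
ENCODING = re.compile(r"^#.*?coding[:=]\s*[-\w.]+")


def common_header_lines(files, n=10, threshold=0.8, prefixes=("#", "//")):
    """Return commented lines found in the first n lines of at least `threshold` of files."""
    if not files:
        return set()
    counts = Counter()
    for text in files:
        lines = [l.strip() for l in text.splitlines()]
        # Skip shebangs and encoding lines before taking the first n lines.
        lines = [l for l in lines if not (l.startswith("#!") or ENCODING.match(l))]
        # Use a set so a line repeated within one file is counted once.
        commented = {l for l in lines[:n] if l.startswith(prefixes)}
        counts.update(commented)
    return {line for line, c in counts.items() if c / len(files) >= threshold}
```

The returned set would then be the candidate lines for automatic removal; lines unique to a single file fall below the threshold and are kept.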

-Erik

tmuras commented 11 years ago

That would not help me with the cleanup, since my files were all different. I think you are overthinking this one; you don't have to be perfect each time :-). Simply removing all comments will be good enough.

osterman commented 11 years ago

The other challenge I see with this is that languages often allow multiple comment formats. The syntax file only defines how to decorate the license, not every way a comment could be written. We would need to add a definition of the various comment formats for each language, and I'm not keen on updating the syntax file with all of them.

tmuras commented 11 years ago

I think the predefined standard markers //, # and /* would be good in 99% of cases, maybe with an option to override them.
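As a sketch of what such defaults with an override could look like (hypothetical names throughout; nothing here is an existing copyright-header option): the function below strips blank lines, line comments, and block comments from the top of a file, and lets the caller replace the markers. One caveat worth noting is that a default `#` marker would also match C preprocessor lines such as `#include`, and code that shares a line with a block comment would be dropped, so an override is likely needed for some languages.

```python
DEFAULT_LINE_MARKERS = ("//", "#")
DEFAULT_BLOCK = ("/*", "*/")


def strip_top_comments(text, line_markers=DEFAULT_LINE_MARKERS, block=DEFAULT_BLOCK):
    """Remove blank lines, line comments and block comments from the top of `text`."""
    line_markers = tuple(line_markers)  # str.startswith needs a tuple, not a list
    block_open, block_close = block
    lines = text.splitlines(keepends=True)
    i = 0
    in_block = False
    while i < len(lines):
        s = lines[i].strip()
        if in_block:
            # Inside a block comment: consume lines until the closing marker.
            if block_close in s:
                in_block = False
            i += 1
        elif not s or s.startswith(line_markers):
            i += 1
        elif s.startswith(block_open):
            # A one-line block comment closes on the same line.
            in_block = block_close not in s[len(block_open):]
            i += 1
        else:
            break  # first non-comment line: stop removing
    return "".join(lines[i:])
```

For a language with different syntax, the caller could pass its own markers, for example `strip_top_comments(sql_text, line_markers=("--",))`.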

It's your call anyway, please close it if you don't want it for any reason.