geodynamics / cig_tools

Various tools for CIG and CIG projects

Handle files with different copyright dates. #11

Open hlokavarapu opened 7 years ago

hlokavarapu commented 7 years ago

For codes with a longer active lifespan, it is likely that source code files have different copyright dates. How should we handle such cases, @tjesser-ucdavis-edu?

jedbrown commented 7 years ago

I'm curious about the context here. Of course some files are older and some are probably not developed by the authors of the package (typically single-file distributions intended to be bundled by the packages that use them).

ljhwang commented 7 years ago

Copyright law seems to suggest that it is not necessary to update the copyright on a distribution. I can’t follow all of the legalese, but: copyright extends to the life of the author plus 70 years; it is only necessary to update the notice if there has been a substantial change; the notice should indicate the date of publication; and yes, you can stack years, as in "copyright 2012, 2014", such that no one can establish copyright before you.

Hence, I would say leave it alone unless you know there have been substantial updates, in which case it should be serially copyrighted. This does create some complexity if we are doing auto-update, as you would then need to detect and append e.g. “,” or “-”, which I think mean different things.
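
The detect-and-append step mentioned here could be sketched as follows. This is only an illustration, not the tool's actual behavior: the function names `parse_years`, `format_years`, and `add_year` are hypothetical, and the sketch assumes that a range such as "2012-2014" claims every year in the span while a comma list such as "2012, 2014" claims only the listed years. Only plain ASCII hyphens and commas are recognized.

```python
import re


def parse_years(text):
    """Expand a year string such as '2012, 2014-2016' into a set of ints."""
    years = set()
    for part in re.split(r"\s*,\s*", text.strip()):
        match = re.fullmatch(r"(\d{4})(?:\s*-\s*(\d{4}))?", part)
        if not match:
            raise ValueError(f"unrecognized year token: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2) or first)
        years.update(range(first, last + 1))
    return years


def format_years(years):
    """Write consecutive runs as 'a-b' and isolated years as a comma list."""
    ordered = sorted(years)
    runs = []
    start = prev = ordered[0]
    for year in ordered[1:]:
        if year != prev + 1:
            # Gap found: close the current run and start a new one.
            runs.append((start, prev))
            start = year
        prev = year
    runs.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def add_year(text, year):
    """Return the year string with `year` added (assumed semantics, see above)."""
    return format_years(parse_years(text) | {year})
```

Under these assumptions a range is written only when the recorded years are consecutive, so the updated notice never claims a year that was not already listed or explicitly added.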

Best, -Lorraine


Lorraine Hwang, Ph.D. Associate Director, CIG 530.752.3656


jedbrown commented 7 years ago

The Berne Convention says that no copyright notice is necessary -- it is totally superfluous from the perspective of asserting copyright. https://www.copyright.gov/circs/circ03.pdf Having a notice makes it more likely that someone who inadvertently copies code will see the copyright notice. This is relevant if you are trying to claim damages and the defendant claims the infringement was innocent. https://www.law.cornell.edu/uscode/text/17/401

"such that no one can establish copyright before you" -- notices, whether present or not, have nothing to do with this determination.

Modern version control provides vastly more powerful provenance than per-file copyright notices. I personally think per-file notices are obsolete, but many projects still use them, either due to tradition or a paranoia that someone will inadvertently copy code from a file without wondering about the license or copyright status of that code. But those notices serve no legal purpose beyond countering a defense of "innocent infringement" outside fair use. https://en.wikipedia.org/wiki/Copyright_notice#Reasons_to_include_an_optional_copyright_notice

ljhwang commented 7 years ago

Correct. So has anyone come up with a succinct statement summarizing the Berne Convention that would be useful to insert somewhere?

I do agree that a copyright notice is perhaps more comforting than legally binding. The case for keeping it in the files is the use case in which a file is copied out of the original code and into a different software package. While it may not necessarily be “copyright”, some assertion should be there to protect the authors.

Best, -Lorraine



jedbrown commented 7 years ago

Under the Convention, copyrights for creative works are automatically in force upon their creation without being asserted or declared. An author need not "register" or "apply for" a copyright in countries adhering to the Convention. As soon as a work is "fixed", that is, written or recorded on some physical medium, its author is automatically entitled to all copyrights in the work and to any derivative works, unless and until the author explicitly disclaims them or until the copyright expires. https://en.wikipedia.org/wiki/Berne_Convention

I'm not aware of much software that is designed to be copied out on a per-file basis. Usually someone will either need a collection of files or portions thereof, or they will simply be copying out some code fragments. Should a copyright notice also go inside every function?

The reason I personally think these notices are a waste of time is because almost nobody maintains them accurately (it's nigh impossible during code refactoring -- when functions move between files, names get applied to code that they had nothing to do with and code gets moved to files where the authors' names are not present) and because they have no legal value beyond countering an "innocent infringement" defense. In particular, such notices don't "protect the authors".

Anyway, I asked about this issue "Handle files with different copyright dates." What exactly does that mean?

tjesser-ucdavis-edu commented 7 years ago

> Usually someone will either need a collection of files or portions thereof, or they will simply be copying out some code fragments. Should a copyright notice also go inside every function?

It is my understanding that most copyleft licenses do require notices be preserved in some form. I believe copying code fragments falls into that category.

> The reason I personally think these notices are a waste of time is because almost nobody maintains them accurately (it's nigh impossible during code refactoring...

Automatic maintenance of these notices is the goal of this program, for better or worse. The FSF licenses still recommend having a notice in each code file. On the other hand, the Fedora wiki seems to require only having a copy of the license somewhere in the package.

> Anyway, I asked about this issue "Handle files with different copyright dates." What exactly does that mean?

Sometimes programs contain files with different copyright years, possibly due to the files being created in different years, or maybe one hasn't been edited in a while, etc. The question is whether or not this difference should be preserved. Do these notices have a granularity of individual files or are they updated based on the full project? And if they are based on the full project, how do projects with files under different licenses exist?

Basically, I've seen programs that do have different copyright years in different files, and this issue is about how to implement that if it does become a necessary feature.
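
One way to see whether a project carries different copyright years in different files is to extract the year string from each notice and group files by it. The sketch below is an assumption about how such a check might look, not a description of cig_tools: `NOTICE`, `notice_years`, and `group_by_years` are hypothetical names, the pattern only recognizes notices of the form "Copyright [(c)] YYYY" followed by further years joined with commas or hyphens, and file contents are passed in as strings so that no particular directory layout is assumed.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "Copyright", an optional "(c)", then one or more
# four-digit years separated by commas or hyphens.
NOTICE = re.compile(
    r"copyright\s+(?:\(c\)\s*)?(\d{4}(?:\s*[-,]\s*\d{4})*)",
    re.IGNORECASE,
)


def notice_years(source_text):
    """Return the year string of the first copyright notice, or None."""
    match = NOTICE.search(source_text)
    return match.group(1) if match else None


def group_by_years(sources):
    """Map each distinct year string to the sorted file names carrying it.

    `sources` maps a file name to that file's text. Files without a
    recognizable notice are grouped under the key None.
    """
    groups = defaultdict(list)
    for name, text in sources.items():
        groups[notice_years(text)].append(name)
    return {years: sorted(names) for years, names in groups.items()}
```

If the result has more than one key, the project has per-file differences, and whether to preserve them is the question raised above.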

jedbrown commented 7 years ago

Some licenses (including the copyleft licenses you're referring to) reiterate existing copyright law, which already prohibits removing copyright information (if present, in whatever form it is present). https://www.law.cornell.edu/uscode/text/17/1202 This of course has no bearing on the copyright status of the code, only the presence of notices. And people make mistakes all the time and those mistakes are rarely prosecuted.

Note that a license is totally different from copyright. The FSF likes to have the license spelled out in every file because it allows for maximum damages when prosecuting GPL violations. (I.e., it makes "I didn't know it was GPL" extraordinarily hard to believe.) It is not required to have a license or copyright statement in GPL source files. For example, Linux is an extremely IP-savvy GPL project, but does not include a license statement in individual files. Some files have copyright statements, but many do not.

The Fedora statement applies to binary distribution. It's their own way of tracking licensing, but has nothing to do with source distributions.

A copyright notice in an individual file is supposed to apply only to the contents of that file. It is not project-wide and should not be updated automatically.

What feature does the tool intend to provide? (Sorry, I have not followed development discussions for this tool.)

tjesser-ucdavis-edu commented 7 years ago

> The Fedora statement applies to binary distribution. It's their own way of tracking licensing, but has nothing to do with source distributions.

The first line of the second paragraph is (emphasis mine)

> In cases where the upstream has chosen a license that requires that a copy of the license text be distributed along with the binaries and/or source code,

Debian and Fedora both distribute source packages as well as binary packages. I read that page as referring to both package types.


> Note that a license is totally different from copyright.

I always understood it as "You are obtaining a license to use a copyrighted work." Is that not correct? If a work is not under copyright, do you still need a license to use it?


> What feature does the tool intend to provide?

The idea is to detect and help correct any copyright/licensing issues a project has and then, once corrected, be able to verify that the project is still correct at a later time.

jedbrown commented 7 years ago

Your quote missed the rest of the sentence, which is about correcting mistakes made by the upstream distribution.

> In cases where the upstream has chosen a license that requires that a copy of the license text be distributed along with the binaries and/or source code, but does not provide a copy of the license text (in the source tree, or in some rare cases, anywhere), the packager should do their best to point out this confusion to upstream.

You don't need a license for public domain software, though putting software in the public domain requires explicit legal action that is more complicated than licensing. I was replying to your comment about FSF recommending that license information be included in every file (in addition to copyright statements).

What does "correct" mean? If a file contains a copyright statement that names an author who did not actually author anything in the file, is it still correct? If an author of some code contained in the file does not appear in the copyright statement, is it correct? I would argue that automating this is impossible without restricting developers' ability to refactor. (You could do it approximately by mining the Git history, but even that is error-prone.)

What sort of issues do you intend to detect and correct?

tjesser-ucdavis-edu commented 7 years ago

> Your quote missed the rest of the sentence, which is about correcting mistakes made by the upstream distribution.

Yes, my point was that Fedora considers it an upstream mistake. You had previously said that the page only applied to binary distribution, and I was trying to say that it was a licensing convention Fedora wanted upstream projects to follow, which are the target users for this tool.


> What does "correct" mean?

It's true that "correct licensing" is not well defined, most probably because we haven't consulted a lawyer. Right now, it means finding and following scientific and FOSS community recommendations. I should probably document the recommendations I've been using so far, if I can find them again.

I think trying to protect against lying users is out of scope, however.


Could you edit https://github.com/geodynamics/cig_tools/issues/11#issuecomment-270486795 to remove the duplication? It makes this thread a little harder to reread. Or would you mind if I did? Thanks

jedbrown commented 7 years ago

In Fedora's context, upstream is typically providing a source distribution while the Fedora packager (target of that documentation) is building a binary for Fedora users. The wiki is meant as a guideline for Fedora packagers to create conforming binary packages for use by Fedora users. That one section says what to do when encountering a mistake made by upstream. The mistake is not about breaking a Fedora guideline, but about an internal inconsistency. An example is if the upstream (source) distribution says "This software is GPL" but does not include the license text. For upstream to distribute GPL source without a copy of the GPL License text is itself a violation of the GPL, which requires that recipients get "a copy of this License along with the Program". The quoted statement just says that Fedora packagers should notify upstream to resolve this rather than, for example, filling it in themselves (possibly erroneously; did the copyright holders mean GPLv2, GPLv2+, GPL3, etc., and does it apply to everything or only some parts?).

I am interested in the proposed spec. I think that should be written somewhere before starting development of the tool.

I'm not talking about "lying users", but rather innocent mistakes made because the task of refactoring complex software while preserving accurate copyright attribution is nearly impossible. If the file scope copyright statement is meant to be all that is needed to perform the task, then it is literally impossible. Because Git can trace chunks, it is far more powerful than anything based on file-scope copyright statements.
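
The chunk-level tracing referred to here can be approximated with `git blame --line-porcelain`, which reports an author and timestamp for every line; the `-M` and `-C` options additionally ask Git to look for lines moved or copied between files. The sketch below is illustrative only: `blame_file` and `summarize_blame` are hypothetical helper names, and the result counts authored lines, which says nothing about who holds the copyright.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone


def blame_file(path):
    """Run git blame on `path` (must be inside a Git work tree)."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "-M", "-C", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def summarize_blame(porcelain):
    """Count lines per (author, year) from --line-porcelain output.

    In this format every source line is preceded by header fields such as
    'author NAME' and 'author-time EPOCH', and the source line itself is
    prefixed with a tab character.
    """
    counts = Counter()
    author = year = None
    for line in porcelain.splitlines():
        if line.startswith("\t"):
            counts[(author, year)] += 1
        elif line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("author-time "):
            timestamp = int(line.split()[1])
            year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
    return counts
```

Calling `summarize_blame(blame_file("some_file.c"))` (with a placeholder file name) would give a per-author, per-year line count for that file. Even this view is approximate, since moved or reformatted code may be attributed to whoever last touched it.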

I also want to warn about developing a tool for correcting inconsistencies. If the task is at all complex, the tool will inevitably make mistakes. (Expert humans routinely make mistakes for this particular task.) If this tool makes mistakes, it could erroneously assure a user of license compatibility while leaving them open to prosecution.

This scenario may sound far-fetched, but suppose a fragment of GPL code accidentally makes its way into a BSD-licensed library without attribution, and the CIG license/copyright tool labels it BSD by mistake. Then Schlumberger uses that library in a commercial reservoir simulator, reassured by the consistent and precise BSD designation on the software. Sometime down the line, the copyright holder for the GPL code fragment prosecutes Schlumberger for license violation (demanding full source to the reservoir simulator).