User Names in git

hulpke commented 5 years ago

I noted that we have from time to time new people coming in (not just externally, but joining the gap-system setup), whose user names are random (i.e. do not give an indication of the person) and whose profile page does not give any information either.

I might be an old-fashioned fuddy-duddy, but I would find it helpful if contributors would use their name or have at least some information in the profile that identifies them.

Doing so would help in understanding the context of a contribution and how to respond, and it is also the standard of scientific journals.

wucas commented 5 years ago

Generally, I'd argue for the right to use the web under a pseudonym or even anonymously. But I do understand your point about being able to better understand the context of a contribution. Although, currently I, as a newcomer myself, would not have been able to really infer who e.g. you, Max, Chris etc. are only from your GitHub accounts. I think, for a community such as this one, it would be reasonable to kindly ask people for some context of who they are, but I think it wouldn't be a good idea to try to enforce this in any way or form, apart from making this somewhat of a norm by doing this ourselves. But in the end, what should matter imho is good ideas and good contributions. After all, isn't it one of the benefits of developing a project open source that anyone can contribute, maybe even only once, without first having to become a part of some social group?

tldr: I think it's okay to ask for it but not to require it.

ChrisJefferson commented 5 years ago

I don't think giving a name is becoming "part of a social group". People write serious academic papers which rely on GAP, so I think it's reasonable to know who wrote some piece of code.

Also, people have to officially give us permission to use their code under the GPL. I believe (IANAL, but once saw a talk by one), that it makes things tricky legally if we don't officially know who gave us some code.

fingolfin commented 5 years ago

> Also, people have to officially give us permission to use their code under the GPL. I believe (IANAL, but once saw a talk by one), that it makes things tricky legally if we don't officially know who gave us some code.

IANAL either, and I do know that these things can vary wildly between Germany, USA, UK, ...; but I did read a lot about these things, partially because I actually was involved in legal proceedings against a big international company for violating the GPL and using code I wrote in a commercially distributed product; this involved hiring a lawyer who specialized in these things. Still, IANAL, I merely know a bit more than usual, which is just enough to make me realize how little I know...

All in all, I don't believe that knowing a person's "official name" matters one bit (and anyway, a name given on GitHub is not official -- I could change my account to "John Doe" at any time). Also, what do you even mean by "officially give us permission"? There are two possibilities that come to mind, from my limited legal knowledge:

1. Explicit permission: To be legally binding, this has to be in written (physical) form. To the best of my knowledge, we have no such thing from anybody.
2. Implicit permission: By contributing to this repository resp. GAP in general, both of which are very clearly and explicitly marked as being under the "GPL 2 or later" (this is one of several reasons why having a legal header in every source file can be valuable!), contributors have implicitly agreed to provide their contributions under the same license (and that they are legally allowed to make that contribution, etc., etc.). This is the case no matter what the identity of the contributor is. Still, this is certainly weaker than 1. It also means that there isn't a single entity to which copyright in GAP belongs, which can make legal proceedings to "defend our IP", should those ever become necessary, much harder. This is why e.g. the FSF requires people to sign forms where they "assign copyright" to their contributions to the FSF (a process which is possible in e.g. US jurisdiction, but not so in e.g. German jurisdiction, where it is, to my limited understanding at least, apparently essentially impossible to transfer personal copyright for something one created).

In either case, the argument about "giving permission" seems to be irrelevant for the discussion at hand.

All in all, while I acknowledge the need some people seem to have to "know" (?) who made a pull request, where "who" seems to refer to a full name plus some background info (e.g. "student at uni X", "prof at institute Y"), I do not understand this need. However, I would like to -- perhaps there could be some more elaboration on this? I do understand it for e.g. Slack, because there, we are having discussions on lots of things, and I just feel better having a rough idea "who is in the room".

But for PRs? Shouldn't we judge PRs by their content, just like we should referee papers based on their content? And not approve or reject them based on whether we know the author, or how we judge the reputation of that author? In fact, in peer review, a lot of people think that it might be better if reviews were conducted "double blind"...

As it is, of course I am influenced by who the author of a PR is, but I am not sure that is necessarily good...

@hulpke wrote:

> I might be an old-fashioned fuddy-duddy, but I would find it helpful if contributors would use their name or have at least some information in the profile that identifies them.
>
> Doing so would help in understanding the context of a contribution and how to respond, and it is also the standard of scientific journals.

Here are some honest open questions:

1. To what extent would knowing the name of a contributor alone be helpful (how is "John Smith" better than "R4ndomUser")?
2. Assuming what really was meant is "knowing the 'identity' of a user, at least: their name, their university, their status group, their area of research": How would that help?
3. Why does the "context" of a contribution matter? What does that even mean?

hulpke commented 5 years ago

Dear @fingolfin, in short, I think my reason for wanting the name on GitHub is the same as you wanting it on Slack -- we are discussing the system development and I would like to know who is involved.

As for your questions:

A proper name lets you search "John Smith Math" or try MathSciNet, Zentralblatt, etc. and find the person's professional information and past publications, essentially the items you list under the second point. (I would not want to require everyone to list all of this in a place such as GitHub.) If I want to have further information about the ideas underlying an algorithm, this might help me to locate appropriate information.

If I search instead for an arbitrary online handle I find -- maybe -- contributions to some non-mathematical software, to reviews of purchases, discussions of movies and other aspects of an online persona that have little to do with mathematical software development.

Why would it help (besides making me feel more comfortable that I "know" the person)? If I know that the person has written publications about -- say -- p-groups (or is working on a thesis with an advisor who is working on p-groups) I put far more trust in proposed methods (or heuristics) for p-groups, and I know that it is plausible to respond with comments that are incomprehensible if one does not know basic theory. If the code makes choices that seem strange to me (but I don't know the area well), the problem is more likely mine than the author's. If it is code I would like to use in critical situations, the fact that the author has a long-term academic position gives me trust that she will be around in the future if I have questions.

Vice versa, if -- say -- user "hulpke" submits code for solving partial differential equations (an area the user has never worked in, nor would be expected to have working knowledge of), it might be plausible to assume that this is basic, ad-hoc functionality that is far from the state of the art in the area. This does not doom it to uselessness, but I should not assume that I can rely on it as particularly efficient code for general problems.

Yes, I tend to base decisions much more on "traditional reputation" in the real world, and that's why I like to have the real-world link. This saves me a lot of time in going through the details of proposed code. Doing so might make me overlook a Ramanujan of math software. I think the risk of this is very small (and indeed much smaller than the equivalent risk in Hardy's time).

olexandr-konovalov commented 5 years ago

Just to add some brief thoughts. I think that:

While it's good to know who is who, that should not be mandatory.
I remember seeing one of the F1000 papers where all but one of the authors had actual names, and one only had the GitHub username in the list of authors!
Even when the username is non-obvious, and the GitHub profile is not informative, there is a signed commit message with name and email (which stays in our revision history). Those may also be non-informative though (shrug).
IANAL, but I think that those who submitted PRs via GitHub already officially gave us permission to use their code under the GPL (perhaps governed by our license and GitHub T&C).
Finally, the code should be judged on its quality: the same review standards should be applied to all contributions. Anyone can benefit from feedback on their code, and everyone can make mistakes, in code, in papers, in running experiments.

dimpase commented 5 years ago

I know it sounds paranoid, but it's close to a dangerous territory where malicious contributions from anonymous submitters might get through---it's obviously much harder to do a malicious contribution under the real name :-).

Paranoid or not, such things are nevertheless known to happen in areas related to computer algebra, such as crypto, and there are attempts to design cryptosystems based on group theory - so yes, one might be tempted to break GAP if there were a flawed crypto which could be broken with GAP...

olexandr-konovalov commented 5 years ago

That only shows that testing and code review practices should be sound to avoid any broken crypto, whether intentional or not...

fingolfin commented 5 years ago

> I know it sounds paranoid, but it's close to a dangerous territory where malicious contributions from anonymous submitters might get through---it's obviously much harder to do a malicious contribution under the real name :-).

No it isn't, because there is zero validation. I can right now create a new GitHub account with all publicly visible information identical to e.g. yours or anybody else's, and then make a contribution from there.

And then there is always the possibility of hacking somebody's account, and then making a contribution from that. Possibly without the victim even noticing.

Thus if we were seriously worried about this (I am not) the first thing we should do is request all contributors to enable 2-factor authentication on their GitHub account (which I'd welcome anyway, but right now, just 8 out of 31 members of gap-system have done so). Also, we'd have to carefully vet every single account making contributions.

I'd much rather carefully review contributions than send private investigators after volunteers who sacrifice their valuable time to help us; i.e., I'd rather optimize for the 99.9999% case, instead of worrying about the 0.0001% probability that somebody wants to insert a backdoor into GAP for nefarious purposes.

> Paranoid or not, such things are nevertheless known to happen in areas related to computer algebra, such as crypto, and there are attempts to design cryptosystems based on group theory - so yes, one might be tempted to break GAP if there were a flawed crypto which could be broken with GAP...

Oh come on, that's a serious stretch! I could equally argue that each of us is a target for abduction by terrorists: because we work on GAP, which is math; math is similar to physics; some physicists work on atomic bombs; terrorists want atomic bombs.

BTW, all papers I've seen so far that purport to use non-abelian groups to somehow derive some new kinds of crypto system (bonus points for adding buzzwords like "post-quantum") really are completely useless for all practical purposes, and will remain so (I have 2-3 people in mind who regularly crank out new such papers, and these papers are IMHO all worthless).

gap-system / gap

User Names in git #3622