Open vinniefalco opened 5 days ago
One problem is that "author" will not exist for older releases. The logic to extract the author should assume that older releases have the same author as the oldest release with an author metadata field.
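A minimal sketch of that backfill rule, assuming the releases are available as a list of dicts ordered oldest to newest. The `backfill_authors` name and the `version`/`authors` keys are hypothetical, not taken from the website code:

```python
from typing import Any


def backfill_authors(releases: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of `releases` (ordered oldest to newest) in which every
    release older than the first one with author metadata inherits that
    release's authors. Later releases are left untouched."""
    filled = [dict(release) for release in releases]

    # Index of the oldest release that actually carries author metadata.
    oldest_known = next(
        (i for i, release in enumerate(filled) if release.get("authors")),
        None,
    )
    if oldest_known is None:
        return filled  # No release has authors, so there is nothing to copy.

    for release in filled[:oldest_known]:
        release["authors"] = list(filled[oldest_known]["authors"])
    return filled
```

For example, if only the third of three releases has an `authors` list, the first two receive a copy of it and the input list is not modified.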
It already has authors. https://www.boost.org/development/library_metadata.html
I see, well that's great. However there are missing emails: https://github.com/boostorg/beast/blob/develop/meta/libraries.json#L5 https://github.com/boostorg/mp11/blob/develop/meta/libraries.json#L5
Also some emails are not normalized: https://github.com/boostorg/mp11/blob/develop/meta/libraries.json#L8
The authors don't have e-mails.
What does "not normalized" mean?
Not normalized: "pdimov -at- gmail.com"
The python code needs to detect this and change it to "pdimov@gmail.com" in the database, or else we can't tie it to a GitHub user.
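Something along these lines might do the conversion. It is only a sketch: `normalize_email` is a hypothetical helper name, and it handles only the " -at- " obfuscation shown above, on the assumption that this is the only form used in libraries.json:

```python
import re

# Matches the " -at- " separator, with whitespace required on both sides
# so that a literal "-at-" inside an address is left alone.
_OBFUSCATED_AT = re.compile(r"\s+-at-\s+", re.IGNORECASE)


def normalize_email(raw: str) -> str:
    """Turn "user -at- host" into "user@host"; other input is only stripped."""
    return _OBFUSCATED_AT.sub("@", raw.strip())
```

Addresses that already contain "@" pass through unchanged, so the function can be applied to every entry regardless of how it was written.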
It would be best if the authors field has emails, so we can tie it to a GitHub user or website user. We should include this guidance in the Contributor's Guide. We can't go back and change old releases, but we can add them for future ones.
What definition of "normalized" are you using?
"bring or return to a normal or standard condition or state"
In this case "normalized" means
{name} '<' {email} '>'
where "email" is a valid email address
Well, that's not what libraries.json contains. Look at the link above, which describes it. This is the format originally used in maintainers.txt, which has been retained in libraries.json.
Yes I see now. Well, how do you suggest we add the author email to the metadata?
The e-mails are supposed to be contact information, which is why only the maintainers have them (as they need to be contactable.)
Most of authors predate the creation of Github so even if they did have e-mails they wouldn't really be useful.
If we want to tie authors and maintainers to Github accounts it would probably be better if we add the Github account directly in libraries.json, somehow. E.g. authors: "Peter Dimov <https://github.com/pdimov>", maintainers: "Peter Dimov <pdimov -at- gmail.com> <https://github.com/pdimov>" or something similar.
Or I suppose we can make the authors and maintainers proper JSON objects. { "name": "Peter Dimov", "email": "pdimov -at- gmail.com", "github_id": "pdimov" }
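To illustrate, a reader that accepts both the string form and the object form could look roughly like this. Neither format is settled in this discussion, so `parse_person`, the `Person` type, and the rule that a bracketed https://github.com/ URL denotes the account are all assumptions:

```python
import re
from typing import Optional, TypedDict, Union


class Person(TypedDict):
    name: str
    email: Optional[str]
    github_id: Optional[str]


_BRACKETED = re.compile(r"<([^<>]*)>")
_GITHUB_URL = re.compile(r"^https://github\.com/([A-Za-z0-9-]+)/?$")


def parse_person(entry: Union[str, dict]) -> Person:
    """Parse one authors/maintainers entry in either proposed form."""
    if isinstance(entry, dict):
        # Object form: { "name": ..., "email": ..., "github_id": ... }
        return Person(
            name=entry["name"],
            email=entry.get("email"),
            github_id=entry.get("github_id"),
        )

    # String form: a name followed by zero or more <...> groups.
    name = _BRACKETED.sub("", entry).strip()
    email: Optional[str] = None
    github_id: Optional[str] = None
    for value in _BRACKETED.findall(entry):
        value = value.strip()
        match = _GITHUB_URL.match(value)
        if match:
            github_id = match.group(1)
        elif value:
            email = value  # Kept exactly as written, e.g. "user -at- host".
    return Person(name=name, email=email, github_id=github_id)
```

The e-mail is returned as stored in the metadata; converting "-at-" to "@" is left as a separate step.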
"Most of authors predate the creation of Github"
I say GitHub but what I really mean is git. The website looks at the commit metadata of the repository, without using the GitHub API. Later we use the email to tie to a GitHub user. Presumably, we will move off GitHub far sooner than we move off git, so this technique is somewhat future-proof.
We can get by with just knowing the author email. Since none of the existing author fields in libraries.json have an email, we still have to write the code to try to guess the email from other available data, and this might get us by for now.
Many of authors predate git as well. :-)
There's no guarantee that the e-mail there, even if it existed, would have been the same e-mail as the one in the commits.
The commit report has a system for tracking aliases, so that a person can be associated with multiple email addresses as they are discovered.
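The commit report's actual alias system is not shown in this thread. As a purely hypothetical sketch of the idea, a registry mapping each discovered address to one person could look like this (the `AliasRegistry` class and the case-insensitive comparison of addresses are assumptions):

```python
from typing import Optional


class AliasRegistry:
    """Associates each person with every email address discovered for them."""

    def __init__(self) -> None:
        self._person_by_email: dict[str, str] = {}
        self._emails_by_person: dict[str, set[str]] = {}

    @staticmethod
    def _key(email: str) -> str:
        # Assumption: addresses are compared case-insensitively.
        return email.strip().lower()

    def add_alias(self, person: str, email: str) -> None:
        """Record that `email` belongs to `person`."""
        key = self._key(email)
        owner = self._person_by_email.get(key)
        if owner is not None and owner != person:
            raise ValueError(f"{email!r} is already associated with {owner!r}")
        self._person_by_email[key] = person
        self._emails_by_person.setdefault(person, set()).add(key)

    def resolve(self, email: str) -> Optional[str]:
        """Return the person known for `email`, or None if it is unknown."""
        return self._person_by_email.get(self._key(email))

    def emails_for(self, person: str) -> list[str]:
        """Return all known addresses for `person`, sorted."""
        return sorted(self._emails_by_person.get(person, set()))
```

With something like this, the address in the metadata and the address in the commits do not need to match, as long as both have been recorded for the same person.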
meta/libraries.json should also have an "author" field to distinguish the author(s) of the library from the maintainers (which can become different over time).