boostorg / website-v2

New Boost website
https://boost.io
Boost Software License 1.0

author in library metadata #1325

Open · vinniefalco opened 5 days ago

vinniefalco commented 5 days ago

meta/libraries.json should also have an "author" field to distinguish the author(s) of a library from its maintainers, since the two can diverge over time.

vinniefalco commented 5 days ago

One problem is that "author" will not exist for older releases. The logic to extract the author should assume that older releases have the same author as the oldest release with an author metadata field.
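A minimal sketch of that backfill rule, assuming releases are available as a list ordered oldest to newest and that each entry is a hypothetical dict with an optional "authors" key (the real website's data model may differ). Releases older than the first one carrying author metadata inherit its authors; later releases keep whatever they record.

```python
from typing import Optional


def backfill_authors(releases: list[dict]) -> list[Optional[list[str]]]:
    """Return an author list for every release.

    `releases` must be ordered oldest -> newest. Each entry is a hypothetical
    dict such as {"version": "1.62.0", "authors": ["Peter Dimov"]}; the
    "authors" key may be missing for releases that predate the field.
    """
    # Index of the oldest release that actually carries author metadata.
    first_known = next(
        (i for i, r in enumerate(releases) if r.get("authors")), None
    )
    if first_known is None:
        # No release has authors, so there is nothing to infer from.
        return [None] * len(releases)

    baseline = releases[first_known]["authors"]
    result: list[Optional[list[str]]] = []
    for i, release in enumerate(releases):
        if i < first_known:
            # Older release: assume the same authors as the oldest known one.
            result.append(baseline)
        else:
            # From the first annotated release on, use what is recorded
            # (None if a later release happens to omit the field).
            result.append(release.get("authors"))
    return result
```

The version strings in any test data are illustrative only; the point is that the inference never looks forward past the first annotated release.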

pdimov commented 5 days ago

It already has authors. https://www.boost.org/development/library_metadata.html

vinniefalco commented 5 days ago

I see, well, that's great. However, there are missing emails: https://github.com/boostorg/beast/blob/develop/meta/libraries.json#L5 https://github.com/boostorg/mp11/blob/develop/meta/libraries.json#L5

Also, some emails are not normalized: https://github.com/boostorg/mp11/blob/develop/meta/libraries.json#L8

pdimov commented 5 days ago

The authors don't have e-mails.

What does "not normalized" mean?

vinniefalco commented 5 days ago

Not normalized: "pdimov -at- gmail.com"

The Python code needs to detect this and change it to "pdimov@gmail.com" in the database, or else we can't tie it to a GitHub user.
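A minimal sketch of that detection step, assuming the only obfuscation in use is the "-at-" separator shown above (any other variants would need their own rules). `normalize_email` is a hypothetical helper name, and the shape check is deliberately loose rather than a full RFC 5322 validator.

```python
import re
from typing import Optional

# The obfuscated separator seen in libraries.json, e.g. "pdimov -at- gmail.com".
_OBFUSCATED_AT = re.compile(r"\s*-at-\s*", re.IGNORECASE)

# Loose shape check: exactly one "@", no whitespace, a dot in the domain part.
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(raw: str) -> Optional[str]:
    """Turn "pdimov -at- gmail.com" into "pdimov@gmail.com".

    Already-normal addresses pass through unchanged. Returns None when the
    result still does not look like an address, so callers can flag it.
    """
    candidate = _OBFUSCATED_AT.sub("@", raw.strip())
    return candidate if _EMAIL_SHAPE.match(candidate) else None
```

Returning None instead of raising lets an import job record unparseable entries for manual review rather than aborting.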

It would be best if the authors field has emails, so we can tie it to a GitHub user or website user. We should include this guidance in the Contributor's Guide. We can't go back and change old release but we can add them for future ones.

pdimov commented 5 days ago

What definition of "normalized" are you using?

vinniefalco commented 5 days ago

"bring or return to a normal or standard condition or state"

In this case "normalized" means

{name} '<' {email} '>'

where "email" is a valid email address
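The normalized form above can be recognized with a single regular expression. This is a sketch, assuming the name is everything before the first '<' and reusing the same loose address check as a stand-in for "valid email address"; `Person` and `parse_normalized` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Person(NamedTuple):
    name: str
    email: str


# {name} '<' {email} '>' with optional surrounding whitespace.
_NORMALIZED = re.compile(r"^\s*(?P<name>[^<>]+?)\s*<(?P<email>[^<>\s]+)>\s*$")

# Loose stand-in for "valid email address".
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_normalized(entry: str) -> Optional[Person]:
    """Parse "Name <user@host>"; return None for anything else."""
    m = _NORMALIZED.match(entry)
    if m is None or not _EMAIL_SHAPE.match(m.group("email")):
        return None
    return Person(m.group("name"), m.group("email"))
```

Under this definition "Peter Dimov <pdimov -at- gmail.com>" is rejected, because the bracketed part contains spaces and no "@".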

pdimov commented 5 days ago

Well, that's not what libraries.json contains. Look at the link above, which describes it. This is the format originally used in maintainers.txt, which has been retained in libraries.json.

vinniefalco commented 5 days ago

Yes I see now. Well, how do you suggest we add the author email to the metadata?

pdimov commented 5 days ago

The e-mails are supposed to be contact information, which is why only the maintainers have them (as they need to be contactable).

Most of the authors predate the creation of GitHub, so even if they did have e-mails, they wouldn't really be useful.

If we want to tie authors and maintainers to GitHub accounts, it would probably be better to add the GitHub account directly to libraries.json somehow. E.g. authors: "Peter Dimov <https://github.com/pdimov>", maintainers: "Peter Dimov <pdimov -at- gmail.com> <https://github.com/pdimov>" or something similar.
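That string form is only a proposal, so the following is a sketch under an assumed shape: a name followed by zero or more `<...>` groups, where a group that is a `https://github.com/...` profile URL supplies the GitHub id and any other group is taken as the (still unnormalized) e-mail. `parse_contact` is a hypothetical helper.

```python
import re
from typing import Optional

# Every <...> group after the name.
_BRACKETED = re.compile(r"<([^<>]*)>")

# A GitHub profile URL; the captured group is the account name.
_GITHUB_URL = re.compile(r"^https://github\.com/([A-Za-z0-9-]+)/?$")


def parse_contact(entry: str) -> dict:
    """Split "Name <email> <https://github.com/user>" into its parts.

    The e-mail is returned exactly as written (e.g. "pdimov -at- gmail.com");
    normalizing it is a separate step.
    """
    name = entry.split("<", 1)[0].strip()
    email: Optional[str] = None
    github_id: Optional[str] = None
    for part in _BRACKETED.findall(entry):
        part = part.strip()
        m = _GITHUB_URL.match(part)
        if m:
            github_id = m.group(1)
        else:
            email = part
    return {"name": name, "email": email, "github_id": github_id}
```

Because the old "Name <email>" entries parse the same way (with `github_id` left as None), this shape would stay backward compatible with existing libraries.json files.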

pdimov commented 5 days ago

Or I suppose we can make the authors and maintainers proper JSON objects. { "name": "Peter Dimov", "email": "pdimov -at- gmail.com", "github_id": "pdimov" }
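If the object form were adopted only for future releases, a reader would have to accept both the legacy strings and the new objects. This sketch assumes the field names from the comment above (`name`, `email`, `github_id`) and that a legacy entry is a name optionally followed by one `<...>` contact; `to_person` and `people_from_metadata` are hypothetical names.

```python
import json
import re
from typing import Union

# Legacy form: "Name" or "Name <contact>".
_LEGACY = re.compile(r"^\s*(?P<name>[^<]+?)\s*(?:<(?P<email>[^<>]*)>)?\s*$")


def to_person(entry: Union[str, dict]) -> dict:
    """Map either the legacy string or the proposed object to one shape."""
    if isinstance(entry, dict):
        return {
            "name": entry["name"],
            "email": entry.get("email"),
            "github_id": entry.get("github_id"),
        }
    m = _LEGACY.match(entry)
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    email = m.group("email")
    return {
        "name": m.group("name"),
        "email": email.strip() if email else None,
        "github_id": None,  # legacy strings carry no GitHub id
    }


def people_from_metadata(text: str, field: str) -> list[dict]:
    """Read `field` ("authors" or "maintainers") from a libraries.json entry."""
    meta = json.loads(text)
    value = meta.get(field, [])
    if isinstance(value, (str, dict)):
        value = [value]  # tolerate a single entry stored without a list
    return [to_person(v) for v in value]
```

Mixing both forms in one file would then be harmless, which lets libraries migrate one at a time.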

vinniefalco commented 5 days ago

> Most of the authors predate the creation of GitHub

I say GitHub, but what I really mean is git. The website looks at the commit metadata of the repository without using the GitHub API. Later, we use the email to tie the commit to a GitHub user. Presumably, we will move off GitHub far sooner than we move off git, so this technique is somewhat future-proof.

We can get by with just knowing the author's email. Since none of the existing author fields in libraries.json has an email, we still have to write code that tries to guess the email from other available data, and that might get us by for now.
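The thread doesn't show how the website reads commit metadata or guesses e-mails. As a hedged sketch, assuming git's `--format=%an%x09%ae` output (author name and author e-mail separated by a tab, one commit per line), one could index commit e-mails by author name and guess an author's e-mail as the address used most often under that exact name. `read_author_log`, `emails_by_name` and `guess_email` are hypothetical helpers.

```python
import subprocess
from collections import Counter, defaultdict
from typing import Optional


def read_author_log(repo_path: str) -> str:
    """Return one "name<TAB>email" line per commit, read straight from git."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an%x09%ae"],
        capture_output=True, text=True, check=True,
    ).stdout


def emails_by_name(log_text: str) -> dict[str, Counter]:
    """Count how often each e-mail appears under each (case-folded) name."""
    table: dict[str, Counter] = defaultdict(Counter)
    for line in log_text.splitlines():
        name, sep, email = line.partition("\t")
        if sep and email:
            table[name.strip().casefold()][email.strip().lower()] += 1
    return table


def guess_email(author: str, table: dict[str, Counter]) -> Optional[str]:
    """Most frequent e-mail committed under `author`, or None if unseen."""
    counts = table.get(author.strip().casefold())
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

This is only a heuristic: names are not unique, authors who never committed to the repository will not be found, and the most common address need not be the current one.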

pdimov commented 5 days ago

Many of the authors predate git as well. :-)

There's no guarantee that the e-mail there, even if it existed, would have been the same e-mail as the one in the commits.

vinniefalco commented 5 days ago

The commit report has a system for tracking aliases, so that a person can be associated with multiple email addresses as they are discovered.
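The commit report's actual alias mechanism isn't described in this thread. One common way to model "several addresses, one person" is a union-find (disjoint-set) structure: each newly discovered link merges two sets, and every address in a set refers to the same person. `AliasTable` is a hypothetical name, and treating whole addresses case-insensitively is a simplifying assumption.

```python
class AliasTable:
    """Union-find over e-mail addresses; each connected set is one person."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, email: str) -> str:
        email = email.lower()  # simplification: compare addresses case-insensitively
        self._parent.setdefault(email, email)
        root = email
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[email] != root:
            next_node = self._parent[email]
            self._parent[email] = root
            email = next_node
        return root

    def link(self, a: str, b: str) -> None:
        """Record that addresses a and b belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def aliases(self, email: str) -> set[str]:
        """Every known address in the same set as `email`."""
        root = self._find(email)
        return {e for e in list(self._parent) if self._find(e) == root}
```

Links can arrive in any order (from a .mailmap, a user's website profile, or manual review) and the sets still converge to the same grouping.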