canonical / sphinx-docs-starter-pack

A documentation starter-pack
https://canonical-starter-pack.readthedocs-hosted.com/
Other
15 stars 38 forks source link

Automatically count, link to Contributors enhancement #203

Closed Unique-Usman closed 5 months ago

Unique-Usman commented 6 months ago

@ru-fu I just created a pull request for this issue #155, I just want to make this first pull request and get feedback on it as I know there are some improvement that can be make.

ru-fu commented 6 months ago

Thanks @Unique-Usman ! This looks great at first glance. I'll review and add some suggestions later today. :slightly_smiling_face:

ru-fu commented 6 months ago

This works great, and is a really good starting point for what we need. Thanks so much!

There are a few things that we need to change so it actually fits nicely into the starter pack, and a few improvements that will make it even more helpful. This might look like a lot, but I think it shouldn't be very difficult to do. I would break it down into the following 4 changes, and work on them one by one in order (feel free to submit the changes one by one as well, that might make it easier to get things working!):

  1. Move the changes from custom_conf.py to conf.py.
  2. Single-source the username and repository information.
  3. Get the contributors for each file instead of for the full repository.
  4. Display the list as part of the page instead of the header.

I'll add some comments for each of these 4 changes to explain what I mean with them. :)

Unique-Usman commented 6 months ago

Wow, thank you very much @ru-fu for the review. I will work on it and drop any question whenever I get stuck.

Unique-Usman commented 6 months ago

@ru-fu I completed the first three part. It remained the last part.

Unique-Usman commented 6 months ago

Thank you for the review @ru-fu, I made the requested change.

Unique-Usman commented 6 months ago

Yeah, it really make sense. I will try that.

Unique-Usman commented 6 months ago

@ru-fu I moved the contributors to the footer and also display it in the center of the page.

ru-fu commented 6 months ago

The spelling check is currently failing, because it tries to check the spelling of the contributors' names. We don't want those to be checked of course. To exclude them from the spelling check, can you edit https://github.com/canonical/sphinx-docs-starter-pack/blob/6a0cc6d9e01f0f755e79b38d3fe7d2d5b4ffcb41/.sphinx/spellingcheck.yaml#L29 and add another line to the ignores list:

      - a.contributor
ru-fu commented 6 months ago

There's another problem though ... (this is really turning into a typical programming task that just gets harder and harder, sorry for that!)

Before merging this change, I tested it out in a bigger repository that has both more contributors and (many!) more files. And that resulted in a lot of rate limit errors:

Failed to fetch contributors for file: 403 Client Error: rate limit exceeded for url:

Which makes sense - we're contacting the GitHub API for every file, and with a rate limit of 60 unauthenticated requests per hour, and more than 60 files, this won't work. :(

I'm not quite sure what is the best solution to this problem ... Some ideas:

Any thoughts? I would probably go for the first option of using the local Git log first, and then we can worry about the user profile later. But I'm open for any other ideas!

evildmp commented 6 months ago

I would probably go for the first option of using the local Git log first, and then we can worry about the user profile later. But I'm open for any other ideas!

That sounds like the right way forward, to do the simplest thing first.

Unique-Usman commented 6 months ago

@rufu, I think the first solution seems approachable. Also, If we could get the username of the commiter from the local git file, we should be able to contruct the github link of such commiter.

rkratky commented 6 months ago

@rufu, I think the first solution seems approachable. Also, If we could get the username of the commiter from the local git file, we should be able to contruct the github link of such commiter.

Unfortunately, that would not work because Git only stores the name and email address -- not any user names (Git itself has no notion of GitHub, GitLab, etc.). We could query the GH API to get a user name from an email address, which would probably result in fewer requests than the current approach (we would gather all email addresses for all relevant files and only then ask GH for user names), but it would still require the use of the API, so there's no guarantee we wouldn't hit the limit on a repo with lots of history and contributors.

Unique-Usman commented 6 months ago

@rufu, I think the first solution seems approachable. Also, If we could get the username of the commiter from the local git file, we should be able to contruct the github link of such commiter.

Unfortunately, that would not work because Git only stores the name and email address -- not any user names (Git itself has no notion of GitHub, GitLab, etc.). We could query the GH API to get a user name from an email address, which would probably result in fewer requests than the current approach (we would gather all email addresses for all relevant files and only then ask GH for user names), but it would still require the use of the API, so there's no guarantee we wouldn't hit the limit on a repo with lots of history and contributors.

We could start with implementing the user name and their email for now ?

rkratky commented 6 months ago

We could start with implementing the user name and their email for now ?

Yes, I agree that's best for now. But I'd vote for having just the name. Exposing email addresses might not be welcome by sopme people.

ru-fu commented 6 months ago

We could query the GH API to get a user name from an email address, which would probably result in fewer requests than the current approach (we would gather all email addresses for all relevant files and only then ask GH for user names), but it would still require the use of the API, so there's no guarantee we wouldn't hit the limit on a repo with lots of history and contributors.

Yes, I agree. But here we could probably gain a bit with caching, since most contributors will have several commits. Still, for big repos we'll likely have more than 60 different contributors, so we'd need to cache the information in a file that is checked in - which is also going to be a hassle.

Another idea I had is to link to the latest commit for each contributor. We have the commit hash locally, and we can use this to construct a link to the commit on GitHub. Of course this might mean that we link to a commit that fixes one typo instead of any of the earlier big commits by the contributor, but it's better than no link imo?

We could start with implementing the user name and their email for now ?

Yes, I agree that's best for now. But I'd vote for having just the name. Exposing email addresses might not be welcome by sopme people.

:+1: That's exactly what I wanted to write.

I think we should use the format that you already implemented, @Unique-Usman , but without the links for now (or with links to the latest commit for each contributor, as mentioned above). And update the function that retrieves the list of contributors to use the local Git history instead of the GitHub API.

ru-fu commented 6 months ago

Btw, I'll be away for the next 10 days and probably won't have time to respond here. But I'll review again as soon as I'm back (April 8), and maybe some of the others can chime in here before that if needed. :)

rkratky commented 6 months ago

Another idea I had is to link to the latest commit for each contributor. We have the commit hash locally, and we can use this to construct a link to the commit on GitHub.

I think that's a great idea. Linking to the latest commit is both interactive and simple. It allows people to click through to GH (where they can further click to see the contributor's whole history, can contact them, etc.) & it's cheap in terms of the possibility to contruct the link from regular Git history (no need to query the GH API).

The only potentially confusing thing would be deciding which latest commit:

The first option could be an ancient commit, which may be a bit weird, and the second option could take the user to a completely different file in the GH view, which may, again, be a bit strange.

In both cases, it could be somewhat misleading that the user would be viewing some docs, but the link could potentially take them to very different content (either an old version of the page, or a completely different part of the docs set).

I'd say the second option is better, but I don't have a strong opinion on that.

@Unique-Usman, if you'd be up to modifying the code to show the name as a link to the contributor's latest commit, that would be fantastic.

ru-fu commented 5 months ago

I'd say the second option is better, but I don't have a strong opinion on that.

I would prefer to go for the first option, thus the contributor's latest commit for the current file. Mainly because it is easier to implement (we already have the commit history for that file, so we don't need to search for other commits), but also because the list of contributors will be based on the current file, so linking to potentially unrelated changes seems confusing.

@Unique-Usman Are you comfortable implementing that change? Or would you like some pointers on how to start?

Unique-Usman commented 5 months ago

Hi @ru-fu, welcome back. Some pointers will be really helpful, thank you.

ru-fu commented 5 months ago

Sure @Unique-Usman ! :slightly_smiling_face:

From what I can see, you'll need to update only the get_contributors_for_file function. That one is currently getting a commit list from the GitHub API, which is the part that causes performance issues.

So instead, you can use GitPython to interact directly with the current GitHub repo (which you have locally on your disk already when building, so you don't need to clone it again). So you can create a repository object for the current folder (repo = Repo(".")), and then you can get all the commits for a specific file by following this answer.

You should then have a list of commit objects that contain the SHA and the author of the commit - and that should be all you need! For each contributor, store the SHA of their newest commit and then link to https://<repository_url>/commit/<commit_SHA>.

Does that help to get started?

Unique-Usman commented 5 months ago

Sure @Unique-Usman ! 🙂

From what I can see, you'll need to update only the get_contributors_for_file function. That one is currently getting a commit list from the GitHub API, which is the part that causes performance issues.

So instead, you can use GitPython to interact directly with the current GitHub repo (which you have locally on your disk already when building, so you don't need to clone it again). So you can create a repository object for the current folder (repo = Repo(".")), and then you can get all the commits for a specific file by following this answer.

You should then have a list of commit objects that contain the SHA and the author of the commit - and that should be all you need! For each contributor, store the SHA of their newest commit and then link to https://<repository_url>/commit/<commit_SHA>.

Does that help to get started?

Thanks @ru-fu, it should be enough. I will check it out and get back if any question.

Unique-Usman commented 5 months ago

@ru-fu , kindly help check it out. Thanks for your guidance.

Unique-Usman commented 5 months ago

Thanks for the review @ru-fu.

ru-fu commented 5 months ago

Thanks! :rocket: I'll wait a bit with merging to see if others have any comments.

k-dimple commented 5 months ago

Could we move the "Thanks to ..." element to the middle of the footer in between the two blocks of links? That might look better.

Unique-Usman commented 5 months ago

Could we move the "Thanks to ..." element to the middle of the footer in between the two blocks of links? That might look better.

@k-dimple, thanks for the comment, I made the necessary changes.

ru-fu commented 5 months ago

There is now a Git conflict in one of the files (after we merged https://github.com/canonical/sphinx-docs-starter-pack/pull/214 ). Do you feel comfortable resolving that @Unique-Usman ? Otherwise I can give it a try.

Unique-Usman commented 5 months ago

There is now a Git conflict in one of the files (after we merged #214 ). Do you feel comfortable resolving that @Unique-Usman ? Otherwise I can give it a try.

I will give it a try.

ru-fu commented 5 months ago

Seems there aren't any further comments, so I'll merge this now. Let's see how it works in practice!

Thanks a lot @Unique-Usman ! :+1:

Unique-Usman commented 5 months ago

@ru-fu thanks for your time and support during the whole process. I really appreciate it.