anitab-org / anitab-forms-backend

AnitaB Forms (backend), an application to process smaller 4-week or full day programs to contribute to open source projects. Similar to GSoC, Outreachy, or RGSoC. This is the backend app.
GNU General Public License v3.0
18 stars 58 forks source link

Feat : Github stats of a user #60

Closed chinmaym07 closed 3 years ago

chinmaym07 commented 3 years ago

Description

As an Admin, I should be able to view user's Github stats whose application is being reviewed.

Acceptance Criteria

The code handles retrieving:

Definition of Done

Estimation

2 days

codesankalp commented 3 years ago

I would like to work on this issue @isabelcosta .

isabelcosta commented 3 years ago

Sure, but you would first have to finish the tests one :) @codesankalp are you finished with it? If so can you make sure you get the 2 approvals on it + test so I can merge? @codesankalp

NenadPantelic commented 3 years ago

@isabelcosta Can I take this one?

codesankalp commented 3 years ago

@isabelcosta If possible I want to work on this issue as my other PR's need review.

isabelcosta commented 3 years ago

Sure @NenadPantelic I can assign it to you. @codesankalp we try to have people assigned to one issue at a time, and since @NenadPantelic has not contributed yet to this repo, this would be a good opportunity to do so.

Amulya-coder commented 3 years ago

@NenadPantelic please give the status of your work.

NenadPantelic commented 3 years ago

Hi, I have a few questions regarding the direction of this issue: 1) do we want to store the GH stats in the database? Generally, I think this is a good idea because we don't want to fetch the data from the GH API whenever we're accessing this resource in the app, that could be a performance suicide. On the other side, we would not have fresh data if we use the data we stored previously in our database. I would suggest storing the stats in our database and then implement an endpoint that will re-fetch the data from the API and update the data in our DB. By default, the data will be shown from the database but could be refreshed from time-to-time (on click ofc). 2) Maybe to clarify a bit the use case of Total Pull Request to a specific project - should we list all the projects by that user or specific one (if we choose that - how to search that specific project, that could require new tables in the DB)? One solution would be:

{
  "projects_stats": {
    "prs_by_project": {
      "project_name_1": 0,
      "project_name_2": 0,
      "project_name_3": 0
    }
  }
}

@isabelcosta @chinmaym07

NenadPantelic commented 3 years ago

@Amulya-coder I implemented a part of it, but I think we must discuss a few things (the comment above).

isabelcosta commented 3 years ago

@codesankalp @Aaishpra could you help here? For a first MVP, I think getting data directly from GitHub would be better 🤔 What do you think?

NenadPantelic commented 3 years ago

@isabelcosta Just to clarify, we will definitely fetch the data from the GitHub API, but I suggest storing that response in the database permanently and add on the frontend a button that will trigger the endpoint for updating that data (fetching again the data from the GitHub API). This is one option (I think it's better) and the other one is to fetch the data whenever the user accesses the page, so client sends the request -> OSP API -> GitHub API, we need some kind of caching in that scenario. Of course, for the MVP, we can go with the constant fetching.

chinmaym07 commented 3 years ago

Yes I agree & that's great @NenadPantelic to store the stats in database & I think we should display all the PR's that are made on different projects of an organisation.

codesankalp commented 3 years ago
  1. do we want to store the GH stats in the database? Generally, I think this is a good idea because we don't want to fetch the data from the GH API whenever we're accessing this resource in the app, that could be a performance suicide. On the other side, we would not have fresh data if we use the data we stored previously in our database. I would suggest storing the stats in our database and then implement an endpoint that will re-fetch the data from the API and update the data in our DB. By default, the data will be shown from the database but could be refreshed from time-to-time (on click ofc).

The idea is to manage all third-party APIs from Backend API, so I would also suggest using REST API to store data in the database and refreshing it with time when a user clicks on a button in the frontend (like the implementation of Zulip API).

  1. Maybe to clarify a bit the use case of Total Pull Request to a specific project - should we list all the projects by that user or specific one (if we choose that - how to search that specific project, that could require new tables in the DB)? One solution would be:
{
  "projects_stats": {
    "prs_by_project": {
      "project_name_1": 0,
      "project_name_2": 0,
      "project_name_3": 0
    }
  }
}

According to MVP, you have to only store these data:

  1. Username
  2. PRs:
  3. Open:
  4. Closed:
  5. Merged:
  6. PR Reviews
  7. Issues
  8. Comments

So, according to this for the initial stage, we don't need to store pull requests by the project. @isabelcosta @NenadPantelic

Also, @isabelcosta getting data directly from Github will cause the problem or makes things complicated when an admin wants to see a student profile. So storing and retrieving it from the database will be a good option.

Aaishpra commented 3 years ago

I agree with @codesankalp we should store the GH stats in the database, and refresh them when a user clicks on the refresh button(on frontend) In this way we can also keep a backup of the last refreshed data (considering the scenario if the Github scripts fail) and maintain similarity to Zulip API implementation.

For the MVP we need to focus on very few things (no need to consider project-wise PRs) attaching a screenshot found from Bismita Timeline - Deliverables of how GH stats are supposed to be image

Though great suggestion to display project-wise PRs, we can focus on it as a future feature when we are done completing MVP

Aaishpra commented 3 years ago

@chinmaym07 can you edit the issue description and remove the following points from it

codesankalp commented 3 years ago

@NenadPantelic Any updates on this issue?

NenadPantelic commented 3 years ago

I have tried a few things: 1) call to GitHub API 2) PyGitHub 3) https://github-readme-stats.vercel.app/api There are pros and cons of every one of them, e.g. by default only default branches are included in commits stats, commits of forked repos are not included etc., so I researched a bit the best approach. Also, I had some university obligations, I'll do my best to finish it by the end of this week.

isabelcosta commented 3 years ago

Nice @NenadPantelic :) thank you for updating! It's ok as long as you provide updates within 3 days, like in the contributing guidelines its fine 💪🏾

NenadPantelic commented 3 years ago

Comments search is still in beta, maybe we can skip that at this point, cause it is stated that those endpoints are prone to changes. Also PRs are categorized as open and closed, merge state doesn't exist. I implemented the following:

  1. num of issues (opened issues)
  2. num of made PRs -> open and closed (made PRs)
  3. num of PR reviews
  4. num of commits -> on only default branches We can maybe add number of created repositories. What do you think? @isabelcosta @codesankalp @Rahulm2310 @Aaishpra
codesankalp commented 3 years ago

@NenadPantelic

jalajcodes commented 3 years ago

PRs are categorized as open and closed, merge state doesn't exist.

@NenadPantelic Hi, you can get merged PR's using /search/issues endpoint

Here's an example search query q='is:merged is:pr author:username archived:false user:anitab-org'

NenadPantelic commented 3 years ago

@jalajcodes I found that soon after I wrote this comment, but I read that from my phone and wasn't able to update the comment, thanks anyway :slightly_smiling_face:. Interesting, they didn't categorized it in the state of the PR.

NenadPantelic commented 3 years ago

@NenadPantelic

  • Comments can be accessed using this https://docs.github.com/en/rest/reference/issues#comments and if in the future it will change then I think we can change the API endpoint later.
  • In the number of opened issues, I want to ask will it contain the count of the open and closed issues?
  • Can you include merged PRs count in num of made PRs as closed PRs are very less compared to merged PRs count?
  • I think the number of commits and merged PR will be the same for contributors in the default branch.
  • Also I want to ask what is the significance in adding the number of created repositories?

I can get issue comments, but I thought I should get comments in general, like discussions (discussion endpoint is in BETA)? I think the issue description should be more detailed. I'm not sure also what's the significance of the comments count. Generally, in my opinion, created issues and assigned, made PRs (both merged and non-merged) and commits show the level of someone's contribution. I don't think comments are so important.

codesankalp commented 3 years ago

I can get issue comments, but I thought I should get comments in general, like discussions (discussion endpoint is in BETA)? I think the issue description should be more detailed. I'm not sure also what's the significance of the comments count. Generally, in my opinion, created issues and assigned, made PRs (both merged and non-merged) and commits show the level of someone's contribution. I don't think comments are so important.

I think the comment count is needed for evaluating the interaction or engagement with the community. So that admin can sum up zulip messages and comments count to guess how much the contributor is involved in community @NenadPantelic. But I also don't know the accurate significance of the comments count as I am not present at the time when MVP was written.

NenadPantelic commented 3 years ago

One more question and I'm done with this. I wasn't present when this feature was defined, so that's the reason I'm discussing it so much, sorry :sweat_smile:. GitHub stats of the user should be only oriented to that's user activity in AnitaB's projects or generally?

Aaishpra commented 3 years ago

One more question and I'm done with this. I wasn't present when this feature was defined, so that's the reason I'm discussing it so much, sorry 😅. GitHub stats of the user should be only oriented to that's user activity in AnitaB's projects or generally?

For AnitaB's projects specifically, no need of other project activity.

NenadPantelic commented 3 years ago

Thanks @Aaishpra

NenadPantelic commented 3 years ago

@Aaishpra can you please update this privilege table?

------------------- Admin Regular user for his own stats
read stat YES/NO YES/NO
create/update YES/NO YES/NO

Obviously, next time I'll be present when some feature is defined :joy:

Aaishpra commented 3 years ago

Yes for all @NenadPantelic, I don't think there was any specific feature/privilege for the admin side or for the user side when it comes to Github Stats (the main privileges were given to admins for the forms section).

And don't worry about not being present for this feature discussion/definition, basically, this feature was suggested during the last GSoC but not implemented until now(and is a part of MVP so completing it is a priority), so most of us were also not present when this feature was suggested 😆 .

isabelcosta commented 3 years ago

fixed in #119