cncf / toc

⚖️ The CNCF Technical Oversight Committee (TOC) is the technical governing body of the CNCF Foundation.
https://cncf.io

annual reviews: identifying further areas of automation for annual sandbox reviews #1134

Open krishnakv opened 1 year ago

krishnakv commented 1 year ago

Ref: https://github.com/cncf/toc/issues/1123

Identifying further areas of automation that would make future annual reviews easier. This could be in the form of additional metrics that can be collected automatically and presented as part of dashboards. An initial list of data items is below; we will keep updating this issue as part of the annual review process and finally hand it over to CNCF for the actual automation in the next cycle.

Data already collected by Devstats

Data points not collected

Cannot be automated??

feelinspired commented 11 months ago

Hello @krishnakv, @nikhita, thanks for the wonderful writeup! It's super helpful.

I am Chaitan, one of the Product Managers at The Linux Foundation, working with @krook to make the annual review process easier. I have some questions about some of the points you have listed above and would appreciate your take on them. (Please pardon me if these are basic or repeat questions already addressed somewhere; I am keen to learn and don't want to assume I have understood things. In typical Product Manager fashion, trying to get to the root of the problem we want to solve, I am going to ask you a lot of why, what, and how questions.) On my part, I am reading as much as possible and may eventually find the answers to the questions below, but I reckon the fastest way to learn is to talk to my product users directly :)

Once again, please forgive me for the long list of questions. I sincerely want to learn as much as possible in order to provide as much context as I can when we (the Product team) meet to problem-solve.

  1. "Are there any new commits to the project in the last year?" -
    i) By last year, you meant the calendar year (example: 2022) is that right? ii) Whats a good number of new commits that is considered to be healthy? iii) In your reviews for Sandbox projects, what was the min/max range of new commits? iv) How does that differ for Incubating and Graduated projects respectively? Whats the min/max range there respectively?

  2. "Are there an increasing number of contributors to the project?" i) I presume this is for the last calendar year (example: 2022) is that right? ii) A contributor is anybody who performs Commits (Author, Co-Author, Committer), Pull Request Activities (PR Open, PR Merged, PR comments, PR Reviewed, PR closed), Issue Activities (Issues Open, Comments, Closed). Am I missing some other activity/some other role here? ii) Increasing contributors = new contributors is that correct? ii) What percentage in increase of contributors is considered healthy? iii) In your reviews for Sandbox projects, what was the min/max range of percentage increase? iv) How does that differ for Incubating and Graduated projects respectively? Whats the min/max range there respectively?

  3. " Is there frequent pruning of issues and open issues are being addressed?" i) Would the first comment on a newly opened issue be considered as "addressed" even if for some reason it cannot be prioritised and fixed? ii) Whats a good open/close ratio of issues? Appreciate if you could provide a range based on your experience of evaluating Sandbox projects.

  4. Adoption i) What do we mean by adoption? Different folks might have different interpretations of the same term, so I'm asking to avoid assumptions. ii) GitHub stars apart, how have these Sandbox projects tracked adoption? What are the sources?

  5. Project has an easily discoverable and open community forum i) Why is having an easily discoverable and open community forum important? ii) What's a good metric to track in order to say we (the Product team) have hit the goal of giving the reviewing team data that indicates "the project is easily discoverable / has an open community forum"?

  6. Goals i) I am curious to learn more about projects where one or more of the three goals listed above have changed, and how they track that and present it to the reviewing members, so I'd appreciate it if you could point me to an example or two.

Thank you for reading through my long list of questions.

krishnakv commented 11 months ago

Hi Chaitan, thanks for picking this up. I have tried to address the list as thoroughly as possible, and I'm happy to answer any further questions:

"Are there any new commits to the project in the last year?" - i) By last year, you meant the calendar year (example: 2022) is that right?

Krishna> The annual review process does not mention it, but yes, I think that's a good approach: https://github.com/cncf/toc/blob/main/process/sandbox-annual-review.md

ii) What's a good number of new commits to be considered healthy? iii) In your reviews of Sandbox projects, what was the min/max range of new commits? iv) How does that differ for Incubating and Graduated projects respectively? What's the min/max range there?

Krishna> To address all the questions around min/max numbers: I would consider at least one commit a month a "sign of life", but we may need to look further into the data. The bottom 5th percentile of projects in each category may be a better measure? Are the sqlite databases on the DevStats page the best place to get data for analysis?
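To make the "sign of life" heuristic concrete, here is a minimal sketch of what such a check could look like. It assumes the data is pulled from the GitHub REST API rather than the DevStats sqlite databases, and the repo name, token handling, and 12-month window are illustrative assumptions rather than anything defined by the review process.

```python
# Hypothetical sketch: count commits per month via the GitHub REST API and flag
# months with zero commits ("at least one commit a month" heuristic).
from collections import Counter
from datetime import datetime, timedelta, timezone

import requests


def commits_per_month(owner: str, repo: str, token: str | None = None) -> Counter:
    """Return a Counter mapping 'YYYY-MM' to the number of commits in that month."""
    since = (datetime.now(timezone.utc) - timedelta(days=365)).isoformat()
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    counts: Counter = Counter()
    url = f"https://api.github.com/repos/{owner}/{repo}/commits"
    params = {"since": since, "per_page": 100}
    while url:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        for commit in resp.json():
            counts[commit["commit"]["author"]["date"][:7]] += 1  # "YYYY-MM"
        url = resp.links.get("next", {}).get("url")  # follow pagination links
        params = None  # the "next" URL already carries the query parameters
    return counts


def last_12_months() -> list[str]:
    """Return the last 12 months as 'YYYY-MM' strings, most recent first."""
    now = datetime.now(timezone.utc)
    year, month, months = now.year, now.month, []
    for _ in range(12):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return months


if __name__ == "__main__":
    monthly = commits_per_month("cncf", "toc")  # repo chosen purely as an example
    quiet = [m for m in last_12_months() if monthly.get(m, 0) == 0]
    print("Months with no commits:", quiet or "none")
```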

"Are there an increasing number of contributors to the project?" i) I presume this is for the last calendar year (example: 2022) is that right?

Krishna> Yes, the annual review process document does not confirm this, but that would be a good measure.

ii) A contributor is anybody who performs Commits (Author, Co-Author, Committer), Pull Request Activities (PR Open, PR Merged, PR comments, PR Reviewed, PR closed), Issue Activities (Issues Open, Comments, Closed). Am I missing some other activity/some other role here?

Krishna> This covers it. Issues, PRs, and commits are certainly the objects you work with to contribute to a repo.

iii) Increasing contributors = new contributors, is that correct? iv) What percentage increase in contributors is considered healthy? v) In your reviews of Sandbox projects, what was the min/max range of percentage increase? vi) How does that differ for Incubating and Graduated projects respectively? What's the min/max range there?

Krishna> Yes, increasing contributors would be new contributors. One per quarter would be a healthy measure; having said that, analysis of the distribution of new contributors by project category may again give us a better view.
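As a rough illustration of the "one new contributor per quarter" heuristic, a sketch along these lines could bucket first-time contributors by quarter. It assumes the full commit history (author login plus date) has already been pulled from DevStats or the GitHub API; that collection step is an assumption, not something this thread prescribes.

```python
# Hypothetical sketch: count first-time contributors per quarter, given the full
# commit history as (author_login, commit_datetime) pairs.
from collections import defaultdict
from datetime import datetime


def new_contributors_per_quarter(commits: list[tuple[str, datetime]]) -> dict[str, int]:
    first_seen: dict[str, datetime] = {}
    for author, when in sorted(commits, key=lambda c: c[1]):
        first_seen.setdefault(author, when)  # earliest commit per author
    per_quarter: dict[str, int] = defaultdict(int)
    for when in first_seen.values():
        per_quarter[f"{when.year}-Q{(when.month - 1) // 3 + 1}"] += 1
    return dict(per_quarter)


# Example: quarters absent from the result had no new contributors.
history = [("alice", datetime(2022, 1, 3)), ("bob", datetime(2022, 5, 9)), ("alice", datetime(2022, 6, 1))]
print(new_contributors_per_quarter(history))  # {'2022-Q1': 1, '2022-Q2': 1}
```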

" Is there frequent pruning of issues and open issues are being addressed?" i) Would the first comment on a newly opened issue be considered as "addressed" even if for some reason it cannot be prioritised and fixed? ii) Whats a good open/close ratio of issues? Appreciate if you could provide a range based on your experience of evaluating Sandbox projects.

Krishna> I would certainly look at the number of issues without a response for over 30 days and the number of issues closed as stale. From experience, projects tend to have around 10% of these, but anything much higher (for example, 70% or 80% of issues without a response for 30+ days) is a clear sign of trouble.
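For illustration, a minimal sketch of the "30+ days without a response" check is below, again assuming the GitHub REST API (whose issues endpoint also returns pull requests, which are filtered out); the threshold and field handling are assumptions, and counting issues closed as stale is not included.

```python
# Hypothetical sketch: fraction of open issues older than 30 days that still have
# no comments, via the GitHub REST API issues listing.
from datetime import datetime, timedelta, timezone

import requests


def unanswered_issue_ratio(owner: str, repo: str, token: str | None = None) -> float:
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    params = {"state": "open", "per_page": 100}
    total = unanswered = 0
    while url:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        for issue in resp.json():
            if "pull_request" in issue:  # the issues endpoint also lists PRs
                continue
            created = datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00"))
            if created > cutoff:
                continue  # too new to count as "unanswered for 30+ days"
            total += 1
            if issue["comments"] == 0:
                unanswered += 1
        url = resp.links.get("next", {}).get("url")  # follow pagination links
        params = None
    return unanswered / total if total else 0.0
```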

Adoption i) What do we mean by adoption? Different folks might have different interpretations of the same term, so I'm asking to avoid assumptions. ii) GitHub stars apart, how have these Sandbox projects tracked adoption? What are the sources?

Krishna> Projects are encouraged to have an ADOPTERS.md page listing adopters. The problem is that these are not in any standard format. Maybe your team could have a look at some of these pages to see if any information can be gleaned from them? Most Sandbox projects don't have this page, but having one and showing any growth is certainly a very positive sign. Some examples: https://github.com/chaos-mesh/chaos-mesh/blob/master/ADOPTERS.md https://github.com/kubearmor/KubeArmor/blob/main/ADOPTERS.md https://github.com/AthenZ/athenz/blob/master/ADOPTERS.md
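Since ADOPTERS.md files are free-form, any automation here can only be approximate. A hypothetical sketch like the one below simply counts markdown bullets and table rows in the raw file; the URL handling and counting rules are assumptions and will over- or under-count for some layouts.

```python
# Hypothetical sketch: rough adopter count from a raw ADOPTERS.md URL.
import requests


def rough_adopter_count(raw_url: str) -> int:
    text = requests.get(raw_url).text
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("-", "*")) and len(stripped) > 2:
            count += 1  # bullet-style adopter entry
        elif stripped.startswith("|") and not set(stripped) <= {"|", "-", " ", ":"}:
            count += 1  # table row that is not a separator row
    return count  # includes any table header row; treat as an upper bound


# Example against one of the files linked above:
# rough_adopter_count("https://raw.githubusercontent.com/chaos-mesh/chaos-mesh/master/ADOPTERS.md")
```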

Project has an easily discoverable and open community forum i) Why is having an easily discoverable and open community forum important? ii) What's a good metric to track in order to say we (the Product team) have hit the goal of giving the reviewing team data that indicates "the project is easily discoverable / has an open community forum"?

Krishna> Again, this might involve some parsing of the README page, but having an open forum and open meetings is certainly a very strong indication that the project is adhering to the spirit of CNCF. It's all about ensuring that the project is not dominated by one vendor and welcomes new contributors from all communities.
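As an illustration of the README-parsing idea, a sketch could simply look for links or phrases that usually indicate an open, discoverable community forum or meeting. The keyword list below is an assumption for illustration, not a CNCF-defined criterion.

```python
# Hypothetical sketch: scan a project's README for signals of an open community
# forum (Slack, Discord, mailing lists, community meetings). Patterns are illustrative.
import re

import requests

FORUM_PATTERNS = [
    r"slack\.com",
    r"slack\.cncf\.io",
    r"discord\.(gg|com)",
    r"groups\.google\.com",
    r"lists\.cncf\.io",
    r"community meeting",
    r"mailing list",
]


def has_community_forum(readme_raw_url: str) -> bool:
    text = requests.get(readme_raw_url).text.lower()
    return any(re.search(pattern, text) for pattern in FORUM_PATTERNS)
```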

Goals i) I am curious to learn more about projects where one or more of the three goals listed above have changed, and how they track that and present it to the reviewing members, so I'd appreciate it if you could point me to an example or two.

Krishna> The annual review board is at https://github.com/orgs/cncf/projects/27/views/4; having said that, many of these projects are in their first year. Some interesting examples of long-running projects (from memory :-) ) you can look at are Confidential Containers, Chaos Mesh, and Keylime.

Hope this helps, and if you have a workbench you can point me to for analysis of DevStats data, I'd be happy to jump in. :-)

lukaszgryglicki commented 11 months ago

Just one note:

feelinspired commented 11 months ago

Many thanks @krishnakv! Really appreciate your detailed response.

jberkus commented 11 months ago

I'm going to propose that:

... be replaced with:

Stars are pretty meaningless; they are generally the result of project leads running a campaign to collect stars rather than an indication of any real user activity. Asking for star counts just wastes everyone's time.

dmueller2001 commented 11 months ago

I'd like to propose that we add a Maintainer Health Check to ensure that there are actively engaged maintainers on the Sandbox projects. We could easily automate checking that maintainers listed in maintainers.md have had some activity (commit, PR, merge, comment, issue) within the past year on that project.
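A hypothetical sketch of such a check is below. It assumes maintainers are referenced by @handle in the maintainers file and that the set of logins active on the repo over the past year has already been collected elsewhere (e.g. by the metrics backend); both are assumptions rather than details from this thread.

```python
# Hypothetical sketch of the proposed Maintainer Health Check: parse GitHub handles
# out of a maintainers.md file and flag any that do not appear in a pre-collected
# set of logins active on the repo in the past year (commits, PRs, comments, issues).
import re

import requests


def maintainer_handles(maintainers_raw_url: str) -> set[str]:
    """Assumes maintainers are referenced as @github-handle somewhere in the file."""
    text = requests.get(maintainers_raw_url).text
    return {handle.lower() for handle in re.findall(r"@([A-Za-z0-9-]+)", text)}


def inactive_maintainers(maintainers_raw_url: str, active_logins: set[str]) -> set[str]:
    """Return maintainers with no recorded activity; collecting active_logins is out of scope here."""
    return maintainer_handles(maintainers_raw_url) - {login.lower() for login in active_logins}
```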

craigbox commented 11 months ago

I recommend actively checking vendor neutrality, i.e. "Is the project demonstrably separate from its contributing vendor?" and "Has the contributing vendor demonstrably separated itself from the project?"

(These things came up in the Levels working group and I was referred back here, so I'm sorry for derailing; these are in no way automatable.)

TheFoxAtWork commented 5 months ago

Apologies for the delayed response; tagging @makkes, who commented on #1181 calling out the lack of an update from us.

The TOC reached out to LFX Insights, which is working on setting up a project health review board that reflects the content identified here on this issue as well as incorporating feedback from the TOC.

The current estimate for kicking off development is April 2024.

TheFoxAtWork commented 4 months ago

Requesting an update here

prithvi1307 commented 4 months ago
  1. Contributions to the project (Last 6 months)
  2. Active contributors to the project (Last 6 months)
  3. Each project should be integrated with the free version of Scarf (supported by Linux Foundation) for usage metrics

These can be reported to the TAG associated with the project on a monthly basis, or CNCF can create a monthly survey link on Asana/SurveyMonkey to be shared with the project's maintainers on the mailing list; the maintainers would then be responsible for sharing these stats.

The above can be verified on a bi-monthly basis by the respective CNCF staff.
Perhaps this also calls for a dedicated CNCF staff member solely to help evaluate project metrics.

krook commented 2 months ago

Hello all, sorry for the delayed reply.

A couple of updates: