ansible-community / stats-collections

RShiny app to display statistics for the Ansible Collections
GNU General Public License v3.0

Release / activity stats to identify collections which need to be released / supported / provided with new maintainers #25

Open Andersson007 opened 3 years ago

Andersson007 commented 3 years ago

It's a living issue. Goals / metrics can be changed at any time. Relates to https://github.com/ansible-community/stats-collections/issues/23

Goal

To track releases and activity in the collections, find new maintainers for them, and revoke privileges from inactive maintainers.

Issue

At the moment, we have 80+ collections under ansible-collections. Before the split, users got merged fixes and new features shipped, rarely but surely. Now we should monitor the collections and prevent situations where a collection has merged changes that are not released, or has changes waiting to be merged but no active committers/maintainers.

Several possible scenarios
  1. A collection gets things merged regularly and releases them regularly - so, everything is OK, the collection is fully maintained ("fully" means that there are active committers and they release the collection). What we should do: keep tracking the activity there

  2. A collection gets things merged regularly but there have been no releases for a long time. What we should do: a) check if there's a release policy in the collection b) ask the committers why they don't release the collection (no time, or they need training) c) release the collection ourselves or train the committers to conduct releases

  3. A collection doesn't get things merged and has no releases, but new PRs have been submitted since the latest release (assuming the collection has had at least a 1.0.0 release). What we should do: a) conduct a PR day b) release ourselves c) find maintainers from the community (first of all, from active contributors) d) revoke privileges from inactive maintainers

  4. A collection doesn't get things merged, has no releases, and no PRs are submitted. What we should do: a) if there are maintainers, ask them if the content of the collection is still relevant for users b) we could also use the number of open issues per month as a measure c) if relevant, maybe the collection is new and nobody knows about it. Should we announce it through the available channels? d) if irrelevant (e.g. the underlying service is dead), should we do nothing?
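The four scenarios above boil down to three counts per collection over some review window. A minimal sketch of that mapping, assuming a fixed window (for example the last 90 days) and hypothetical field names; the rule for "nothing merged but still released", which the scenarios don't cover, is an assumption:

```python
from dataclasses import dataclass


@dataclass
class CollectionActivity:
    """Counts for one collection over a review window (window length is an assumption)."""
    name: str
    merged_prs: int  # PRs merged in the window
    releases: int    # releases (tags) published in the window
    new_prs: int     # PRs opened in the window; when releases == 0 they all postdate the latest release


def classify(a: CollectionActivity) -> int:
    """Map activity counts onto scenarios 1-4 from this issue."""
    if a.merged_prs > 0:
        # Merges are being released (1) or piling up unreleased (2).
        return 1 if a.releases > 0 else 2
    if a.releases > 0:
        # Not covered by the issue: nothing merged but still released.
        # Treated as maintained (scenario 1) here, which is an assumption.
        return 1
    # No merges and no releases: PRs are waiting (3) or there is no activity at all (4).
    return 3 if a.new_prs > 0 else 4


ACTIONS = {
    1: "keep tracking activity",
    2: "check release policy, ask committers, release or train",
    3: "PR day, release, find maintainers, revoke inactive ones",
    4: "ask if content is still relevant, consider announcing",
}

if __name__ == "__main__":
    # Hypothetical collection name, for illustration only.
    c = CollectionActivity("community.example", merged_prs=5, releases=0, new_prs=3)
    print(c.name, classify(c), ACTIONS[classify(c)])
```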

What would be helpful to see

Dashboard(s) with:

What would help see problem spots
What METRICS to collect / calculate

(with fields described above and all per month)

GregSutcliffe commented 3 years ago

Thanks @Andersson007, this is good stuff. I especially like seeing the thinking around how you'll use the data, and what scenarios it might enable you to detect - this helps me work out some visualization.

In terms of the data, we already have most of this. Issues & PRs are indexed daily by ansible-community/stats-crawler, and we can extend that where needed. We don't have tags, but they are a single GH API query per repo, which is light (a single authenticated GH key can make 5000 requests per hour). So this seems easily achievable.
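For reference, a minimal sketch of that single tags query per repo, using the REST endpoint `GET /repos/{owner}/{repo}/tags` and only the standard library. It assumes a personal access token and reads just the first page of up to 100 tags; the helper names are hypothetical, not part of stats-crawler. Note that tag objects carry a commit SHA/URL rather than a date, so dating a release needs one more lookup (or Galaxy data).

```python
import json
import urllib.request

API = "https://api.github.com"


def build_tags_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """One authenticated REST call: GET /repos/{owner}/{repo}/tags (first page, up to 100 tags)."""
    url = f"{API}/repos/{owner}/{repo}/tags?per_page=100"
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })


def tag_names(payload: bytes) -> list[str]:
    """Extract tag names from the JSON list the endpoint returns."""
    return [tag["name"] for tag in json.loads(payload)]


def fetch_tags(owner: str, repo: str, token: str) -> list[str]:
    """Perform the request (network access and a valid token required)."""
    with urllib.request.urlopen(build_tags_request(owner, repo, token)) as resp:
        return tag_names(resp.read())


def fits_in_quota(n_repos: int, calls_per_repo: int = 1, hourly_limit: int = 5000) -> bool:
    """Rough check that a full crawl stays within the authenticated hourly limit."""
    return n_repos * calls_per_repo <= hourly_limit
```

With roughly 80 collections at one call each, a full pass uses about 80 of the 5000 hourly requests.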

Many of the graphs you want are already visible at https://stats.eng.ansible.com/app/collections_dash, although I accept the presentation could be improved. In particular, releases are currently derived from Galaxy and shown on a separate tab. Merging this into a single graph is probably helpful (such as marking the releases as vertical lines on a plot of issues/PRs). I'd love to hear what you'd like improved there. I'll also work on a static version of this kind of thing for our team's weekly reports.

Regarding alerts, my feeling is that the threshold might vary widely between collections - some will be super stable, some less so. We'll need to think carefully about this one; in the meantime, perhaps a simple table of time-since-last-release and time-since-last-commit (and perhaps, the difference of the two) would allow us to at least look at an overview of the situation?
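As an illustration of that overview table, a minimal sketch assuming we already have the last release date and last commit date per collection (the input shape and function name are hypothetical). A positive gap means there are commits that have not been released yet:

```python
from datetime import date


def staleness_table(collections: dict[str, tuple[date, date]],
                    today: date) -> list[tuple[str, int, int, int]]:
    """Rows of (name, days since last release, days since last commit, gap).

    gap = last commit minus last release, in days; positive means
    there are unreleased commits.
    """
    rows = []
    for name, (last_release, last_commit) in collections.items():
        rows.append((
            name,
            (today - last_release).days,
            (today - last_commit).days,
            (last_commit - last_release).days,
        ))
    # Largest unreleased backlog first.
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


if __name__ == "__main__":
    data = {  # hypothetical dates
        "community.a": (date(2021, 1, 1), date(2021, 3, 1)),
        "community.b": (date(2021, 2, 1), date(2021, 2, 1)),
    }
    for name, since_rel, since_commit, gap in staleness_table(data, date(2021, 4, 1)):
        print(f"{name:<15}{since_rel:>6}{since_commit:>6}{gap:>6}")
```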

Final note: when you say commits are available via the GH API, I assume you mean https://docs.github.com/en/rest/reference/repos#list-commits. I'll have a play with that; we can likely make some light GraphQL calls that just return the last commit for every collection at once...
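One possible shape for those GraphQL calls: a minimal sketch that asks for the head commit date of each repository's default branch, 100 repositories per page, paginating with `pageInfo`. The org login `ansible-collections` comes from this issue; the token handling and helper names are assumptions, and the code has not been checked against the live crawler.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Head commit date of every repo's default branch, 100 repos per page.
QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name
        defaultBranchRef { target { ... on Commit { committedDate } } }
      }
    }
  }
}
"""


def build_request(token: str, org: str, cursor=None) -> urllib.request.Request:
    """POST the query with variables; passing data makes urllib use POST."""
    body = json.dumps({"query": QUERY, "variables": {"org": org, "cursor": cursor}}).encode()
    return urllib.request.Request(GRAPHQL_URL, data=body, headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })


def parse_page(payload: bytes):
    """Return ({repo name: committedDate or None}, next cursor or None)."""
    repos = json.loads(payload)["data"]["organization"]["repositories"]
    last = {}
    for node in repos["nodes"]:
        ref = node["defaultBranchRef"]  # None for empty repositories
        last[node["name"]] = ref["target"]["committedDate"] if ref else None
    info = repos["pageInfo"]
    return last, (info["endCursor"] if info["hasNextPage"] else None)


def last_commits(token: str, org: str = "ansible-collections") -> dict:
    """Walk all pages (network access and a valid token required)."""
    result, cursor = {}, None
    while True:
        with urllib.request.urlopen(build_request(token, org, cursor)) as resp:
            page, cursor = parse_page(resp.read())
        result.update(page)
        if cursor is None:
            return result
```

For 80+ repositories this is one or two requests in total, instead of one REST call per repo.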

/cc @gundalow

Andersson007 commented 3 years ago

@GregSutcliffe everything written above sounds very good!

> Regarding alerts, my feeling is that the threshold might vary widely between collections - some will be super stable, some less so.

We could define, say, a 3-month time delta as the default, then adjust it where needed (we should have individual settings per collection for that; thanks for the idea :) ). Anyway, the notifications/alerts are not essential but would be good to have. I could try to implement them myself later.
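A minimal sketch of such a default with per-collection overrides. It assumes "3 months" is approximated as 90 days and that an alert means "there are unreleased commits and the last release is older than the threshold"; both the rule and the override dictionary are hypothetical:

```python
from datetime import date, timedelta
from typing import Optional

# "3 months" approximated as 90 days (assumption).
DEFAULT_THRESHOLD = timedelta(days=90)

# Hypothetical per-collection overrides, e.g. a longer window for very stable collections.
THRESHOLD_OVERRIDES: dict[str, timedelta] = {}


def needs_release_alert(name: str, last_release: date, last_commit: date, today: date,
                        overrides: Optional[dict[str, timedelta]] = None) -> bool:
    """True when commits exist after the last release and that release is older than the threshold."""
    table = THRESHOLD_OVERRIDES if overrides is None else overrides
    threshold = table.get(name, DEFAULT_THRESHOLD)
    has_unreleased = last_commit > last_release
    return has_unreleased and (today - last_release) > threshold
```

Keeping the overrides in a plain mapping means a stable collection only needs one entry to opt out of the default.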