cncf / devstats

📈CNCF-created tool for analyzing and graphing developer contributions
https://devstats.cncf.io
Apache License 2.0

[feature request] #55

Closed UriZafrir closed 2 months ago

UriZafrir commented 3 months ago

Hi, I'm trying to use DevStats (all.devstats.cncf.io) to see, for each company, which commits were made to which CNCF projects. This could help me see the open source stack of each company. Is there a dashboard for this? If not, which parameters should I choose? I think it could be interesting for the community.

lukaszgryglicki commented 3 months ago

If you need anything else, please describe exactly what is needed and I can generate a custom report for you, or we can consider adding a new dashboard.

Generally, look for dashboards related to companies (with "company" in their name or the companies tag) and to repository groups (in the All CNCF DevStats instance, a repository group is a single CNCF project).

UriZafrir commented 3 months ago

Hi, thank you for your quick answer. Indeed, using "company-contribution-counts-in-repository-groups", I can see each company's number of commits per project, but I can only choose one company at a time.

I think it might be interesting for the community (or just for me) to be able to compare companies' commits per repo, maybe with a bar chart dashboard sorted by company (one bar per company) showing the number of commits each company has per project. This could be interesting because it would show the CNCF project stack each company is likely using.

Also, I think the "rank" column in the default query is not really necessary if the rows are sorted by number of commits. I swapped the "rank" column with the "value" column, so another suggestion would be to swap them in the default query.

lukaszgryglicki commented 3 months ago

I will check this tomorrow, but even now: let me know whether you want me to add a new dashboard. If so, I would need specs for that dashboard. cc @caniszczyk

lukaszgryglicki commented 3 months ago

> I think it might be interesting for the community (or just for me) to be able to compare companies' commits per repo

> maybe a dashboard of a bar chart sorted by company (one bar per company) that shows the number of commits a company has per project. This could be interesting because then we can understand the "CNCF" projects stack it's likely using:

> Also I think the "rank" column in the default query is not really necessary if they are sorted by number of commits... I switched the "rank" column with the "value" column. So another suggestion would be to switch them in the default query.

UriZafrir commented 3 months ago

As I understand from your response, there is something similar to what I'm asking for, but not exactly it. Regarding your earlier response: yes, creating a new dashboard would be interesting for me. Which specs do you need?

lukaszgryglicki commented 3 months ago

Specs:

For a new dashboard, I would need a green light from @caniszczyk to start working on it, as it will probably take a day or two to implement.

UriZafrir commented 3 months ago

I guess that to decide whether this should be a dashboard or a custom query, we first need to know whether anyone besides me is interested in this feature. Is there a place where I can suggest it and see how the community responds?

Specs as I understand them: a tabular dashboard (not a time series) with a repo drop-down (with an All option), a company drop-down (with an All option), and a time range drop-down (last week, last month, last year, last decade, etc., probably with an All option too). As I wrote, it would be a bar chart with each bar representing a company and showing all of its commits by project (in different colors). For example, the Google bar would contain commits for Kubernetes, Helm, etc., stacked by color. Hope it's clear. When viewing the entire dashboard, one would see each company's commits by project. I think it's only relevant to the All CNCF instance.
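The spec above boils down to grouping commits first by company and then by project, with each drop-down acting as an optional filter. Here is a minimal sketch of that aggregation, assuming a hypothetical list of `(company, project, commit_date)` tuples; the function name and input shape are illustrative and not how DevStats stores or queries its data.

```python
from collections import defaultdict
from datetime import date


def company_project_stacks(commits, project=None, company=None, since=None, until=None):
    """Count commits per (company, project) for a stacked bar chart.

    commits: iterable of hypothetical (company, project, commit_date) tuples.
    project/company: None plays the role of the drop-down's "All" option.
    since/until: optional half-open date range [since, until).
    Returns {company: {project: count}}, companies ordered by total commits.
    """
    stacks = defaultdict(lambda: defaultdict(int))
    for comp, proj, day in commits:
        if project is not None and proj != project:
            continue
        if company is not None and comp != company:
            continue
        if since is not None and day < since:
            continue
        if until is not None and day >= until:
            continue
        stacks[comp][proj] += 1

    # One bar per company, tallest first; each bar is split by project.
    ordered = sorted(stacks.items(), key=lambda item: sum(item[1].values()), reverse=True)
    return {comp: dict(projs) for comp, projs in ordered}


# Made-up sample data, for illustration only.
sample = [
    ("CompanyA", "Kubernetes", date(2024, 1, 5)),
    ("CompanyA", "Helm", date(2024, 2, 1)),
    ("CompanyA", "Kubernetes", date(2023, 6, 1)),
    ("CompanyB", "Kubernetes", date(2024, 3, 3)),
]
print(company_project_stacks(sample, since=date(2024, 1, 1)))
# {'CompanyA': {'Kubernetes': 1, 'Helm': 1}, 'CompanyB': {'Kubernetes': 1}}
```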

lukaszgryglicki commented 3 months ago

The specs are clear. Now the only question is whether we need a separate dashboard for this, or whether it's OK for me to generate a custom report.

cc @caniszczyk. You can also ask in the #devstats Slack channel, for example.

UriZafrir commented 3 months ago

I have no problem posting it in the #devstats Slack channel, but it seems to be archived.

lukaszgryglicki commented 3 months ago

Hmm, I don't see that it is archived:

[Screenshots taken 2024-04-05 at 15:28:56 and 15:29:19]

I can provide a custom report for you next week, and if it's decided that we also need a dashboard for this, I can then create one based on the already generated report (its underlying query, etc.). Does that make sense?

UriZafrir commented 3 months ago

Correct. I joined the channel and posted the idea. The custom report sounds good, thanks!

lukaszgryglicki commented 3 months ago

Will do on Monday.

lukaszgryglicki commented 3 months ago

On it - will generate a custom report and provide details here.

lukaszgryglicki commented 3 months ago

I've added a report query that calculates the requested data. I've described here (section "Commits stats for projects and companies") how to use it in the reporting pod. I've generated a CSV file with the number of companies and projects limited to 50, so you can see the top 50 companies in each of the top 50 projects and the top 50 projects in each of the top 50 companies. The data covers all time (1/1/2014 - 1/1/2025). Here is the CSV: company_project_commits_stats.csv

I will also upload this to a Google Sheet, create an example chart, and put the link here.
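The "top 50 in each of the top 50" limit described above can be sketched as a nested top-N selection. This is a minimal illustration over hypothetical `(outer, inner, commits)` rows, not the actual report query; passing `(company, project, commits)` gives the by-company view, and swapping the first two fields gives the by-project view.

```python
from collections import Counter, defaultdict


def top_n_nested(rows, n=50):
    """Keep the top n outer keys and, within each, its top n inner keys.

    rows: iterable of (outer, inner, commits). Pass (company, project, commits)
    for a by-company view, or (project, company, commits) for a by-project view.
    Outer keys are ranked by their total over *all* inner keys, before trimming.
    """
    totals = Counter()
    nested = defaultdict(Counter)
    for outer, inner, commits in rows:
        totals[outer] += commits
        nested[outer][inner] += commits
    return {
        outer: dict(nested[outer].most_common(n))
        for outer, _ in totals.most_common(n)
    }


# Made-up sample rows: (company, project, commits).
rows = [
    ("CompanyA", "Kubernetes", 120),
    ("CompanyA", "Helm", 30),
    ("CompanyA", "Envoy", 5),
    ("CompanyB", "Kubernetes", 40),
    ("CompanyC", "Helm", 10),
]
print(top_n_nested(rows, n=2))
# {'CompanyA': {'Kubernetes': 120, 'Helm': 30}, 'CompanyB': {'Kubernetes': 40}}
print(top_n_nested([(p, c, k) for c, p, k in rows], n=2))
# {'Kubernetes': {'CompanyA': 120, 'CompanyB': 40}, 'Helm': {'CompanyA': 30, 'CompanyC': 10}}
```

Ranking outer keys by their full totals (rather than by the trimmed totals) is one design choice; the real query may rank differently.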

lukaszgryglicki commented 3 months ago

Here is the Google Sheet with the data; you can create any chart you need from it. It has two sheets showing the same data, organised differently. The data you requested is in the By Company sheet.

I'm skipping all bot activity, and I'm not including the Independent affiliation, which is not a real company. I'm also skipping the CNCF project, which only holds statistics for CNCF itself.
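The exclusions above amount to a row filter applied before aggregation. The sketch below assumes hypothetical row dicts with `author`, `company`, and `project` keys, and uses a made-up bot pattern (logins ending in `[bot]` or `-bot`); DevStats has its own bot detection and affiliation data, which this does not reproduce.

```python
import re

# Hypothetical rules for illustration only; DevStats maintains its own
# bot list and affiliation data, which this does not reproduce.
BOT_LOGIN = re.compile(r"(\[bot\]|-bot)$", re.IGNORECASE)
EXCLUDED_COMPANIES = {"Independent"}  # not a real company
EXCLUDED_PROJECTS = {"CNCF"}  # statistics for CNCF itself


def keep_row(row):
    """Return True if a commit row should count toward company/project stats."""
    return (
        not BOT_LOGIN.search(row["author"])
        and row["company"] not in EXCLUDED_COMPANIES
        and row["project"] not in EXCLUDED_PROJECTS
    )


# Made-up sample rows.
rows = [
    {"author": "alice", "company": "CompanyA", "project": "Kubernetes"},
    {"author": "dependabot[bot]", "company": "CompanyA", "project": "Helm"},
    {"author": "bob", "company": "Independent", "project": "Helm"},
    {"author": "carol", "company": "CompanyB", "project": "CNCF"},
]
print([r["author"] for r in rows if keep_row(r)])  # ['alice']
```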

The other sheet, By Project, shows the same data from the project's perspective instead of the company's: the first column is a project, split by companies (the opposite of By Company, where the first column is a company, split by projects).
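Switching between the By Company and By Project layouts described above is a pivot of the same cells. A minimal sketch, assuming the data is held as a hypothetical nested `{company: {project: commits}}` mapping:

```python
from collections import defaultdict


def by_company_to_by_project(by_company):
    """Pivot {company: {project: commits}} into {project: {company: commits}}."""
    by_project = defaultdict(dict)
    for company, projects in by_company.items():
        for project, commits in projects.items():
            by_project[project][company] = commits
    return dict(by_project)


# Made-up sample data.
by_company = {"CompanyA": {"Kubernetes": 120, "Helm": 30}, "CompanyB": {"Kubernetes": 40}}
print(by_company_to_by_project(by_company))
# {'Kubernetes': {'CompanyA': 120, 'CompanyB': 40}, 'Helm': {'CompanyA': 30}}
```

A pure pivot only matches the other view exactly when both come from the same set of cells; once each view is cut to its own top 50, the two can differ at the edges.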

I'm not sure exactly what chart is needed, but all the data is at the link, and I've allowed full edit access for anybody with the link, so feel free to create any chart you need, or just duplicate the sheet and do whatever you need with the data.

To create a dedicated dashboard doing something similar, I need a green light from @caniszczyk and requests from more people. This is because we don't have a dashboard of this type yet (stacked bar charts that don't use timestamps, with one column for the separate "stacks" and another for the "stacking items"), so it would take a considerable amount of time to implement. Also, I think the aggregated data will very seldom change, so generating it on request, even twice a year, would be a simpler solution than adding one more complex dashboard.
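The chart type described above (no timestamps; one column naming each stack and another naming the items stacked inside it) corresponds to "long" tabular data. As a hedged sketch using only the standard library, this flattens a hypothetical nested `{company: {project: commits}}` mapping into that layout; the `company`, `project`, and `commits` column names are illustrative and are not the columns of the real report CSV.

```python
import csv
import io


def to_long_csv(by_company):
    """Flatten {company: {project: commits}} into CSV, one row per stack segment.

    'company' identifies the stack (x-axis category); 'project' is the stacked item.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["company", "project", "commits"])
    for company, projects in by_company.items():
        for project, commits in projects.items():
            writer.writerow([company, project, commits])
    return out.getvalue()


# Made-up sample data.
print(to_long_csv({"CompanyA": {"Kubernetes": 120, "Helm": 30}, "CompanyB": {"Kubernetes": 40}}), end="")
# company,project,commits
# CompanyA,Kubernetes,120
# CompanyA,Helm,30
# CompanyB,Kubernetes,40
```

Data in this shape can be handed to a charting library that stacks by a categorical column, for example Plotly Express with `px.bar(df, x="company", y="commits", color="project")`.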

lukaszgryglicki commented 3 months ago

I'm closing this for now. I've added the needs-decision label to it.

UriZafrir commented 3 months ago

Thank you. I managed to generate a really simple graph with Python and Plotly. I think this is really interesting! I would love to get it implemented, and I can also try to help.

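```python
import pandas as pd
import plotly.express as px

# Load the custom report CSV generated above.
df = pd.read_csv('company_project_commits_stats.csv')
# One bar per company, stacked and colored by project; log-scaled y-axis,
# and text_auto prints each segment's value on the bar.
fig = px.bar(df, x="company", y="all_company_commits", log_y=True, color="project", text_auto=True)
# Order the bars by their total height, largest first.
fig.update_xaxes(categoryorder='total descending')
fig.show()
```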


lukaszgryglicki commented 3 months ago

Looking very good, thanks for this!

lukaszgryglicki commented 2 months ago

OK, closing this, please reopen if there is anything else needed.