Nagasaki45 closed this issue 7 years ago
Although the above might work, there is no need to be unopinionated here. I tend to go with an opinionated version instead, which, in the simplest case, means deciding on weights for the stats.
For example, having the following coefficients:
Alternatively, @cool-RR suggested the following dynamic calculation: for each statistic (commits, contributors, merged PRs, etc.), calculate the percentile of the repo. Then add the percentiles of all statistics together to get the krihelimeter. This ensures that the krihelimeter is bounded between 0 and 100 * num_of_statistics.
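As a rough illustration of the percentile idea, here is a minimal sketch. It assumes "percentile" means the share of repos whose value is less than or equal to the repo's own value; the statistic names and sample numbers are made up for the example and are not taken from the DB.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """For each value, return the percentage of values less than or equal to it."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def percentile_krihelimeter(repos, stat_names):
    """Sum each repo's percentile over all statistics."""
    names = list(repos)
    totals = {name: 0.0 for name in names}
    for stat in stat_names:
        ranks = percentile_ranks([repos[name][stat] for name in names])
        for name, rank in zip(names, ranks):
            totals[name] += rank
    return totals


# Hypothetical sample data, for illustration only.
STATS = ["commits", "contributors", "merged_prs"]
repos = {
    "a/small": {"commits": 10, "contributors": 1, "merged_prs": 0},
    "b/medium": {"commits": 50, "contributors": 4, "merged_prs": 5},
    "c/large": {"commits": 200, "contributors": 30, "merged_prs": 40},
}
scores = percentile_krihelimeter(repos, STATS)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 1))
```

With this definition every score stays within 0 and 100 * len(STATS), and a repo that leads on every statistic gets the maximum.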
If someone has a better idea for calculating the krihelimeter, please do tell.
See f3d2ec22adcfd1060944a69c2a62b28659b5c131. The above basic calculation was implemented.
Looks good :)
I'm satisfied with the current calculation. See no reason to change it soon. Therefore, I'm closing the ticket.
FWIW I think that adding a flat number of points per contributor creates a bit of an imbalance for smaller projects.
Let's say I have:
I suggest adding points for commits, PRs, and issues, and then multiplying them by a coefficient for the authors. That way, projects that have more contributors get a slightly higher rating than ones with fewer contributors, without being completely imbalanced.
First, thanks for the feedback!
> I suggest adding points for commits, PRs, issues and then multiply them with a coefficient for the authors.

I'm not quite sure what you mean. With this suggestion, the difference between the two scenarios you provided will be even bigger, won't it? That is assuming all the weights for commits / issues / PRs remain the same.
Maybe I'm completely wrong in understanding your suggestion. Can you please elaborate?
> I'm not quite sure what do you mean. With this suggestion the difference between the two scenarios you provided will be even bigger, isn't it?

If you multiply by the number of authors, yes. But that is not what I meant. Let's say, just for example's sake, that you multiply by `1 + 0.1 * (authors - 1)`; then you get:
`(20 * 1 + 4 * 8) * 1.2 = 62.4`

`(60 * 1) * 1 = 60`
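The multiplier idea above can be sketched as follows. This is only an illustration: the function names are hypothetical, the base points are read from the example as 20 * 1 + 4 * 8 and 60 * 1, and the author counts (3 and 1) are inferred from the multipliers 1.2 and 1 rather than stated in the thread.

```python
def author_multiplier(authors, bonus_per_extra_author=0.1):
    """Multiplier of 1 for a single author, growing linearly with each extra one."""
    return 1 + bonus_per_extra_author * (authors - 1)


def scaled_score(base_points, authors):
    """Scale the weighted sum of commits / PRs / issues by the author multiplier."""
    return base_points * author_multiplier(authors)


# Author counts below are inferred from the multipliers, not given in the thread.
several_authors = scaled_score(20 * 1 + 4 * 8, authors=3)
single_author = scaled_score(60 * 1, authors=1)
print(round(several_authors, 1), round(single_author, 1))
```

The bonus per extra author is the knob here: at 0 the author count is ignored, and larger values move the result closer to multiplying by the author count directly.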
I also think that PRs and issues should not weigh so much more than commits; they should be rebalanced too. It might take a bit more effort to come up with a formula that represents activity properly across multiple projects. And I am not even getting into the time component, which might also be interesting: which project is more active, one that gets a couple of commits every day, or one that gets a bunch on one day of the month and nothing for the rest of the time? As you can see, it can become quite complicated. The question is whether you are aiming for that or for a simple but (imo) inaccurate number.
I think that the best way to decide whether the suggested metric is better is to generate a new "most active" list based on it and investigate the results. After all, it is all very subjective. I will do this for the entire DB, and maybe for the Python language, as it is both very active and one I'm relatively familiar with. Would you like to see the results for other languages?
Top 50 repos
Current metric | Suggested metric |
---|---|
CocoaPods/Specs | CocoaPods/Specs |
Microsoft/vscode | Microsoft/azure-docs |
kubernetes/kubernetes | NixOS/nixpkgs |
Microsoft/azure-docs | kubernetes/kubernetes |
NixOS/nixpkgs | githubschool/open-enrollment-classes-introduction-to-github |
aburasali/cs362w17online | Microsoft/vscode |
BlissRoms/platform_frameworks_base | ansible/ansible |
ansible/ansible | rust-lang/rust |
githubschool/open-enrollment-classes-introduction-to-github | dotnet/corefx |
rust-lang/rust | gentoo/gentoo |
dotnet/corefx | caskroom/homebrew-cask |
caskroom/homebrew-cask | Automattic/wp-calypso |
freebsd/freebsd-ports | tensorflow/tensorflow |
gentoo/gentoo | tgstation/tgstation |
tgstation/tgstation | aburasali/cs362w17online |
ampproject/amphtml | jlord/patchwork |
Automattic/wp-calypso | Homebrew/homebrew-core |
tensorflow/tensorflow | DefinitelyTyped/DefinitelyTyped |
jlord/patchwork | hashicorp/terraform |
flutter/flutter | facebook/react-native |
hashicorp/terraform | DroidKaigi/conference-app-2017 |
everypolitician/everypolitician-data | saltstack/salt |
DefinitelyTyped/DefinitelyTyped | dart-lang/sdk |
Homebrew/homebrew-core | freebsd/freebsd-ports |
DroidKaigi/conference-app-2017 | golang/go |
angular/angular-cli | ampproject/amphtml |
saltstack/salt | docker/docker |
docker/docker | JuliaLang/julia |
facebook/react-native | flutter/flutter |
dotnet/roslyn | dotnet/coreclr |
dart-lang/sdk | angular/angular-cli |
JuliaLang/julia | liferay/liferay-portal |
golang/go | apple/swift |
dotnet/coreclr | dotnet/roslyn |
krexus/frameworks_base | nodejs/node |
earl/llvm-mirror | elastic/elasticsearch |
llvm-mirror/llvm | d3athrow/vgstation13 |
liferay/liferay-portal | home-assistant/home-assistant |
apple/swift | mantidproject/mantid |
NixOS/nixpkgs-channels | servo/servo |
convox/rack | everypolitician/everypolitician-data |
elastic/elasticsearch | openstack/openstack |
openstack/openstack | docker/docker.github.io |
nodejs/node | cockroachdb/cockroach |
freebsd/freebsd | joomla/joomla-cms |
dimagi/commcare-hq | dimagi/commcare-hq |
cockroachdb/cockroach | librenms/librenms |
d3athrow/vgstation13 | ManageIQ/manageiq |
beagleboard/linux | code-dot-org/code-dot-org |
joomla/joomla-cms | llvm-mirror/llvm |
Top 50 Python repos
Current metric | Suggested metric |
---|---|
ansible/ansible | ansible/ansible |
saltstack/salt | saltstack/salt |
dimagi/commcare-hq | home-assistant/home-assistant |
home-assistant/home-assistant | dimagi/commcare-hq |
odoo/odoo | odoo/odoo |
LLNL/spack | LLNL/spack |
mozilla/addons-server | mozilla/addons-server |
wikimedia/mediawiki-extensions | wikimedia/mediawiki-extensions |
edx/edx-platform | edx/edx-platform |
rg3/youtube-dl | rg3/youtube-dl |
fchollet/keras | cloudmesh/classes |
zulip/zulip | zulip/zulip |
cloudmesh/classes | ros/rosdistro |
ros/rosdistro | duckduckgo/zeroclickinfo-fathead |
duckduckgo/zeroclickinfo-fathead | fchollet/keras |
Azure/azure-cli | coala/coala |
AdguardTeam/AdguardFilters | Azure/azure-cli |
openshift/openshift-ansible | inasafe/inasafe |
coala/coala | statsmodels/statsmodels |
statsmodels/statsmodels | openshift/openshift-ansible |
google/ggrc-core | ipython/ipython |
Theano/Theano | uclouvain/osis |
inasafe/inasafe | buildbot/buildbot |
pandas-dev/pandas | frappe/erpnext |
matplotlib/matplotlib | pandas-dev/pandas |
conda/conda | matplotlib/matplotlib |
frappe/erpnext | Theano/Theano |
scikit-learn/scikit-learn | google/ggrc-core |
ipython/ipython | mirumee/saleor |
rcbops/rpc-openstack | scikit-learn/scikit-learn |
uclouvain/osis | rcbops/rpc-openstack |
mirumee/saleor | python/mypy |
buildbot/buildbot | bigchaindb/bigchaindb |
pisilinux/main | django/django |
openshift/openshift-tools | pymedusa/Medusa |
python/mypy | airbnb/superset |
openembedded/openembedded-core | ManageIQ/integration_tests |
bigchaindb/bigchaindb | terasolunaorg/guideline |
kubernetes-incubator/kargo | django-oscar/django-oscar |
ManageIQ/integration_tests | Cloud-CV/EvalAI |
django/django | kubernetes-incubator/kargo |
pfnet/chainer | openbmc/openbmc |
airbnb/superset | AdguardTeam/AdguardFilters |
Cloud-CV/EvalAI | pfnet/chainer |
blueboxgroup/ursula | openstates/openstates |
pymedusa/Medusa | astropy/astropy |
django-oscar/django-oscar | galaxyproject/galaxy |
xonsh/xonsh | edx/configuration |
getsentry/sentry | SatelliteQE/robottelo |
terasolunaorg/guideline | conda/conda |
I don't actually want to crunch statistics, but those lists, without the numbers that led to this outcome, don't give me enough information to see whether it got better (imo) or not. :)
You are absolutely right! Here is a .csv file with all of the repos data currently in the DB. Waiting to see what you get ;-)
Take the DB, apply PCA to 1 dimension, use coefficients.
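A minimal sketch of what that could look like, under several assumptions: each repo is a row of [commits, PRs, issues, authors], the sample rows are made up, and the first principal axis is found with plain power iteration instead of a library call. Nothing here reflects how the project actually computes its score.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the unit-length first principal axis of `rows` (equal-length lists)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    # It assumes the data is not constant and the start vector is not
    # orthogonal to that eigenvector.
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # The sign of an eigenvector is arbitrary; prefer mostly positive weights.
    if sum(vec) < 0:
        vec = [-x for x in vec]
    return vec


def score(row, coefficients):
    """Weighted sum of a repo's statistics using the PCA coefficients."""
    return sum(c * x for c, x in zip(coefficients, row))


# Hypothetical rows: [commits, prs, issues, authors].
rows = [[120, 10, 15, 8], [30, 2, 4, 2], [300, 25, 20, 15], [5, 0, 1, 1]]
coefficients = first_principal_component(rows)
print([round(c, 3) for c in coefficients])
print([round(score(r, coefficients), 1) for r in rows])
```

Without rescaling, the statistic with the largest variance (typically commits) dominates the component, so standardizing each column first is a choice worth testing against the lists above.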