JuliaEcosystem / PackageAnalyzer.jl

https://juliaecosystem.github.io/PackageAnalyzer.jl/dev/
MIT License

Counting the number of CompatHelper installations #56

Status: open. DilumAluthge opened this issue 3 years ago.

DilumAluthge commented 3 years ago

I'm interested in answering the following questions for the packages in the General registry.

For the purpose of this analysis, let us say that package X "has CompatHelper installed" if the default branch of package X's Git repo contains a file named .github/workflows/CompatHelper.yml. (I'm open to a better definition.)
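As a minimal sketch of this definition (written in Python rather than Julia for illustration, and assuming repo_dir is a local clone already checked out on its default branch; has_compathelper_strict is a hypothetical name, not part of PackageAnalyzer.jl), the check comes down to a single path lookup:

```python
from pathlib import Path


def has_compathelper_strict(repo_dir: str) -> bool:
    """Hypothetical helper: True if the checked-out tree at repo_dir contains
    .github/workflows/CompatHelper.yml, i.e. the definition proposed above.

    Assumes repo_dir is a local clone already on its default branch.
    """
    workflow = Path(repo_dir) / ".github" / "workflows" / "CompatHelper.yml"
    return workflow.is_file()
```

One caveat: on a case-insensitive filesystem (such as the macOS default), this also matches compathelper.yml, so counts can differ slightly depending on where the analysis runs.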

  1. How many packages in the General registry have CompatHelper installed?
  2. How many unique individual user accounts (i.e. not organization accounts) have CompatHelper installed on one or more of their packages?
  3. How many unique organization accounts (i.e. not individual user accounts) have CompatHelper installed on one or more of their packages?

Would it be possible for me to use PackageAnalyzer.jl to help me answer these questions?

giordano commented 3 years ago

We currently ignore that file: https://github.com/JuliaEcosystem/PackageAnalyzer.jl/blob/a7c2b1d80fce25326fb4d679195cfb9edaec4f68/src/PackageAnalyzer.jl#L507-L508 😛

ericphanson commented 3 years ago

It would be pretty easy to modify PackageAnalyzer to answer (1). All the "analysis" takes place in https://github.com/JuliaEcosystem/PackageAnalyzer.jl/blob/a7c2b1d80fce25326fb4d679195cfb9edaec4f68/src/PackageAnalyzer.jl#L480-L541. At that point in the pipeline, we've cloned the package to a local directory and are populating a Package struct with information gained by inspecting the repo. So we could add a compat_helper field to the struct and set it in that function, either by checking for the workflow file by name or by looping through the contents of the workflow files and looking for the string "CompatHelper" or similar. (I'd be in favor of such an addition, since I think it's useful to know!)
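The content-scanning variant mentioned here could look roughly like the following. This is a Python sketch for illustration only, not PackageAnalyzer's Julia code: has_compathelper_loose is a hypothetical name, and the choice to match case-insensitively and to consider only .yml/.yaml files is an assumption. In PackageAnalyzer itself, the equivalent check would run inside the function linked above and fill the proposed compat_helper field.

```python
from pathlib import Path


def has_compathelper_loose(repo_dir: str, needle: str = "CompatHelper") -> bool:
    """Hypothetical helper: True if any YAML file in .github/workflows mentions
    `needle` (case-insensitively). This also catches workflows that were renamed,
    e.g. compat.yml, but still run CompatHelper."""
    workflows = Path(repo_dir) / ".github" / "workflows"
    if not workflows.is_dir():
        return False
    for path in sorted(workflows.iterdir()):
        # Only look at workflow definitions; skip subdirectories and other files.
        if not path.is_file() or path.suffix.lower() not in (".yml", ".yaml"):
            continue
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle.lower() in text.lower():
            return True
    return False
```

The trade-off is the usual one: scanning contents finds renamed workflows, but it can also flag a workflow that only mentions CompatHelper in a comment. Checking by file name is stricter and matches the definition in the original question exactly.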

For (2) and (3), I think this might be doable by combining the results of (1) with queries to the GitHub API that check whether each owning account is an organization or an individual user. I think that step might be outside the purview of PackageAnalyzer itself, though.
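Here is a sketch of that post-processing step, under a few assumptions: every package repo URL is an https://github.com/OWNER/REPO URL (non-GitHub hosts and SSH-style URLs are not handled), the per-package has_compathelper flags come from a run like the one described above, and the function names are hypothetical. It relies on GitHub's REST endpoint GET /users/{username}, whose JSON "type" field is "User" or "Organization". The lookup is passed in as a parameter, so it can be replaced with an authenticated client or with a stub.

```python
import json
import urllib.request
from typing import Callable, Iterable
from urllib.parse import urlparse


def repo_owner(repo_url: str) -> str:
    """Hypothetical helper: account name from an https GitHub URL, e.g.
    https://github.com/JuliaEcosystem/PackageAnalyzer.jl.git -> JuliaEcosystem."""
    return urlparse(repo_url).path.strip("/").split("/")[0]


def github_account_type(login: str) -> str:
    """Query GET https://api.github.com/users/{login}; its JSON 'type' field is
    'User' or 'Organization'. Unauthenticated requests are heavily rate-limited."""
    req = urllib.request.Request(
        f"https://api.github.com/users/{login}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["type"]


def count_installations(
    packages: Iterable[tuple[str, bool]],
    account_type: Callable[[str], str] = github_account_type,
) -> dict[str, int]:
    """Hypothetical aggregation: `packages` yields (repo_url, has_compathelper)
    pairs. Returns the answers to questions (1), (2) and (3)."""
    n_packages = 0
    users: set[str] = set()
    orgs: set[str] = set()
    cache: dict[str, str] = {}  # query each owner only once
    for repo_url, has_ch in packages:
        if not has_ch:
            continue
        n_packages += 1
        owner = repo_owner(repo_url).lower()  # GitHub logins are case-insensitive
        if owner not in cache:
            cache[owner] = account_type(owner)
        (orgs if cache[owner] == "Organization" else users).add(owner)
    return {"packages": n_packages, "users": len(users), "organizations": len(orgs)}
```

Caching by owner matters here because many packages share an owner (for example, a single GitHub organization), so the number of API calls scales with unique accounts rather than with packages.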

giordano commented 3 years ago

I think that in the future we may want to collect all filenames in .github/workflows and exclude compathelper.yml and tagbot.yml only during data munging and analysis, much like what we now do for contributors (initially we were filtering out the bots, including @staticfloat). But honestly, I'd like to avoid changing the data structure before JuliaCon.
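The "collect everything, filter later" split proposed here might look something like this. It is a Python sketch of the idea, not PackageAnalyzer's actual data structure. The function names are hypothetical, and the case-insensitive match on compathelper.yml and tagbot.yml is an assumption, so it is broader than the exact CompatHelper.yml definition in the original question.

```python
from pathlib import Path

# Bot workflows to drop at analysis time (matched case-insensitively; an assumption).
BOT_WORKFLOWS = {"compathelper.yml", "tagbot.yml"}


def workflow_files(repo_dir: str) -> list[str]:
    """Collection step (hypothetical): record every filename in .github/workflows,
    unfiltered, so that no information is lost when the data is stored."""
    workflows = Path(repo_dir) / ".github" / "workflows"
    if not workflows.is_dir():
        return []
    return sorted(p.name for p in workflows.iterdir() if p.is_file())


def non_bot_workflows(names: list[str]) -> list[str]:
    """Analysis step (hypothetical): drop bot workflows only when computing CI stats."""
    return [n for n in names if n.lower() not in BOT_WORKFLOWS]


def uses_compathelper(names: list[str]) -> bool:
    """The same raw filenames also answer question (1) under a filename definition."""
    return any(n.lower() == "compathelper.yml" for n in names)
```

Keeping the raw list means that questions like "how many packages use CompatHelper?" can be answered later without re-cloning every repo. Statistics that should ignore bots simply apply the filter at analysis time.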