chirunconf / chirunconf19

Discussion of potential projects for Chicago R Unconference, March 9-10, 2019

Analyze github repos to identify R packages needing help #23

Open issue, opened by kbroman 5 years ago

kbroman commented 5 years ago

Maybe we could identify R packages on GitHub and measure something like the number of open issues vs. the time since the last commit. That might point to packages of interest that need help.
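A minimal sketch of that metric in R, assuming the repository data has already been collected into a data frame. The column names (`full_name`, `open_issues`, `last_push`), the helper name `needs_help()` and the scoring rule (open issues times years since the last push) are hypothetical choices for illustration, not an agreed definition of "needs help".

```r
# Hypothetical helper: rank repositories that have open issues but
# little recent activity. Expects columns full_name (character),
# open_issues (integer) and last_push (Date).
needs_help <- function(repos, today = Sys.Date()) {
  repos$days_idle <- as.numeric(difftime(today, repos$last_push, units = "days"))
  # Arbitrary score: open issues weighted by years without a push
  repos$score <- repos$open_issues * repos$days_idle / 365
  repos[order(repos$score, decreasing = TRUE), ]
}

# Made-up example data, not real repositories
example <- data.frame(
  full_name   = c("a/active", "b/stale", "c/quiet"),
  open_issues = c(40L, 25L, 0L),
  last_push   = as.Date(c("2019-03-01", "2017-03-01", "2016-01-01")),
  stringsAsFactors = FALSE
)
needs_help(example, today = as.Date("2019-03-08"))
```

Here `b/stale` ranks first: it has fewer open issues than `a/active`, but about two years without a push. A repository with no open issues scores zero however old it is.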

jdblischak commented 5 years ago

@kbroman Good idea! For getting started, @jimhester's package itdepends has some code for gathering relevant info like the number of open issues, etc.

https://github.com/jimhester/itdepends/blob/128cd7e42d866c3beaf8ab40ab3cf2e42392208f/R/github.R#L10

maurolepore commented 5 years ago

👍 Are you thinking about popular, published packages (e.g. tidyverse packages)? If so, I propose something complementary: a package-help group for chirunconf packages. I'll explain more in a separate issue, but the main idea is to help on the technical side so that folks unfamiliar with package-development tools can realize their ideas and share them as a package by the end of the unconf.

kbroman commented 5 years ago

@maurolepore I wasn't thinking of the tidyverse in particular, but rather of crawling GitHub to find repositories that people find interesting (as indicated by their having issues) but that are not necessarily being maintained.

maurolepore commented 5 years ago

Maybe related to https://github.com/chirunconf/chirunconf19/issues/32

chasemc commented 5 years ago

Great idea! I don't think you'll need to crawl: the GH API is pretty good, and https://github.com/r-lib/gh exists.

jdblischak commented 5 years ago


The Search API is more restrictive than the other parts of the API. I often get rejected "for triggering abuse mechanisms" even though I am only querying a few hundred results. Searching every R package on GitHub would take some time.
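One way to stay under the Search API limits is to request large pages and pause between them. A minimal sketch, assuming the `gh` package; `search_repos_slowly()`, the default `pause` of 3 seconds and the `fetch` argument (there only so the function can be tested offline) are hypothetical. The Search API returns at most 1,000 results per query, so 10 pages of 100 is the ceiling for a single query such as `language:r`; covering every R repository would mean splitting the query, for example by creation date.

```r
# Hypothetical helper: page through repository search results slowly.
# `pause` (seconds between requests) is an illustrative value, not a
# documented recommendation.
search_repos_slowly <- function(query, max_pages = 10, pause = 3,
                                fetch = gh::gh) {
  items <- list()
  for (page in seq_len(max_pages)) {
    res <- fetch("/search/repositories", q = query,
                 per_page = 100, page = page)
    items <- c(items, res$items)
    # A short page means there are no further results
    if (length(res$items) < 100) break
    Sys.sleep(pause)
  }
  items
}

# Usage (needs network access, ideally with a GITHUB_PAT set):
# repos <- search_repos_slowly("language:r")
```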

Another possibility would be to start with this curated list of GitHub R packages and then use the GitHub API to query specific attributes of each repository.
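Given such a list of `owner/repo` names, the per-repository lookup might be sketched as follows. It assumes the `gh` package and GitHub's `/repos/{owner}/{repo}` endpoint, which counts against the general API rate limit rather than the stricter Search API one; `repo_stats()` and its `fetch` argument (for offline testing) are hypothetical. Note that GitHub's `open_issues_count` includes open pull requests as well as issues.

```r
# Hypothetical helper: one request per "owner/repo" string, returning
# one row per repository with a few attributes of interest.
repo_stats <- function(full_names, fetch = gh::gh) {
  rows <- lapply(full_names, function(full_name) {
    repo <- fetch(paste0("/repos/", full_name))
    data.frame(
      full_name   = repo$full_name,
      open_issues = repo$open_issues_count,
      stars       = repo$stargazers_count,
      # pushed_at is an ISO 8601 timestamp; keep only the date part
      last_push   = as.Date(substr(repo$pushed_at, 1, 10)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage (needs network access):
# repo_stats(c("slowkow/ggrepel", "trestletech/plumber"))
```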

jimhester commented 5 years ago

The GH API lets you search repositories by number of help-wanted-issues, which might be a way to go to find some places to help.

```r
library(purrr)

# Search R repositories, sorted by number of open "help wanted" issues
search <- gh::gh(
  "/search/repositories",
  q = "language:r",
  sort = "help-wanted-issues",
  order = "desc"
)

# The response holds total_count, incomplete_results and items;
# items has one record per repository (30 per page by default)
map_chr(search$items, "full_name")
#>  [1] "UptakeOpenSource/uptasticsearch"              
#>  [2] "BioinformaticsFMRP/TCGAbiolinks"              
#>  [3] "cbeleites/hyperSpec"                          
#>  [4] "Huh/PopR_SDGFP"                               
#>  [5] "International-Soil-Radiocarbon-Database/ISRaD"
#>  [6] "TommyJones/textmineR"                         
#>  [7] "UptakeOpenSource/pkgnet"                      
#>  [8] "ropenscilabs/learngganimate"                  
#>  [9] "slowkow/ggrepel"                              
#> [10] "BonnyCI/ci-plunder"                           
#> [11] "OpenMx/OpenMx"                                
#> [12] "ProvTools/provR"                              
#> [13] "statnet/ergm"                                 
#> [14] "retrography/OrientR"                          
#> [15] "trestletech/plumber"                          
#> [16] "ices-tools-prod/fisheryO"                     
#> [17] "jackwasey/icd"                                
#> [18] "rich-iannone/DiagrammeR"                      
#> [19] "ropensci/tabulizer"                           
#> [20] "cloudyr/googleComputeEngineR"                 
#> [21] "TIBHannover/BacDiveR"                         
#> [22] "HenrikBengtsson/aroma.seq"                    
#> [23] "HenrikBengtsson/Wishlist-for-R"               
#> [24] "theclue/facebook.S4"                          
#> [25] "NKU-DSC/RTrainingMaterials"                   
#> [26] "kevinwolz/hisafer"                            
#> [27] "fabian-s/tidyfun"                             
#> [28] "vertica/DistributedR"                         
#> [29] "ropenscilabs/dataspice"                       
#> [30] "franzbischoff/tsmp"
```

Created on 2019-03-08 by the reprex package (v0.2.1)

Of course, this depends on repository owners applying that specific "help wanted" label to their issues, which many do not.
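To get a rough idea of how widely the label is used among R projects, one could count open issues carrying it per repository via the issue search endpoint. A sketch, assuming the `gh` package and GitHub's `label:`, `language:`, `is:issue` and `is:open` search qualifiers; `help_wanted_by_repo()`, its `fetch` argument (for offline testing) and the default label name are hypothetical, and only the first 100 matching issues are counted. Re-running it with other spellings, such as `label = "help-wanted"`, shows how fragmented labelling is.

```r
# Hypothetical helper: count open issues with a given label per R repo.
# Only the first page (up to 100 issues) of search results is used.
help_wanted_by_repo <- function(label = "help wanted", fetch = gh::gh) {
  q <- sprintf('label:"%s" language:r is:issue is:open', label)
  res <- fetch("/search/issues", q = q, per_page = 100)
  # repository_url looks like https://api.github.com/repos/owner/repo
  repos <- vapply(res$items, function(issue) {
    sub("^https://api\\.github\\.com/repos/", "", issue$repository_url)
  }, character(1))
  sort(table(repos), decreasing = TRUE)
}

# Usage (needs network access):
# help_wanted_by_repo()
```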