bcgsc / orca

:whale: Genomics Research Container Architecture
http://www.bcgsc.ca/services/orca
GNU General Public License v3.0

Table of GitHub metadata and analytics #26

Open sjackman opened 7 years ago

sjackman commented 7 years ago

Hi, Tanya. Can you please prepare a table that lists…

tmozgach commented 6 years ago

List all formulae in homebrew-science tap:

TAP=homebrew/homebrew-science
TAP_PREFIX=$(brew --prefix)/Library/Taps
{ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb
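
In the shell, `|` binds more tightly than `||`, so the grouping in a one-liner like this is easy to get wrong. A more explicit sketch of the same listing follows. `list_formulae` is a hypothetical helper name, not part of Homebrew, and it assumes the tap keeps its formulae either in a `Formula/` subdirectory or at the top level:

```shell
# list_formulae TAP_DIR: print one formula name per line.
# Looks in TAP_DIR/Formula first and falls back to TAP_DIR itself.
list_formulae() {
    tap_dir=$1
    for dir in "$tap_dir/Formula" "$tap_dir"; do
        found=0
        for f in "$dir"/*.rb; do
            # An unmatched glob stays literal, so skip paths that do not exist.
            [ -e "$f" ] || continue
            basename "$f" .rb
            found=1
        done
        if [ "$found" -eq 1 ]; then
            return 0
        fi
    done
    return 1
}
```

It could be called as `list_formulae "$TAP_PREFIX/$TAP"`.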

URLs from the 'head' field in the formulae:

brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].head}.compact.map {|v| v.url}'

Names of formulae with a 'head' field:

brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].head}.compact.map {|v| v.name}'

Names of all formulae in the tap:

brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].name}'

URLs from the 'homepage' field:

brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].homepage}'
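
The four `brew irb` commands above share the same `%w[...]` name list and differ only in the trailing Ruby expression. As a sketch of that pattern, the hypothetical helper `irb_expr` (my name, not a Homebrew command) builds the string that gets fed to `brew irb`, so the expression can be inspected before it is run:

```shell
# irb_expr RUBY_SUFFIX NAME...: print the Ruby expression for `brew irb`.
# The names become a %w[...] word array and RUBY_SUFFIX is appended as-is.
irb_expr() {
    suffix=$1
    shift
    # "$*" joins the remaining arguments with single spaces.
    printf '%%w[%s]%s\n' "$*" "$suffix"
}
```

For example, `irb_expr '.map { |x| Formula[x].homepage }' abyss bwa | brew irb` would query the homepages of two formulae.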

Find formulae that have the bioinformatics tag:

grep -i -l '# tag "bioinformatics"' $TAP_PREFIX/$TAP/*.rb | xargs -I{} basename {} .rb | grep -o '^[^.]*'
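
A minimal sketch of the same search as a reusable function, assuming the formula files sit directly in the directory passed in. `tagged_formulae` is a hypothetical name. It relies on `grep -l`, which prints only the names of matching files rather than the matching lines:

```shell
# tagged_formulae TAP_DIR: print the names of formulae whose file contains
# the comment  # tag "bioinformatics"  (matched case-insensitively).
tagged_formulae() {
    grep -il '# tag "bioinformatics"' "$1"/*.rb 2>/dev/null |
        while IFS= read -r path; do
            # Strip the directory and the .rb extension only.
            basename "$path" .rb
        done
}
```

Because only the `.rb` extension is stripped, a name that contains a dot, such as `samtools@0.1`, is printed whole rather than cut at the first dot.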

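Get the number of GitHub watchers, stars, and forks from the GitHub API response (JSON field, then the R accessor). Note that the API reports the number of watchers as subscribers_count and the number of stars as watchers:

  "subscribers_count": 12,  dJSON$subscribers_count  (watchers)
  "watchers": 1,            dJSON$watchers           (stars)
  "forks": 1,               dJSON$forks              (forks)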
sjackman commented 6 years ago

Here's the record of installations for macOS: https://raw.githubusercontent.com/Homebrew/homebrew.github.io/master/_data/install.json It includes only the top 1000 most popular packages, including those in Homebrew/core, so it covers only 25 packages in Homebrew/science. Only two of those are bioinformatics packages: htslib and samtools.

sjackman commented 6 years ago

This particular file is not very useful for Homebrew/science. In any case, it can be converted to TSV like so:

brew install jq miller
curl -L https://raw.githubusercontent.com/Homebrew/homebrew.github.io/master/_data/install.json \
    | jq .items | mlr --ijson --otsvlite cat
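
Since the file mixes packages from several taps, a natural next step is to keep only the rows for formulae of interest. The following is a sketch under stated assumptions: `filter_tsv` is a hypothetical helper, the list of wanted names is in a plain text file with one name per line, and the TSV has a header row. The column name passed in (for example `formula`) is an assumption and should be checked against the actual header.

```shell
# filter_tsv NAMES_FILE COLUMN < table.tsv
# Print the header plus every row whose COLUMN value is listed in NAMES_FILE.
filter_tsv() {
    awk -F '\t' -v names="$1" -v col="$2" '
        BEGIN {
            # Load the names to keep, one per line.
            while ((getline name < names) > 0) keep[name] = 1
        }
        NR == 1 {
            # Locate the requested column by its header text.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        ($c in keep)
    '
}
```

Appending `| filter_tsv names.txt formula` to the pipeline above would then print only the matching rows.
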
tmozgach commented 6 years ago

homebrew_science_stat.txt

Change the extension from txt to tsv.

sjackman commented 6 years ago

Thanks, Tanya!

sjackman commented 6 years ago

@tmozgach Do you have source code for the script that created this file homebrew_science_stat.txt?

tmozgach commented 6 years ago

@sjackman I will post it here, because it is not related to the ORCA project.

# Date: 22/09/2017
# Author: Tatyana Mozgacheva tmozgacheva@bcgsc.ca
# Description: This script generates a table of:
# the number of GitHub watchers, stars, and forks for each formula in Homebrew/science that has a GitHub repo;
# whether the formula is notable (forks ≥ 20 or watchers ≥ 20 or stars ≥ 50);
# the number of macOS installations in the last year;
# the number of Linuxbrew installations in the last year;
# whether the formula has a # tag "bioinformatics".
# To get these statistics, the URL of the formula's GitHub repository is required.
# This URL is located in either the 'homepage' field or the 'head' field of the formula's code.
# Procedure:
# 1) In the terminal, where brew is installed, type:
#    TAP=homebrew/homebrew-science
#    TAP_PREFIX=$(brew --prefix)/Library/Taps
#    *URLs from the 'head' field:
#    brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].head}.compact.map {|v| v.url}'
#    and copy the output into 'urls' variable
#    *Names of formulae that have a 'head' field:
#    brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].head}.compact.map {|v| v.name}'
#    and copy the output into 'formulae' variable
#    *Names of all formulae in TAP:
#    brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].name}'
#    and copy the output into 'formulae2' variable
#    *URLs from the 'homepage' field:
#    brew irb <<< "%w[$({ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].homepage}'
#    and copy the output into 'fhomepages' variable
# 2) Strip 'https://github.com' and '.git' from each URL in order to get the name of the repo.
# 3) Type in the terminal:
#    grep -i -l '# tag "bioinformatics"' $TAP_PREFIX/$TAP/*.rb | xargs -I{} basename {} .rb | grep -o '^[^.]*'
#    Copy the output into file: bio_tag.tcv
#    Import Dataset
# 4) Type in the terminal:
#    brew install jq miller
#    cat science-macos-20170928.json | jq .items | mlr --ijson --otsvlite cat
#    Copy the output into file: science-mac.csv
#    Import Dataset
# 5) Run the following script:
install.packages("devtools")
install.packages("tidyjson")
install.packages("tidyverse")
install.packages("knitr")
install.packages("DT")
install.packages("qcc")

devtools::install_github("hrbrmstr/curlconverter")
library(qcc)
library(devtools)
library(curlconverter)
library(jsonlite)
library(httr)

library(tidyverse)
library(knitr)
library(DT)

library(tidyjson)   

#Input 
urls <- c("/bcgsc/abyss", "/Sheikhizadeh/ACE", "/bigdatagenomics/adam", "https://projects.coin-or.org/svn/ADOL-C/trunk/", "/EvolBioInf/afra", "/alembic/alembic", "/ALPSCore/ALPSCore", "/merenlab/anvio", "/b-k/apophenia", "/fredrik-johansson/arb", "/bcgsc/arcs", "/dzerbino/ascii_plots", "/dstndstn/astrometry.net", "/herumi/ate-pairing", "/juliema/aTRAM", "/bredelings/BAli-Phy", "/DecodeGenetics/BamHash", "/macroevolution/bamm", "/genome/bam-readcount", "/pezmaster31/bamtools", "/statgen/bamUtil", "/tseemann/barrnap", "/GATB/bcalm", "/beagle-dev/beagle-lib", "/CompEvol/beast2", "/beast-dev/beast-mcmc", "/bedops/bedops", "/arq5x/bedtools2", "/BEETL/BEETL", "/lh3/bfc", "/lh3/bioawk", "/bcgsc/biobloom", "/evoldoers/biomake", "/maasha/biopieces", "/BitSeq/BitSeq", "/PacificBiosciences/blasr", "/flame/blis", "/BenLangmead/bowtie2", "/BenLangmead/bowtie", "http://svn.gna.org/svn/service-tech/trunk/bpel2owfn", "/ssadedin/bpipe", "/barricklab/breseq", "/lh3/bwa", "/cantera/cantera", "/marbl/canu", "/weizhongli/cdhit", "/infphilo/centrifuge", "/sanger-pathogens/circlator", "/tschaume/ckon", "/xavierdidelot/ClonalFrameML", "/ivazquez/cloneHD", "https://projects.coin-or.org/svn/CoinMP/trunk", "/CSCsw/ColPack", "https://svn.code.sf.net/p/cp2k/code/trunk", "/cusplibrary/cusplibrary", "/marcelm/cutadapt", "/thegenemyers/DALIGNER", "/jeroenjanssens/data-science-toolbox", "/thegenemyers/DAZZ_DB", "/dealii/dealii", "/tobiasrausch/delly", "/thegenemyers/DEXTRACTOR", "/DGtal-team/DGtal", "/tenomoto/dotwrp", "/nh13/DWGSIM", "/DynareTeam/dynare", "ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/versions/current/edirect.tar.gz", "/elemental/Elemental", "/Ensembl/ensembl-tools", "/aberer/exabayes", "/adarob/eXpress", "/agordon/fastx_toolkit", "/lh3/fermikit", "/lh3/fermi-lite", "/lh3/fermi", "/imageworks/Field3D", "/seqan/flexbar", "/wbhart/flint2", "/ekg/freebayes", "/BoevaLab/FREEC", "/molpopgen/fwdpp", "/GalSim-developers/GalSim", "/broadgsa/gatk-protected", 
"http://svn.gna.org/svn/service-tech/trunk/genet", "/genometools/genometools", "https://geuz.org/svn/getdp/trunk", "/marbl/gingr", "/arq5x/grabix", "https://git.skewed.de/count0/graph-tool", "https://bitbucket.org/gtborg/gtsam", "/sanger-pathogens/gubbins", "/rieck/harry", "/marbl/harvest-tools", "https://svn.janelia.org/eddylab/eddys/src/hmmer/trunk", "/rthurman/hotspot", "/lh3/htsbox", "/samtools/htslib", "/veg/hyphy", "/broadinstitute/IGV", "/igvteam/igv", "git://itk.org/ITK", "http://www.neuron.yale.edu/hg/neuron/iv", "https://projects.coin-or.org/svn/Ipopt/trunk", "/ITensor/ITensor", "/sanger-pathogens/iva", "/gmarcais/Jellyfish", "https://jmol.svn.sourceforge.net/svnroot/jmol/trunk/Jmol", "git://git.code.sf.net/p/jsbsim/code", "/attractivechaos/k8", "/bioinformatics-centre/kaiju", "/TGAC/KAT", "git://genome-source.cse.ucsc.edu/kent", "/marekkokot/KMC", "/pmelsted/KmerStream", "/bcgsc/kollector", "/DerrickWood/kraken", "http://git.icms.temple.edu/lammps-ro", "http://last.cbrc.jp/last", "/lastz/lastz", "/libbi/LibBi", "/danfis/libccd", "/y-256/libdivsufsort", "https://bitbucket.org/luukko/libeemd", "https://bitbucket.org/luciad/libgpkg", "https://git.assembla.com/phylogenetic-likelihood-library", "/molpopgen/libsequence", "git://sigrok.org/libsigrokdecode", "/mourisl/Lighter", "/AlgoLab/LightStringGraph")
urls <- c(urls, "/eddelbuettel/littler", "/mgymrek/lobstr-code", "http://svn.gna.org/svn/service-tech/trunk/lola2", "/apache/madlib", "http://svn.cern.ch/guest/madx/trunk/madX/", "/mimno/Mallet", "https://bitbucket.org/mantaflow/manta", "/marbl/Mash", "/mfillpot/mathomatic", "/matplotlib/matplotlib", "/voutcn/megahit", "/smithlabcode/methpipe", "/marbl/MHAP", "/ctSkennerton/minced", "/lh3/miniasm", "/lh3/minimap", "/hangelwen/miR-PREFeR", "https://bitbucket.org/fathomteam/moab", "/BhallaLab/moose-core", "/mothur/mothur", "https://mrbayes.svn.sourceforge.net/svnroot/mrbayes/trunk/", "https://simunova.zih.tu-dresden.de/svn/mtl4/trunk", "/jlblancoc/nanoflann", "/jts/nanopolish", "http://anonsvn.ncbi.nlm.nih.gov/repos/v1/trunk/c++", "/nest/nest-simulator", "http://www.neuron.yale.edu/hg/neuron/nrn", "/lindenb/newicktools", "/tjunier/newick_utils", "/nextflow-io/nextflow", "/G-Node/nix", "/lmrodriguezr/nonpareil", "/bcgsc/ntCard", "/sequencing/NxTrim", "/karel-brinda/ococo", "/ome/ome-common-cpp", "/ome/ome-files-cpp", "/openmicroscopy/bioformats", "/openalpr/openalpr", "/biometrics/openbr", "/OpenImageIO/oiio", "/openmeeg/openmeeg", "/occipital/OpenNI2", "/OpenNI/OpenNI", "/orocos/orocos_kinematics_dynamics", "/davidemms/OrthoFinder", "/gwaldron/osgearth", "/cburstedde/p4est", "/neufeld/pandaseq", "git://paraview.org/ParaView", "/marbl/parsnp", "git://scm.gforge.inria.fr/ricar/ricar", "https://bitbucket.org/petsc/petsc", "/jonchang/phlawd", "/FePhyFoFum/phyx", "/broadinstitute/pilon", "/chrchang/plink-ng", "/postgres-plr/plr", "http://svn.gna.org/svn/service-tech/trunk/pnapi", "/arq5x/poretools", "/ariloytynoja/prank-msa", "/hyattpd/Prodigal", "/tseemann/prokka", "/lh3/psmc", "git://sigrok.org/pulseview", "/esa/pykep", "https://svn.code.sf.net/p/pymol/code/trunk/pymol", "https://wwwsecu.irit.fr/svn/qr_mumps/tags/1.2", "/ihh/quaff", "/khowe/quicktree", "/isovic/racon", "/TGAC/RAMPART", "/stamatak/standard-RAxML", "/sebhtml/ray", "/mourisl/Rcorrector", 
"https://openmodelica.org/svn/MetaModelica/trunk", "/alexdobin/STAR", "/lh3/ropebwt2", "/lh3/ropebwt", "/rstudio/rstudio", "/rieck/sally", "/COMBINE-lab/salmon", "/lomereiter/sambamba", "/GregoryFaust/samblaster", "http://svn.gna.org/svn/service-tech/trunk/sara", "/rakhimov/scram", "/simongog/sdsl-lite", "/seqan/seqan", "/lh3/seqtk", "/jts/sga", "/najoshi/sickle", "git://sigrok.org/sigrok-cli", "/SimpleITK/SimpleITK", "https://free-astro.org/svn/siril/", "/SINTEF-Geometry/SISL", "/relipmoc/skewer", "/amplab/snap", "/sanger-pathogens/snp-sites", "/aquaskyline/SOAPdenovo2", "https://scm.gforge.inria.fr/anonscm/git/sollya/sollya", "/biocore/sortmerna", "/ncbi/sra-tools", "/statismo/statismo", "/gpertea/stringtie", "/torognes/swarm", "/sekika/swrcfit", "/symengine/symengine", "/lh3/tabtk", "/tamarin-prover/tamarin-prover", "/cbcrg/tcoffee", "/rmjarvis/tmv", "/bcgsc/transabyss", "/TransDecoder/TransDecoder", "/Blahah/transrate-tools", "https://software.sandia.gov/trilinos/repositories/publicTrilinos", "/scapella/trimal", "/trinityrnaseq/trinityrnaseq", "git://genome-source.cse.ucsc.edu/kent", "/rrwick/Unicycler/releases", "/sjackman/uniqtag", "/gobics/uproc", "/Victorian-Bioinformatics-Consortium/vague", "/brentp/vcfanno", "/ekg/vcflib", "/vcftools/vcftools", "/Victorian-Bioinformatics-Consortium/VelvetOptimiser", "/dzerbino/velvet", "/ukoethe/vigra", "/TinoDidriksen/cg3", "/torognes/vsearch", "/atks/vt", "/bemoody/wfdb", "/Ensembl/WiggleTools", "/LanguageMachines/wopr", "/herumi/xbyak", "/jimbraun/XCDF", "/gmarcais/yaggo")
formulae <- c("abyss", "ace-corrector", "adam", "adol-c", "afra", "alembic", "alpscore", "anvio", "apophenia", "arb", "arcs", "ascii_plots", "astrometry-net", "ate-pairing", "atram", "bali-phy", "bamhash", "bamm", "bam-readcount", "bamtools", "bamutil", "barrnap", "bcalm", "beagle", "beast2", "beast", "bedops", "bedtools", "beetl", "bfc", "bioawk", "biobloomtools", "biomake", "biopieces", "bitseq", "blasr", "blis", "bowtie2", "bowtie", "bpel2owfn", "bpipe", "breseq", "bwa", "cantera", "canu", "cd-hit", "centrifuge", "circlator", "ckon", "clonalframeml", "clonehd", "coinmp", "colpack", "cp2k", "cusp", "cutadapt", "daligner", "data-science-toolbox", "dazz_db", "dealii", "delly", "dextractor", "dgtal", "dotwrp", "dwgsim", "dynare", "edirect", "elemental", "ensembl-tools", "exabayes", "express", "fastx_toolkit", "fermikit", "fermi-lite", "fermi", "field3d", "flexbar", "flint", "freebayes", "freec", "fwdpp", "galsim", "gatk", "genet", "genometools", "getdp", "gingr", "grabix", "graph-tool", "gtsam", "gubbins", "harry", "harvest-tools", "hmmer", "hotspot", "htsbox", "htslib", "hyphy", "igv", "igvtools", "insighttoolkit", "inter-views", "ipopt", "itensor", "iva", "jellyfish", "jmol", "jsbsim", "k8", "kaiju", "kat", "kent-tools", "kmc", "kmerstream", "kollector", "kraken", "lammps", "last", "lastz", "libbi", "libccd", "libdivsufsort", "libeemd", "libgpkg", "libpll", "libsequence", "libsigrokdecode", "lighter", "lightstringgraph", "littler", "lobstr", "lola", "madlib", "mad-x", "mallet", "mantaflow", "mash", "mathomatic", "matplotlib", "megahit", "methpipe", "mhap", "minced", "miniasm", "minimap", "mir-prefer", "moab", "moose", "mothur", "mrbayes", "mtl", "nanoflann", "nanopolish", "ncbi-c++-toolkit", "nest", "neuron", "newicktools", "newick-utils", "nextflow", "nixio", "nonpareil", "ntcard", "nxtrim", "ococo", "ome-common", "ome-files", "ome-xml", "openalpr", "openbr", "openimageio", "openmeeg", "openni2", "openni", "orocos-kdl", "orthofinder", "osgearth", "p4est", 
"pandaseq", "paraview", "parsnp", "pastix", "petsc", "phlawd", "phyx", "pilon", "plink2", "plr", "pnapi", "poretools", "prank", "prodigal", "prokka", "psmc", "pulseview", "pykep", "pymol", "qr_mumps", "quaff", "quicktree", "racon", "rampart", "raxml", "ray", "rcorrector", "rml-mmc", "rna-star", "ropebwt2", "ropebwt", "rstudio-server", "sally", "salmon", "sambamba", "samblaster", "sara", "scram", "sdsl-lite", "seqan", "seqtk", "sga", "sickle", "sigrok-cli", "simpleitk", "siril", "sisl", "skewer", "snap-aligner", "snp-sites", "soapdenovo", "sollya", "sortmerna", "sratoolkit", "statismo", "stringtie", "swarm", "swrcfit", "symengine", "tabtk", "tamarin-prover", "t-coffee", "tmv-cpp", "trans-abyss", "transdecoder", "transrate-tools", "trilinos", "trimal", "trinity", "ucsc-genome-browser", "unicycler", "uniqtag", "uproc", "vague", "vcfanno", "vcflib", "vcftools", "velvetoptimiser", "velvet", "vigra", "vislcg3", "vsearch", "vt", "wfdb", "wiggletools", "wopr", "xbyak", "xcdf", "yaggo")

formulae2 <- c ("a5", "abacas", "abinit", "abyss-explorer", "abyss", "acado", "ace-corrector", "adam", "adapterremoval", "adol-c", "afra", "alembic", "alglib", "alien-hunter", "allpaths-lg", "alpscore", "amos", "analysis", "andi", "ann", "anvio", "apophenia", "aragorn", "arb", "arcs", "aribas", "arow++", "arrayfire", "artemis", "art", "ascii_plots", "astral", "astrometry-net", "ate-pairing", "atomic-pseudopotential-engine", "atompaw", "atpdec", "atram", "augustus", "bact", "bali-phy", "bam2wig", "bamhash", "bamm", "bam-readcount", "bamtools", "bamutil", "barrnap", "bayestraits", "bbtools", "bcalm", "bcftools", "beagle", "beast2", "beast", "bedops", "bedtools", "beetl", "bfc", "bioawk", "biobloomtools", "biocgal", "biointerchange", "biomake", "biopieces", "biopp", "bitseq", "blasr", "blast", "blat", "blaze-lib", "bless", "blis", "boost-compute", "bowtie2", "bowtie", "bpel2owfn", "bpipe", "breseq", "busco", "butterflow", "bwa", "bwtdisk", "calculix-ccx", "cantera", "canu", "cap3", "ccfits", "cddlib", "cd-hit", "cdo", "cdsclient", "cegma", "celera-assembler", "centrifuge", "cerulean", "cgns", "circlator", "circos", "ckon", "clark", "clips", "clonalframeml", "clonehd", "clustal-omega", "clustal-w", "cmdstan", "cmor", "coinmp", "colpack", "concorde", "corset", "cp2k", "crfsuite", "crlibm", "cryptoverif", "cuba", "cube", "cufflinks", "cusp", "cutadapt", "cvblob", "cytoscape", "dadadodo", "daligner", "data-science-toolbox", "dazz_db", "dealii", "deeplearning4j-cli", "delly", "des", "dextractor", "dgtal", "diamond", "dida", "discovardenovo", "discovar", "dl_poly_classic", "dotwrp", "ds9", "dsdp", "dsk", "dssp", "dwgsim", "dynare", "ea-utils", "edena", "edirect", "einspline", "elemental", "elph", "emboss", "e-mem", "enblend-enfuse", "ensembl-tools", "ess", "etsf_io", "exabayes", "exonerate", "express", "fann")
formulae2 <- c(formulae2, c("fasta", "fastml", "fastqc", "fastq-tools", "fasttree", "fastuniq", "fastx_toolkit", "fcgene", "fermi2", "fermikit", "fermi-lite", "fermi", "fgsl", "field3d", "flash", "flexbar", "flint", "flux-simulator", "fplll", "fqzcomp", "freebayes", "freec", "fsa", "fwdpp", "g2o", "gaemr", "galfit", "galib", "galsim", "gap", "garli", "gatb", "gatk", "gdcm", "geant4", "geda-gaf", "geneid", "genet", "genewise", "genometools", "getdp", "gfan", "ggobi", "giira", "gingr", "glimmer3", "glimmerhmm", "glpk448", "gmap-gsnap", "gmcloser", "gmtk", "gnuastro", "gnudatalanguage", "grabix", "graphlan", "graph-tool", "gtsam", "gubbins", "h5utils", "harry", "harvest-tools", "hdf4", "healpix", "hisat2", "hisat", "hlaminer", "hmmer2", "hmmer", "hopdm", "hotspot", "htsbox", "htslib", "humann2", "hyphy", "idba", "idcoefs", "igv", "igvtools", "impute2", "infernal", "insighttoolkit", "inter-views", "ipopt", "iqtree", "itensor", "itsol", "iva", "jblas", "jellyfish-1.1", "jellyfish", "jmol", "joinx", "jsbsim", "k8", "kaiju", "kalign", "kallisto", "kat", "kent-tools", "kissplice", "kmacs", "kmc", "kmergenie", "kmerstream", "kollector", "kraken", "lammps", "lapack-manpages", "last", "lastz", "libbigwig", "libbi", "libbuddy", "libccd", "libcerf", "libctl", "libdivsufsort", "libeemd", "libfolia", "libgpkg", "liblbfgs", "libminc", "libpll", "libsbml", "libsbol", "libsequence", "libsigrokdecode", "lie", "lighter", "lightstringgraph", "links-scaffolder", "lis", "littler", "lmfit", "lmod", "lobstr", "lola", "lp_solve", "lrsim", "lsd", "lumpy-sv", "m4ri", "macse", "madlib", "mad-x", "mafft", "maker", "mallet", "mantaflow", "mapsembler2", "maq", "mash", "masurca", "mathgl", "mathomatic", "matplotlib", "maude", "mbsystem", "mcl", "med-file", "megahit"))
formulae2 <- c(formulae2, c("megam", "meme", "meraculous", "metaphlan", "methpipe", "metis4", "mfem", "mfusg", "mhap", "minced", "minia", "miniasm", "minimap", "mira", "mir-prefer", "mitofy", "mlpack", "mlst", "moab", "molden", "moose", "mothur", "mpsolve", "mrbayes", "mrfast", "msieve", "mtl", "multi-worm-tracker", "mummer", "mumps", "muscle", "nanoflann", "nanopolish", "nauty", "ncbi-c++-toolkit", "nccmp", "ncl", "nest", "neuron", "newicktools", "newick-utils", "nextflow", "nexusformat", "nfft", "nglib", "niftilib", "nip2", "nixio", "nonpareil", "novoalign", "ntcard", "numdiff", "nusmv", "nxtrim", "oases", "oce", "ococo", "ogdraw", "oma", "omcompiler", "ome-common", "ome-files", "ome-xml", "openalpr", "openbr", "opencascade", "opencollada", "openfst", "opengrm-ngram", "opengrm-thrax", "openimageio", "openmeeg", "openni2", "openni", "orocos-kdl", "orthofinder", "osgearth", "oswitch", "p4est", "paml", "pandaseq", "parallel-netcdf", "paraview", "parmetis", "parsnp", "pastix", "pathd8", "pathvisio", "paxtools", "pbsuite", "pcap", "pear", "perf", "petsc", "phipack", "phlawd", "phylip", "phyml", "phyutility", "phyx", "picard-tools", "pilercr", "piler", "pilon", "plasma", "platypusvar", "plink2", "plink", "plr", "pnapi", "poa", "pocl", "populations", "poretools", "prank", "primer3", "prodigal", "prokka", "prooftree", "proteinortho", "proverif", "psmc", "pspp", "pulseview", "pykep", "pymol", "qcl", "qr_mumps", "qsopt_ex", "qsopt", "quaff", "quake", "qualimap", "quast", "quest", "quicktree", "quip", "quorum", "r8s", "racon", "radx", "rainbow", "rampart", "rapsearch2", "rate4site", "raxml", "ray", "rcorrector", "readseq", "readsim", "reapr", "recon", "repeatmasker"))
formulae2 <- c(formulae2, c("repeatmodeler", "repeatscout", "rmblast", "rml-mmc", "rnammer", "rna-star", "ropebwt2", "ropebwt", "rstudio-server", "sailfish", "sais", "sally", "salmon", "salt", "sambamba", "samblaster", "samtools@0.1", "samtools", "sara", "sbagen", "scamp", "scarpa", "scotch5", "scotch", "scram", "scrm", "sdsl-lite", "seqan", "seqdb", "seq-gen", "seqtk", "sequel", "sextractor", "sfscode", "sga", "shark", "shogun", "shrimp", "sickle", "sigrok-cli", "silo", "simpleitk", "simulate-pcr", "siril", "sisl", "skewer", "slepc", "slicot", "sllib", "smalt", "smrtanalysis", "snap-aligner", "snap", "snid", "snoscan", "snpeff", "snp-sites", "soapdenovo", "sollya", "soplex", "sortmerna", "spaced", "spades", "spatialite-gis", "spici", "squeezambler", "sratoolkit", "ssake", "stacks", "statismo", "stiff", "stringtie", "sumo", "superlu43", "superlu_dist", "superlu_mt", "swarm", "swetest", "swrcfit", "symengine", "symphony", "tabtk", "tagdust", "tamarin-prover", "tasr", "tbl2asn", "t-coffee", "tetgen", "therion", "ticcutils", "timbl", "tisean", "tmv-cpp", "topcat", "tophat", "trans-abyss", "transdecoder", "transpose", "trans_proteomic_pipeline", "transrate-tools", "transtermhp", "trf", "triangle", "trilinos", "trimadap", "trimal", "trimmomatic", "trinity", "trnascan", "ucsc-genome-browser", "ucto", "unafold", "unicycler", "uniqtag", "uproc", "utgb", "vague", "varscan", "vcake", "vcfanno", "vcflib", "vcftools", "velvetoptimiser", "velvet", "viennarna", "vigra", "vislcg3", "visp", "vsearch", "vt", "wcalc", "wcslib", "wcstools", "weblogo", "wfdb", "wiggletools", "wopr", "xbyak", "xcdf", "xfig", "xmgredit", "xmi-msim", "xraylib", "xrmc", "xylib", "yaggo", "yaha", "yass", "yeppp", "yices", "zoltan"))

fhomepages <- c("https://sourceforge.net/projects/ngopt/", "https://abacas.sourceforge.io/", "http://www.abinit.org", "http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer", "http://www.bcgsc.ca/platform/bioinfo/software/abyss", "https://acadohub.io/index.html", "/Sheikhizadeh/ACE", "/bigdatagenomics/adam", "/MikkelSchubert/adapterremoval", "https://projects.coin-or.org/ADOL-C", "/EvolBioInf/afra", "http://alembic.io", "http://www.alglib.net", "https://www.sanger.ac.uk/science/tools/alien-hunter", "http://www.broadinstitute.org/software/allpaths-lg/blog/", "http://alpscore.org", "https://sourceforge.net/projects/amos/", "/molpopgen/analysis", "/EvolBioInf/andi", "http://www.cs.umd.edu/~mount/ANN/", "http://merenlab.org/projects/anvio/", "http://apophenia.info/", "http://mbio-serv2.mbioekol.lu.se/ARAGORN/", "http://fredrikj.net/arb/index.html", "/bcgsc/arcs", "http://www.mathematik.uni-muenchen.de/~forster/sw/aribas.html", "https://code.google.com/p/arowpp/", "http://arrayfire.com", "https://www.sanger.ac.uk/science/tools/artemis", "https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm", "/dzerbino/ascii_plots", "/smirarab/ASTRAL", "http://astrometry.net")
fhomepages <- c(fhomepages, c( "http://homepage1.nifty.com/herumi/crypt/ate-pairing.html", "http://www.tddft.org/programs/APE/", "http://users.wfu.edu/natalie/papers/pwpaw/man.html", "https://atpdec.sourceforge.io/", "/juliema/aTRAM", "http://bioinf.uni-greifswald.de/augustus/", "http://chasen.org/~taku/software/bact/", "http://www.bali-phy.org/", "http://www.epigenomes.ca/tools-and-software", "/DecodeGenetics/BamHash", "http://bamm-project.org", "/genome/bam-readcount", "/pezmaster31/bamtools", "http://genome.sph.umich.edu/wiki/BamUtil", "/tseemann/barrnap", "http://www.evolution.rdg.ac.uk/BayesTraitsV3/BayesTraitsV3.html", "https://bbmap.sourceforge.io/", "/GATB/bcalm", "http://www.htslib.org/", "/beagle-dev/beagle-lib", "https://www.beast2.org/", "http://beast.bio.ed.ac.uk/", "/bedops/bedops", "/arq5x/bedtools2", "/BEETL/BEETL", "/lh3/bfc", "/lh3/bioawk", "http://www.bcgsc.ca/platform/bioinfo/software/biobloomtools/", "http://bio.math.berkeley.edu/cgal/", "https://www.codamono.com/biointerchange/", "/evoldoers/biomake", "/maasha/biopieces", "http://biopp.univ-montp2.fr", "https://bitseqhub.io/", "/PacificBiosciences/blasr"))
fhomepages <- c(fhomepages, c("http://blast.ncbi.nlm.nih.gov/", "https://genome.ucsc.edu/FAQ/FAQblat.html", "https://bitbucket.org/blaze-lib/blaze/", "https://sourceforge.net/projects/bless-ec/", "/flame/blis", "https://boostorghub.io/compute", "https://bowtie-bio.sourceforge.io/", "https://bowtie-bio.sourceforge.io/", "https://www.gnu.org/software/bpel2owfn", "/ssadedin/bpipe", "http://barricklab.org/twiki/bin/view/Lab/ToolsBacterialGenomeResequencing", "http://busco.ezlab.org", "/dthpham/butterflow", "/lh3/bwa", "http://people.unipmn.it/manzini/bwtdisk/", "http://www.calculix.de/", "/Cantera/cantera", "https://canu.readthedocs.org/en/latest/", "http://seq.cs.iastate.edu/cap3.html", "http://heasarc.gsfc.nasa.gov/fitsio/CCfits/", "http://www.inf.ethz.ch/personal/fukudak/cdd_home/", "http://cd-hit.org", "https://code.zmaw.de/projects/cdo", "http://cdsarc.u-strasbg.fr/doc/cdsclient.html", "http://korflab.ucdavis.edu/datasets/cegma/", "https://wgs-assembler.sourceforge.io/", "http://www.ccb.jhu.edu/software/centrifuge", "https://sourceforge.net/projects/ceruleanassembler/", "http://cgns.org/", "https://sanger-pathogenshub.io/circlator/", "http://circos.ca", "https://tschaumehub.io/ckon/", "http://clark.cs.ucr.edu/", "http://www.clipsrules.net", "/xavierdidelot/ClonalFrameML", "https://www.sanger.ac.uk/science/tools/clonehd", "http://www.clustal.org/omega/", "http://www.clustal.org/clustal2/", "http://mc-stan.org/", "https://cmor.llnl.gov/", "https://projects.coin-or.org/CoinMP", "http://cscapes.cs.purdue.edu/coloringpage", "http://www.math.uwaterloo.ca/tsp/concorde/index.html", "/Oshlack/Corset/wiki", "https://www.cp2k.org", "http://www.chokkan.org/software/crfsuite", "http://lipforge.ens-lyon.fr/www/crlibm/", "http://cryptoverif.inria.fr", "http://www.feynarts.de/cuba", "http://apps.fz-juelich.de/scalasca/", "https://cole-trapnell-labhub.io/cufflinks/", "http://cusplibraryhub.io", "/marcelm/cutadapt", "https://code.google.com/p/cvblob/", "http://www.cytoscape.org/", 
"http://www.jwz.org/dadadodo/", "/thegenemyers/DALIGNER", "/jeroenjanssens/data-science-toolbox", "/thegenemyers/DAZZ_DB", "https://www.dealii.org", "http://deeplearning4j.org/", "/tobiasrausch/delly", "https://des.sourceforge.io", "/thegenemyers/DEXTRACTOR", "http://dgtal.org", "http://ab.inf.uni-tuebingen.de/software/diamond/", "http://www.bcgsc.ca/platform/bioinfo/software/dida", "https://www.broadinstitute.org/software/discovar/blog/", "https://www.broadinstitute.org/software/discovar/blog/", "https://ccpforge.cse.rl.ac.uk/gf/project/dl_poly_classic/", "/tenomoto/dotwrp", "http://ds9.si.edu/", "http://www.mcs.anl.gov/hs/software/DSDP/", "http://minia.genouest.org/dsk/", "http://swift.cmbi.ru.nl/gv/dssp/", "/nh13/DWGSIM", "https://www.dynare.org", "https://code.google.com/p/ea-utils/", "http://www.genomic.ch/edena.php", "http://www.ncbi.nlm.nih.gov/books/NBK179288/", "https://einspline.sourceforge.io/", "http://libelemental.org/", "http://cbcb.umd.edu/software/ELPH/", "https://emboss.sourceforge.io/", "http://www.csd.uwo.ca/~ilie/E-MEM/", "https://enblend.sourceforge.io/", "http://www.ensembl.org/info/docs/tools/index.html", "http://ess.r-project.org/", "http://www.etsf.eu/resources/software/libraries_and_tools", "http://sco.h-its.org/exelixis/web/software/exabayes/", "http://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate", "http://bio.math.berkeley.edu/eXpress/", "http://leenissen.dk/fann/wp/"))
fhomepages <- c(fhomepages, c("http://faculty.virginia.edu/wrpearson/fasta/", "http://fastml.tau.ac.il/source.php", "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/", "http://homes.cs.washington.edu/~dcjones/fastq-tools/", "http://microbesonline.org/fasttree/", "http://sourceforge.net/projects/fastuniq/", "http://hannonlab.cshl.edu/fastx_toolkit/", "http://sourceforge.net/projects/fcgene/", "/lh3/fermi2", "/lh3/fermikit", "/lh3/fermi-lite", "/lh3/fermi", "https://www.lrz.de/services/software/mathematik/gsl/fortran/", "https://sites.google.com/site/field3d/", "http://ccb.jhu.edu/software/FLASH/", "/seqan/flexbar", "http://flintlib.org", "http://sammeth.net/confluence/display/SIM/Home", "/fplll/fplll", "https://sourceforge.net/projects/fqzcomp/", "/ekg/freebayes", "http://bioinfo.curie.fr/projects/freec/", "http://fsa.sourceforge.net/", "https://molpopgenhub.io/fwdpp/", "http://openslam.org/g2o.html", "http://www.broadinstitute.org/software/gaemr/", "http://users.obs.carnegiescience.edu/peng/work/galfit/galfit.html", "http://lancet.mit.edu/ga/", "/GalSim-developers/GalSim", "https://www.gap-system.org/", "https://code.google.com/p/garli/", "https://gatb.inria.fr/", "https://software.broadinstitute.org/gatk/", "https://sourceforge.net/projects/gdcm/", "http://geant4.cern.ch", "http://www.geda-project.org/", "http://genome.crg.es/software/geneid/", "http://www.lsi.upc.edu/~jcarmona/genet.html", "https://www.ebi.ac.uk/~birney/wise2/", "http://genometools.org/", "http://www.geuz.org/getdp/", "http://home.imf.au.dk/jensen/software/gfan/gfan.html", "http://www.ggobi.org", "https://sourceforge.net/projects/giira/", "/marbl/harvest/blob/master/docs/content/gingr.rst", "http://ccb.jhu.edu/software/glimmer/index.shtml", "http://ccb.jhu.edu/software/glimmerhmm/", "https://www.gnu.org/software/glpk/", "http://research-pub.gene.com/gmap/", "https://sourceforge.net/projects/gmcloser/", "http://melodi.ee.washington.edu/gmtk", 
"https://www.gnu.org/software/gnuastro/index.html", "https://gnudatalanguage.sourceforge.io/", "/arq5x/grabix", "https://bitbucket.org/nsegata/graphlan/wiki/Home", "https://graph-tool.skewed.de/", "https://bitbucket.org/gtborg/gtsam/", "/sanger-pathogens/gubbins", "http://ab-initio.mit.edu/wiki/index.php/H5utils", "http://www.mlsec.org/harry", "/marbl/harvest-tools", "https://www.hdfgroup.org", "http://healpix.jpl.nasa.gov", "http://ccb.jhu.edu/software/hisat2/", "http://ccb.jhu.edu/software/hisat/", "http://www.bcgsc.ca/platform/bioinfo/software/hlaminer", "http://hmmer.janelia.org/", "http://hmmer.janelia.org", "http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html", "/rthurman/hotspot", "/lh3/htsbox", "http://www.htslib.org/", "https://huttenhower.sph.harvard.edu/humann", "http://www.hyphy.org/", "http://i.cs.hku.hk/~alse/hkubrg/projects/idba/", "https://code.google.com/p/idcoefs/", "https://www.broadinstitute.org/software/igv", "https://www.broadinstitute.org/software/igv", "https://mathgen.stats.ox.ac.uk/impute/impute_v2.html", "http://eddylab.org/infernal/", "https://www.itk.org", "http://www.neuron.yale.edu/neuron/", "https://projects.coin-or.org/Ipopt", "http://www.iqtree.org/", "http://itensor.org/", "http://www-users.cs.umn.edu/~saad/software/ITSOL", "/sanger-pathogens/iva", "https://mikiobraunhub.io/jblas", "http://www.cbcb.umd.edu/software/jellyfish/", "http://www.genome.umd.edu/jellyfish.html", "http://www.jmol.org", "http://gmt.genome.wustl.edu/joinx", "https://jsbsim.sourceforge.io/", "/attractivechaos/k8", "http://kaiju.binf.ku.dk/", "http://msa.sbc.su.se/", "https://pachterlabhub.io/kallisto/", "/TGAC/KAT", "http://genome.ucsc.edu/", "http://kissplice.prabi.fr", "http://kmacs.gobics.de/", "http://sun.aei.polsl.pl/kmc/", "http://kmergenie.bx.psu.edu/", "/pmelsted/KmerStream", "/bcgsc/kollector", "https://ccb.jhu.edu/software/kraken/", "http://lammps.sandia.gov", "http://netlib.org/lapack/"))
fhomepages <- c(fhomepages, c("http://last.cbrc.jp/", "https://www.bx.psu.edu/~rsharris/lastz/", "/dpryan79/libBigWig", "http://libbi.org", "https://sourceforge.net/projects/buddy/", "http://libccd.danfis.cz", "http://apps.jcns.fz-juelich.de/doku/sc/libcerf", "http://ab-initio.mit.edu/wiki/index.php/Libctl", "/y-256/libdivsufsort", "https://bitbucket.org/luukko/libeemd", "https://proyconhub.io/folia/", "https://bitbucket.org/luciad/libgpkg", "http://www.chokkan.org/software/liblbfgs", "https://en.wikibooks.org/wiki/MINC", "http://www.libpll.org/", "http://sbml.org/Software/libSBML", "https://synbiodexhub.io/libSBOL", "https://molpopgenhub.io/libsequence/", "https://sigrok.org/", "http://wwwmathlabo.univ-poitiers.fr/~maavl/LiE/", "/mourisl/Lighter", "http://lsg.algolab.eu/", "http://www.bcgsc.ca/platform/bioinfo/software/links", "http://www.ssisc.org/lis", "http://dirk.eddelbuettel.com/code/littler.html", "http://apps.jcns.fz-juelich.de/doku/sc/lmfit", "https://www.tacc.utexas.edu/research-development/tacc-projects/lmod", "http://lobstr.teamerlich.org", "http://service-technology.org/lola/", "http://sourceforge.net/projects/lpsolve/", "/aquaskyline/LRSIM", "/tothuhien/lsd-0.3beta", "/arq5x/lumpy-sv", "https://bitbucket.org/malb/m4ri", "http://bioweb.supagro.inra.fr/macse/", "https://madlib.incubator.apache.org/", "https://cern.ch/mad", "http://mafft.cbrc.jp/alignment/software/index.html", "http://www.yandell-lab.org/software/maker.html", "http://mallet.cs.umass.edu/", "http://mantaflow.com/", "https://colibread.inria.fr/software/mapsembler2/", "https://maq.sourceforge.io/", "/marbl/Mash", "http://www.genome.umd.edu/masurca.html", "https://mathgl.sourceforge.io/", "/mfillpot/mathomatic", "https://matplotlib.org", "http://maude.cs.illinois.edu", "http://www.mbari.org/data/mbsystem/mb-cookbook/index.html", "http://micans.org/mcl", "http://www.salome-platform.org", "/voutcn/megahit", "http://www.umiacs.umd.edu/~hal/megam/", "http://meme-suite.org", 
"http://jgi.doe.gov/data-and-tools/meraculous/", "http://huttenhower.sph.harvard.edu/metaphlan", "http://smithlabresearch.org/software/methpipe/", "http://glaros.dtc.umn.edu/gkhome/views/metis", "http://www.mfem.org", "http://water.usgs.gov/ogw/mfusg", "/marbl/MHAP", "/ctSkennerton/minced", "http://minia.genouest.org/", "/lh3/miniasm", "/lh3/minimap", "http://sourceforge.net/projects/mira-assembler", "/hangelwen/miR-PREFeR", "http://dogma.ccbb.utexas.edu/mitofy/", "http://www.mlpack.org", "/tseemann/mlst", "http://press3.mcs.anl.gov/sigma/moab-library/", "http://www.cmbi.ru.nl/molden/", "http://moose.ncbs.res.in", "https://www.mothur.org/", "http://www.dm.unipi.it/cluster-pages/mpsolve/index.htm", "https://mrbayes.sourceforge.io/", "https://mrfast.sourceforge.io/", "https://sourceforge.net/projects/msieve/", "http://www.simunova.com", "http://sourceforge.net/projects/mwt/", "https://mummer.sourceforge.io/", "http://mumps-solver.org", "http://www.drive5.com/muscle/", "https://jlblancochub.io/nanoflann/", "/jts/nanopolish", "http://cs.anu.edu.au/~bdm/nauty/", "http://www.ncbi.nlm.nih.gov/toolkit/", "https://nccmp.sourceforge.io/", "https://ncl.sourceforge.io/", "http://www.nest-simulator.org/", "http://www.neuron.yale.edu/neuron/", "/lindenb/newicktools", "http://cegg.unige.ch/newick_utils", "https://www.nextflow.io/", "http://www.nexusformat.org", "http://www-user.tu-chemnitz.de/~potts/nfft", "https://sourceforge.net/projects/netgen-mesher/", "https://niftilib.sourceforge.io/", "http://www.vips.ecs.soton.ac.uk/", "http://www.g-node.org/nix", "http://enve-omics.ce.gatech.edu/nonpareil", "http://www.novocraft.com/", "/bcgsc/ntCard", "http://www.nongnu.org/numdiff", "http://nusmv.fbk.eu", "/sequencing/NxTrim", "http://www.ebi.ac.uk/~zerbino/oases/", "/tpaviot/oce", "/karel-brinda/ococo", "http://ogdraw.mpimp-golm.mpg.de/", "http://omabrowser.org/standalone/", "https://www.openmodelica.org", "https://www.openmicroscopy.org/site/products/ome-files-cpp/"))
fhomepages <- c(fhomepages, c("https://www.openmicroscopy.org/site/products/ome-files-cpp/", "https://www.openmicroscopy.org/site/products/ome-files-cpp/", "/openalpr/openalpr", "http://www.openbiometrics.org/", "https://dev.opencascade.org/", "http://www.opencollada.org", "http://www.openfst.org/", "http://www.openfst.org/twiki/bin/view/GRM/NGramLibrary", "http://www.openfst.org/twiki/bin/view/GRM/Thrax", "http://openimageio.org", "https://openmeeghub.io", "https://structure.io/openni", "http://www.openni.org/", "http://www.orocos.org/kdl", "/davidemms/OrthoFinder", "http://osgearth.org", "/yeban/oswitch", "http://www.p4est.org", "http://abacus.gene.ucl.ac.uk/software/paml.html", "/neufeld/pandaseq", "https://trac.mcs.anl.gov/projects/parallel-netcdf", "http://paraview.org", "http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview", "/marbl/parsnp", "http://pastix.gforge.inria.fr", "http://www2.math.su.se/PATHd8/", "http://www.pathvisio.org/", "http://www.biopax.org/paxtools/", "http://sourceforge.net/projects/pb-jelly/", "http://seq.cs.iastate.edu/pcap.html", "http://www.exelixis-lab.org/pear", "http://osmot.cs.cornell.edu/kddcup/software.html", "https://www.mcs.anl.gov/petsc/index.html", "http://www.maths.otago.ac.nz/~dbryant/software.html", "http://www.phlawd.net/", "http://evolution.genetics.washington.edu/phylip.html", "http://www.atgc-montpellier.fr/phyml/", "http://blackrim.org/programs/phyutility/", "/FePhyFoFum/phyx", "https://broadinstitutehub.io/picard/", "http://drive5.com/pilercr/", "http://drive5.com/piler/", "/broadinstitute/pilon/wiki", "http://icl.cs.utk.edu/plasma", "http://www.well.ox.ac.uk/platypus", "https://www.cog-genomics.org/plink2", "http://zzz.bwh.harvard.edu/plink/", "https://www.joeconway.com/plr.html", "http://home.gna.org/service-tech/pnapi", "https://sourceforge.net/projects/poamsa/", "https://pocl.sourceforge.io/", "http://bioinformatics.org/~tryphon/populations/", "https://poretools.readthedocs.org", 
"http://wasabiapp.org/software/prank/", "https://primer3.sourceforge.io/", "http://prodigal.ornl.gov/", "http://www.vicbioinformatics.com/software.prokka.shtml", "http://askra.de/software/prooftree", "http://www.bioinf.uni-leipzig.de/Software/proteinortho/", "http://prosecco.gforge.inria.fr/personal/bblanche/proverif", "/lh3/psmc", "https://www.gnu.org/software/pspp/", "https://sigrok.org/", "https://esahub.io/pykep/", "http://pymol.org", "http://tph.tuwien.ac.at/~oemer/qcl.html", "http://buttari.perso.enseeiht.fr/qr_mumps", "http://www.math.uwaterloo.ca/~bico/qsopt/ex/", "http://www.math.uwaterloo.ca/~bico/qsopt/index.html", "/ihh/quaff", "http://www.cbcb.umd.edu/software/quake/", "http://qualimap.bioinfo.cipf.es/", "http://cab.spbu.ru/software/quast/", "http://www-hsc.usc.edu/~valouev/QuEST/QuEST.html", "https://www.sanger.ac.uk/resources/software/quicktree/", "http://homes.cs.washington.edu/~dcjones/quip/", "http://www.genome.umd.edu/quorum.html", "http://ceiba.biosci.arizona.edu/r8s/", "/isovic/racon", "https://www.ral.ucar.edu/projects/titan/docs/radial_formats/radx.html", "https://sourceforge.net/projects/bio-rainbow/", "/TGAC/RAMPART", "https://rapsearch2.sourceforge.io/", "http://www.tau.ac.il/~itaymay/cp/rate4site.html", "https://sco.h-its.org/exelixis/web/software/raxml/index.html", "https://denovoassembler.sourceforge.io/", "/mourisl/Rcorrector", "http://iubio.bio.indiana.edu/soft/molbio/readseq/java/", "https://sourceforge.net/p/readsim/wiki/Home/", "https://www.sanger.ac.uk/science/tools/reapr", "http://www.repeatmasker.org/RepeatModeler.html", "http://www.repeatmasker.org/", "http://www.repeatmasker.org/RepeatModeler.html", "http://bix.ucsd.edu/repeatscout/", "http://www.repeatmasker.org/RMBlast.html", "http://www.ida.liu.se/labs/pelab/rml", "http://www.cbs.dtu.dk/services/RNAmmer/", "/alexdobin/STAR", "/lh3/ropebwt2", "/lh3/ropebwt", "https://www.rstudio.com", "http://www.cs.cmu.edu/~ckingsf/software/sailfish", 
"https://sites.google.com/site/yuta256/"))
fhomepages <- c(fhomepages, c("http://www.mlsec.org/sally", "/COMBINE-lab/salmon", "http://supernovae.in2p3.fr/salt/doku.php?id=start", "https://lomereiterhub.io/sambamba", "/GregoryFaust/samblaster", "https://samtools.sourceforge.io/", "http://www.htslib.org/", "http://service-technology.org/sara", "http://uazu.net/sbagen/", "http://www.astromatic.net/software/scamp", "http://compbio.cs.toronto.edu/hapsembler/scarpa.html", "https://gforge.inria.fr/projects/scotch", "https://gforge.inria.fr/projects/scotch", "https://scram-pra.org", "https://scrmhub.io/", "/simongog/sdsl-lite", "http://www.seqan.de/", "https://bitbucket.org/mhowison/seqdb", "http://tree.bio.ed.ac.uk/software/seqgen/", "/lh3/seqtk", "http://bix.ucsd.edu/SEQuel/index.html", "http://www.astromatic.net/software/sextractor", "https://sfscode.sourceforge.io/", "/jts/sga", "http://image.diku.dk/shark/", "http://www.shogun-toolbox.org", "http://compbio.cs.toronto.edu/shrimp/", "/najoshi/sickle", "https://sigrok.org/", "https://wci.llnl.gov/simulation/computer-codes/silo", "http://www.simpleitk.org", "https://sourceforge.net/projects/simulatepcr/", "http://free-astro.org/index.php/Siril", "http://www.sintef.no/Informasjons--og-kommunikasjonsteknologi-IKT/Anvendt-matematikk/Fagomrader/Geometri/Prosjekter/The-SISL-Nurbs-Library/SISL-Homepage/", "/relipmoc/skewer", "http://www.grycap.upv.es/slepc", "http://www.slicot.org", "http://www.ir.isas.jaxa.jp/~cyamauch/sli/index.html", "http://www.sanger.ac.uk/science/tools/smalt-0", "http://www.pacb.com/products-and-services/analytical-software/smrt-analysis/", "http://snap.cs.berkeley.edu", "http://korflab.ucdavis.edu/software.html", "https://people.lam.fr/blondin.stephane/software/snid", "http://lowelab.ucsc.edu/snoscan/", "https://snpeff.sourceforge.io/", "/sanger-pathogens/snp-sites", "http://soap.genomics.org.cn/soapdenovo.html", "http://sollya.gforge.inria.fr/", "http://soplex.zib.de", "http://bioinfo.lifl.fr/RNA/sortmerna/", "http://spaced.gobics.de/", 
"http://bioinf.spbau.ru/spades/", "https://www.gaia-gis.it/fossil/spatialite_gis/index", "http://compbio.cs.princeton.edu/spici/", "http://chitsazlab.org/software/squeezambler/", "/ncbi/sra-tools", "http://www.bcgsc.ca/platform/bioinfo/software/ssake", "https://creskolab.uoregon.edu/stacks/", "/statismo/statismo", "https://www.astromatic.net/software/stiff", "https://ccb.jhu.edu/software/stringtie", "https://sourceforge.net/projects/sumo/", "http://crd-legacy.lbl.gov/~xiaoye/SuperLU/", "http://crd-legacy.lbl.gov/~xiaoye/SuperLU/", "http://crd-legacy.lbl.gov/~xiaoye/SuperLU", "/torognes/swarm", "https://www.astro.com/swisseph/", "https://swrcfit.sourceforge.io/", "/symengine/symengine", "http://www.coin-or.org/projects/SYMPHONY.xml", "/lh3/tabtk", "https://tagdust.sourceforge.io/", "https://tamarin-proverhub.io/", "http://www.bcgsc.ca/platform/bioinfo/software/tasr", "https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/", "http://www.tcoffee.org/", "http://wias-berlin.de/software/tetgen/", "https://therion.speleo.sk", "https://ilk.uvt.nl/ticcutils/", "https://ilk.uvt.nl/timbl/", "http://www.mpipks-dresden.mpg.de/~tisean/", "/rmjarvis/tmv"))
fhomepages <- c(fhomepages, c("http://www.star.bris.ac.uk/~mbt/topcat/", "https://ccb.jhu.edu/software/tophat", "http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss", "https://transdecoderhub.io/", "https://transpose.sourceforge.io/", "http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP", "/Blahah/transrate-tools", "http://transterm.cbcb.umd.edu/", "https://tandem.bu.edu/trf/trf.html", "http://www.cs.cmu.edu/~quake/triangle.html", "http://trilinos.sandia.gov", "/lh3/trimadap", "http://trimal.cgenomics.org/", "http://www.usadellab.org/cms/?page=trimmomatic", "https://trinityrnaseqhub.io", "http://eddylab.org/software.html", "http://genome.ucsc.edu", "https://ilk.uvt.nl/ucto/", "http://mfold.rna.albany.edu/", "/rrwick/Unicycler", "/sjackman/uniqtag", "http://uproc.gobics.de/", "http://utgenome.org/", "http://www.vicbioinformatics.com/software.vague.shtml", "https://dkoboldthub.io/varscan/", "https://vcake.sourceforge.io/", "/brentp/vcfanno", "/ekg/vcflib", "https://vcftoolshub.io/", "http://bioinformatics.net.au/software.velvetoptimiser.shtml", "http://www.ebi.ac.uk/~zerbino/velvet/", "http://www.tbi.univie.ac.at/~ronny/RNA/", "https://ukoethehub.io/vigra/", "https://beta.visl.sdu.dk/cg3.html", "https://visp.inria.fr", "/torognes/vsearch", "http://genome.sph.umich.edu/wiki/Vt", "https://wcalc.sourceforge.io/", "http://www.atnf.csiro.au/people/mcalabre/WCS/", "http://tdc-www.harvard.edu/wcstools/", "http://weblogo.berkeley.edu/", "http://physionet.org/physiotools/", "/Ensembl/WiggleTools", "https://ilk.uvt.nl/wopr", "http://herumi.in.coocan.jp", "/jimbraun/XCDF", "https://mcj.sourceforge.io/", "http://www.stccmop.org/knowledge_transfer/software/selfe/ace_tools", "/tschoonj/xmimsim", "/tschoonj/xraylib", "/golosio/xrmc", "https://xylib.sourceforge.io/", "/gmarcais/yaggo", "/GregoryFaust/yaha", "http://bioinfo.lifl.fr/yass/", "http://www.yeppp.info", "http://yices.csl.sri.com/", "http://www.cs.sandia.gov/Zoltan"))

# Number of watchers from 'head' URLs
# (the GitHub API reports the watcher count as `subscribers_count`)
vectorr <- list()

for (i in seq_along(urls)) {
  # Replace YOURPASSWORD with your own GitHub password
  curlExample <- paste0("curl -X GET -u tmozgach:YOURPASSWORD https://api.github.com/repos", urls[i])
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  # A missing field is stored as NULL
  vectorr <- c(vectorr, list(dJSON$subscribers_count))
}

tibt <- as_data_frame(setNames(list(formulae,vectorr), c("Formula","Watchers")))

# Number of watchers from 'homepage' URLs
vectorr <- list()

for (i in seq_along(fhomepages)) {
  curlExample <- paste0("curl -X GET -u tmozgach:YOURPASSWORD https://api.github.com/repos", fhomepages[i])
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$subscribers_count))
}

tibt2 <- as_data_frame(setNames(list(formulae2,vectorr), c("Formula","Watchers")))

# Combine: prefer the count from the 'homepage' URL, fall back to the one from the 'head' URL
full_join <- full_join(tibt2, tibt, by = "Formula")
full_join$Watchers.x[full_join$Watchers.x=="NULL"] <- "0"
full_join$Watchers.y[full_join$Watchers.y=="NULL"] <- "0"

full_join$Watchers <- ifelse(full_join$Watchers.x == "0", full_join$Watchers.y, full_join$Watchers.x)

full_join <- subset(full_join, select = -c(Watchers.x,Watchers.y))

####
# Number of stars from 'head' URLs
# (in the GitHub API the `watchers` field holds the star count, the same value as `stargazers_count`)
vectorr <- list()

for (i in seq_along(urls)) {
  curlExample <- paste0("curl -X GET -u tmozgach:YOURPASSWORD https://api.github.com/repos", urls[i])
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$watchers))
}

tibt <- as_data_frame(setNames(list(formulae,vectorr), c("Formula","Stars")))

# Number of stars from 'homepage' URLs
vectorr <- list()

for (i in seq_along(fhomepages)) {
  curlExample <- paste0("curl -X GET -u tmozgach:YOURPASSWORD https://api.github.com/repos", fhomepages[i])
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$watchers))
}

tibt2 <- as_data_frame(setNames(list(formulae2,vectorr), c("Formula","Stars")))

# Combine: prefer the count from the 'homepage' URL, fall back to the one from the 'head' URL
full_join2 <- full_join(tibt2, tibt, by = "Formula")
full_join2$Stars.x[full_join2$Stars.x=="NULL"] <- "0"
full_join2$Stars.y[full_join2$Stars.y=="NULL"] <- "0"

full_join2$Stars <- ifelse(full_join2$Stars.x == "0", full_join2$Stars.y, full_join2$Stars.x)
full_join2 <- subset(full_join2, select = -c(Stars.x,Stars.y))
#####
# Number of forks from 'head' URLs
vectorr <- list()

for (i in seq_along(urls)) {
  # The credentials must not contain a space, or curl treats the second word as a URL
  curlExample <- paste0("curl -X GET -u tmozgach:YOURPASSWORD https://api.github.com/repos", urls[i])
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$forks))
}

tibt <- as_data_frame(setNames(list(formulae,vectorr), c("Formula","Forks")))

# Number of forks from 'homepage' URLs
vectorr <- list()

for (i in seq_along(fhomepages)) {
  curlExample <- paste0("curl -X GET -u tmozgach:YOURPASSWORD https://api.github.com/repos", fhomepages[i])
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$forks))
}

tibt2 <- as_data_frame(setNames(list(formulae2,vectorr), c("Formula","Forks")))

# Combine: prefer the count from the 'homepage' URL, fall back to the one from the 'head' URL
full_join3 <- full_join(tibt2, tibt, by = "Formula")
full_join3$Forks.x[full_join3$Forks.x=="NULL"] <- "0"
full_join3$Forks.y[full_join3$Forks.y=="NULL"] <- "0"

full_join3$Forks <- ifelse(full_join3$Forks.x == "0", full_join3$Forks.y, full_join3$Forks.x)
full_join3 <- subset(full_join3, select = -c(Forks.x,Forks.y))

#####
# Combine all tables
full_join4 <- full_join(full_join, full_join2, by = "Formula")
full_join5 <- full_join(full_join3, full_join4, by = "Formula")

#####
# Mark a formula as notable if it has at least 20 forks, 20 watchers, or 50 stars
full_join5$Notable <- ifelse(full_join5$Forks >= 20 | full_join5$Watchers >= 20 | full_join5$Stars >= 50, "yes", "no")

#####
# Mark whether the formula has the tag "bioinformatics"

full_join5$tag <- ifelse(full_join5$Formula %in% bio_tag$X1, "yes", "no")

#####
# Number of Linuxbrew installations in the last year
# (read from /home/tmozgacheva/science-20170914.csv into the data frame Linux_inst)
# Drop the install options, e.g. "--with ...", keeping only the formula name
Linux_inst$`Event Action` <- sub(" .*", "", Linux_inst$`Event Action`)
Linux_inst$`Event Action` <- trimws(Linux_inst$`Event Action`)
full_join5$Formula <- trimws(full_join5$Formula)

# Sum the install counts for each formula
by_formula_linux <- Linux_inst %>% group_by(`Event Action`) %>%
  summarize(linux_stat = sum(`Total Events`))

by_formula_linux$`Event Action` <- trimws(by_formula_linux$`Event Action`)

####
# Merge the Linux statistics into the table
full_join5 <- merge(x = full_join5, y = by_formula_linux, by.x = "Formula", by.y = "Event Action", all.x = TRUE)

#####
# Mac statistics
# Drop the install options, e.g. "--with ...", keeping only the formula name
science_mac$formula <- sub(" .*", "", science_mac$formula)
science_mac$formula <- trimws(science_mac$formula)

# Sum the install counts for each formula
by_formula_mac <- science_mac %>% group_by(`formula`) %>%
  summarize(mac_stat = sum(count))

####
# Merge the Mac statistics into the table
full_join5 <- merge(x = full_join5, y = by_formula_mac, by.x = "Formula", by.y = "formula", all.x = TRUE)

# Sort by formula name
full_join5 <- arrange(full_join5, Formula)

# Export to a TSV file
full_join5 <- as.matrix(full_join5)
write.table(full_join5, file='homebrew_science_stat.tsv', quote=FALSE, sep='\t', col.names = NA)
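
The script above calls the GitHub API three times per repository, once each for watchers, stars, and forks, although a single response carries all three fields. A minimal sketch of reading them in one pass and applying the same "notable" rule, written in Python rather than R; `repo_counts` and `is_notable` are hypothetical names, and the sample values are the ones quoted earlier in this thread:

```python
def repo_counts(repo_json):
    """Pick watchers, stars, and forks out of one parsed repository response.

    In the GitHub API, `subscribers_count` is the watcher count, while
    `watchers` (like `stargazers_count`) is the star count.
    Missing fields default to 0, mirroring the NULL -> "0" step in the script.
    """
    return {
        "Watchers": repo_json.get("subscribers_count", 0),
        "Stars": repo_json.get("watchers", 0),
        "Forks": repo_json.get("forks", 0),
    }


def is_notable(counts):
    """Same thresholds as the script: 20 forks, 20 watchers, or 50 stars."""
    return (
        counts["Forks"] >= 20
        or counts["Watchers"] >= 20
        or counts["Stars"] >= 50
    )


# Sample response fields as quoted earlier in the thread
sample = {"subscribers_count": 12, "watchers": 1, "forks": 1}
counts = repo_counts(sample)
print(counts, is_notable(counts))
```

Fetching each repository once would also cut the number of API requests to a third, which matters when the API is rate limited.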
sjackman commented 6 years ago

I'd like up-to-date GitHub data on forks, stars, and watchers. samtools doesn't have any GitHub information because neither its homepage nor its head is a GitHub URL, but its url is a GitHub URL. Please check all three: homepage, url, and head. Could you please rerun this analysis for Homebrew/science?

tmozgach commented 6 years ago

You are right. My code doesn't handle such cases; it only handles URLs that contain .git. I didn't expect a GitHub download link like this one:

url "https://github.com/samtools/samtools/releases/download/1.5/samtools-1.5.tar.bz2"

I will work on that! =)
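
A minimal sketch of one way to cover that case, written in Python rather than R and not taken from the script above. It assumes any `github.com/<owner>/<repo>/...` URL identifies the repository, whether it comes from homepage, url, or head; the names `github_repo_path` and `first_repo_path` are hypothetical:

```python
import re
from typing import Optional

# Matches http(s):// or git:// URLs on github.com and captures owner and repo.
# Hosts such as <name>.github.io are deliberately not matched.
_GITHUB_REPO = re.compile(
    r"^(?:https?|git)://(?:www\.)?github\.com/([^/]+)/([^/#?]+)"
)


def github_repo_path(url: str) -> Optional[str]:
    """Return "/owner/repo" for a GitHub URL, or None for any other URL."""
    match = _GITHUB_REPO.match(url)
    if match is None:
        return None
    owner, repo = match.groups()
    # head URLs usually end in ".git"; the API path does not
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"/{owner}/{repo}"


def first_repo_path(*urls: Optional[str]) -> Optional[str]:
    """Check homepage, url, and head in order; return the first GitHub match."""
    for url in urls:
        if url:
            path = github_repo_path(url)
            if path is not None:
                return path
    return None


samtools_url = (
    "https://github.com/samtools/samtools/releases/download/1.5/"
    "samtools-1.5.tar.bz2"
)
print(first_repo_path("http://www.htslib.org/", samtools_url, None))
```

The returned path has the same `/owner/repo` shape as the entries already appended to `https://api.github.com/repos` in the script, so it could feed the existing loops unchanged.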

sjackman commented 6 years ago

Thanks, Tanya.