sjackman commented 7 years ago

Hi, Tanya. Can you please prepare a table that lists…

tmozgach commented 6 years ago

List all formulae in homebrew-science tap:

TAP=homebrew/homebrew-science
TAP_PREFIX=$(brew --prefix)/Library/Taps
{ ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb
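
A note on the fallback `ls` pattern used in these commands: in the shell, `|` binds more tightly than `||`, so in `a || b | c` only `b` is piped into `c`. The sketch below shows the difference on a throwaway directory. The directory layout and the two file names are made up for illustration.

```shell
# Hypothetical tap layout: formulae live in a Formula/ subdirectory.
tap_dir=$(mktemp -d)
mkdir "$tap_dir/Formula"
touch "$tap_dir/Formula/abyss.rb" "$tap_dir/Formula/bwa.rb"

# Ungrouped: the pipe attaches only to the second ls, so when the first ls
# succeeds, its full paths are printed unchanged.
ungrouped=$(ls "$tap_dir"/Formula/*.rb 2>/dev/null || ls "$tap_dir"/*.rb 2>/dev/null | xargs -I{} basename {} .rb)

# Grouped: the braces send the output of whichever ls succeeds through xargs.
grouped=$( { ls "$tap_dir"/Formula/*.rb 2>/dev/null || ls "$tap_dir"/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"
rm -r "$tap_dir"
```

Only the grouped form yields bare formula names regardless of which directory layout the tap uses.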

URL from the 'head' field in the formula:

brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].head }.compact.map { |v| v.url }'

Name of formulae with 'head' field:

brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.select { |x| Formula[x].head }'

Name all formulae in the Tap:

brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].name }'

URL from the 'homepage' field:

brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].homepage }'

Find formulae that have bioinformatics tag:

grep -i -l '# tag "bioinformatics"' $TAP_PREFIX/$TAP/*.rb | xargs -I{} basename {} .rb | grep -o '^[^.]*'

GET the number of GitHub watchers, stars, and forks:

  "subscribers_count": 12    dJSON$subscribers_count   (watchers)
  "watchers": 1,             dJSON$watchers            (stars; GitHub's legacy name for the star count)
  "forks": 1,                dJSON$forks               (forks)

sjackman commented 6 years ago

Here's the record of installations for macOS. It covers only the top 1000 most popular packages, including those in Homebrew/core, so it contains only 25 packages from Homebrew/science, and only two of those are bioinformatics packages: htslib and samtools.

sjackman commented 6 years ago

This particular file is not very useful for Homebrew/science. In any case, it can be converted to TSV like so:

brew install jq miller
curl -L \
    | jq .items | mlr --ijson --otsvlite cat
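
Once the data is in TSV form, the counts may still need to be summed per formula, since the same formula can appear on several rows with different install options. A minimal awk sketch, assuming a header row followed by `formula` and `count` columns in that order; the column order, the function name `sum_installs`, and the sample rows are assumptions, not taken from the actual file.

```shell
# Sum install counts per formula, ignoring anything after the first space
# in the formula column (for example, install options).
sum_installs() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { next }            # skip the header row
    {
      name = $1
      sub(/ .*/, "", name)      # keep only the formula name
      total[name] += $2
    }
    END { for (name in total) print name, total[name] }
  ' | sort
}

# Invented sample rows for illustration only.
totals=$(printf 'formula\tcount\nsamtools\t10\nsamtools --with-foo\t5\nhtslib\t7\n' | sum_installs)
printf '%s\n' "$totals"
```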

tmozgach commented 6 years ago

Change the extension from txt to tsv.

sjackman commented 6 years ago

Thanks, Tanya!

sjackman commented 6 years ago

@tmozgach Do you have source code for the script that created this file homebrew_science_stat.txt?

tmozgach commented 6 years ago

@sjackman I will post it here, because it is not related to the ORCA project.

# Date: 22/09/2017
# Author: Tatyana Mozgacheva
# Description: This script generates a table of:
# the number of GitHub watchers, stars, and forks for each formula in Homebrew/science that has a GitHub repo;
# whether the formula is notable (forks ≥ 20 or watchers ≥ 20 or stars ≥ 50);
# the number of macOS installations in the last year;
# the number of Linuxbrew installations in the last year;
# whether the formula has a # tag "bioinformatics".
# To fetch these statistics, the URL of the formula's GitHub repository is required.
# This URL is located in either the 'homepage' field or the 'head' field of the formula's code.
# Procedure:
# 1) In the terminal, where brew is installed, type:
#    TAP=homebrew/homebrew-science
#    TAP_PREFIX=$(brew --prefix)/Library/Taps
#    *URL from the 'head' field: 
#    brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].head }.compact.map { |v| v.url }'
#    and copy the output into the 'urls' variable
#    *Names of formulae that have a 'head' field: 
#    brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.select { |x| Formula[x].head }'
#    and copy the output into the 'formulae' variable
#    *Names of all formulae in TAP: 
#    brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].name }'
#    and copy the output into the 'formulae2' variable
#    *URL from the 'homepage' field: 
#    brew irb <<< "%w[$( { ls $TAP_PREFIX/$TAP/Formula/*.rb 2>/dev/null || ls $TAP_PREFIX/$TAP/*.rb 2>/dev/null; } | xargs -I{} basename {} .rb)]"'.map { |x| Formula[x].homepage }'
#    and copy the output into the 'fhomepages' variable
# 2) Get rid of '', '.git' in order to get the name of the repo.
# 3) Type in the terminal:
#    grep -i -l '# tag "bioinformatics"' $TAP_PREFIX/$TAP/*.rb | xargs -I{} basename {} .rb | grep -o '^[^.]*'
#    Copy the output into the file bio_tag.tcv
#    Import Dataset (as bio_tag)
# 4) Type in the terminal:
#    brew install jq miller
#    cat science-macos-20170928.json | jq .items | mlr --ijson --otsvlite cat
#    Copy the output into the file science-mac.csv
#    Import Dataset (as science_mac)
# 5) Run the following script:
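
library(curlconverter)  # straighten(), make_req()
library(httr)           # content()
library(jsonlite)       # toJSON(), fromJSON()
library(dplyr)          # full_join(), group_by(), summarize(), arrange()
library(tibble)         # as_data_frame()
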
urls <- c("/bcgsc/abyss", "/Sheikhizadeh/ACE", "/bigdatagenomics/adam", "", "/EvolBioInf/afra", "/alembic/alembic", "/ALPSCore/ALPSCore", "/merenlab/anvio", "/b-k/apophenia", "/fredrik-johansson/arb", "/bcgsc/arcs", "/dzerbino/ascii_plots", "/dstndstn/", "/herumi/ate-pairing", "/juliema/aTRAM", "/bredelings/BAli-Phy", "/DecodeGenetics/BamHash", "/macroevolution/bamm", "/genome/bam-readcount", "/pezmaster31/bamtools", "/statgen/bamUtil", "/tseemann/barrnap", "/GATB/bcalm", "/beagle-dev/beagle-lib", "/CompEvol/beast2", "/beast-dev/beast-mcmc", "/bedops/bedops", "/arq5x/bedtools2", "/BEETL/BEETL", "/lh3/bfc", "/lh3/bioawk", "/bcgsc/biobloom", "/evoldoers/biomake", "/maasha/biopieces", "/BitSeq/BitSeq", "/PacificBiosciences/blasr", "/flame/blis", "/BenLangmead/bowtie2", "/BenLangmead/bowtie", "", "/ssadedin/bpipe", "/barricklab/breseq", "/lh3/bwa", "/cantera/cantera", "/marbl/canu", "/weizhongli/cdhit", "/infphilo/centrifuge", "/sanger-pathogens/circlator", "/tschaume/ckon", "/xavierdidelot/ClonalFrameML", "/ivazquez/cloneHD", "", "/CSCsw/ColPack", "", "/cusplibrary/cusplibrary", "/marcelm/cutadapt", "/thegenemyers/DALIGNER", "/jeroenjanssens/data-science-toolbox", "/thegenemyers/DAZZ_DB", "/dealii/dealii", "/tobiasrausch/delly", "/thegenemyers/DEXTRACTOR", "/DGtal-team/DGtal", "/tenomoto/dotwrp", "/nh13/DWGSIM", "/DynareTeam/dynare", "", "/elemental/Elemental", "/Ensembl/ensembl-tools", "/aberer/exabayes", "/adarob/eXpress", "/agordon/fastx_toolkit", "/lh3/fermikit", "/lh3/fermi-lite", "/lh3/fermi", "/imageworks/Field3D", "/seqan/flexbar", "/wbhart/flint2", "/ekg/freebayes", "/BoevaLab/FREEC", "/molpopgen/fwdpp", "/GalSim-developers/GalSim", "/broadgsa/gatk-protected", "", "/genometools/genometools", "", "/marbl/gingr", "/arq5x/grabix", "", "", "/sanger-pathogens/gubbins", "/rieck/harry", "/marbl/harvest-tools", "", "/rthurman/hotspot", "/lh3/htsbox", "/samtools/htslib", "/veg/hyphy", "/broadinstitute/IGV", "/igvteam/igv", "git://", "", "", "/ITensor/ITensor", 
"/sanger-pathogens/iva", "/gmarcais/Jellyfish", "", "git://", "/attractivechaos/k8", "/bioinformatics-centre/kaiju", "/TGAC/KAT", "git://", "/marekkokot/KMC", "/pmelsted/KmerStream", "/bcgsc/kollector", "/DerrickWood/kraken", "", "", "/lastz/lastz", "/libbi/LibBi", "/danfis/libccd", "/y-256/libdivsufsort", "", "", "", "/molpopgen/libsequence", "git://", "/mourisl/Lighter", "/AlgoLab/LightStringGraph")
urls <- c(urls, "/eddelbuettel/littler", "/mgymrek/lobstr-code", "", "/apache/madlib", "", "/mimno/Mallet", "", "/marbl/Mash", "/mfillpot/mathomatic", "/matplotlib/matplotlib", "/voutcn/megahit", "/smithlabcode/methpipe", "/marbl/MHAP", "/ctSkennerton/minced", "/lh3/miniasm", "/lh3/minimap", "/hangelwen/miR-PREFeR", "", "/BhallaLab/moose-core", "/mothur/mothur", "", "", "/jlblancoc/nanoflann", "/jts/nanopolish", "", "/nest/nest-simulator", "", "/lindenb/newicktools", "/tjunier/newick_utils", "/nextflow-io/nextflow", "/G-Node/nix", "/lmrodriguezr/nonpareil", "/bcgsc/ntCard", "/sequencing/NxTrim", "/karel-brinda/ococo", "/ome/ome-common-cpp", "/ome/ome-files-cpp", "/openmicroscopy/bioformats", "/openalpr/openalpr", "/biometrics/openbr", "/OpenImageIO/oiio", "/openmeeg/openmeeg", "/occipital/OpenNI2", "/OpenNI/OpenNI", "/orocos/orocos_kinematics_dynamics", "/davidemms/OrthoFinder", "/gwaldron/osgearth", "/cburstedde/p4est", "/neufeld/pandaseq", "git://", "/marbl/parsnp", "git://", "", "/jonchang/phlawd", "/FePhyFoFum/phyx", "/broadinstitute/pilon", "/chrchang/plink-ng", "/postgres-plr/plr", "", "/arq5x/poretools", "/ariloytynoja/prank-msa", "/hyattpd/Prodigal", "/tseemann/prokka", "/lh3/psmc", "git://", "/esa/pykep", "", "", "/ihh/quaff", "/khowe/quicktree", "/isovic/racon", "/TGAC/RAMPART", "/stamatak/standard-RAxML", "/sebhtml/ray", "/mourisl/Rcorrector", "", "/alexdobin/STAR", "/lh3/ropebwt2", "/lh3/ropebwt", "/rstudio/rstudio", "/rieck/sally", "/COMBINE-lab/salmon", "/lomereiter/sambamba", "/GregoryFaust/samblaster", "", "/rakhimov/scram", "/simongog/sdsl-lite", "/seqan/seqan", "/lh3/seqtk", "/jts/sga", "/najoshi/sickle", "git://", "/SimpleITK/SimpleITK", "", "/SINTEF-Geometry/SISL", "/relipmoc/skewer", "/amplab/snap", "/sanger-pathogens/snp-sites", "/aquaskyline/SOAPdenovo2", "", "/biocore/sortmerna", "/ncbi/sra-tools", "/statismo/statismo", "/gpertea/stringtie", "/torognes/swarm", "/sekika/swrcfit", "/symengine/symengine", "/lh3/tabtk", 
"/tamarin-prover/tamarin-prover", "/cbcrg/tcoffee", "/rmjarvis/tmv", "/bcgsc/transabyss", "/TransDecoder/TransDecoder", "/Blahah/transrate-tools", "", "/scapella/trimal", "/trinityrnaseq/trinityrnaseq", "git://", "/rrwick/Unicycler/releases", "/sjackman/uniqtag", "/gobics/uproc", "/Victorian-Bioinformatics-Consortium/vague", "/brentp/vcfanno", "/ekg/vcflib", "/vcftools/vcftools", "/Victorian-Bioinformatics-Consortium/VelvetOptimiser", "/dzerbino/velvet", "/ukoethe/vigra", "/TinoDidriksen/cg3", "/torognes/vsearch", "/atks/vt", "/bemoody/wfdb", "/Ensembl/WiggleTools", "/LanguageMachines/wopr", "/herumi/xbyak", "/jimbraun/XCDF", "/gmarcais/yaggo")
formulae <- c("abyss", "ace-corrector", "adam", "adol-c", "afra", "alembic", "alpscore", "anvio", "apophenia", "arb", "arcs", "ascii_plots", "astrometry-net", "ate-pairing", "atram", "bali-phy", "bamhash", "bamm", "bam-readcount", "bamtools", "bamutil", "barrnap", "bcalm", "beagle", "beast2", "beast", "bedops", "bedtools", "beetl", "bfc", "bioawk", "biobloomtools", "biomake", "biopieces", "bitseq", "blasr", "blis", "bowtie2", "bowtie", "bpel2owfn", "bpipe", "breseq", "bwa", "cantera", "canu", "cd-hit", "centrifuge", "circlator", "ckon", "clonalframeml", "clonehd", "coinmp", "colpack", "cp2k", "cusp", "cutadapt", "daligner", "data-science-toolbox", "dazz_db", "dealii", "delly", "dextractor", "dgtal", "dotwrp", "dwgsim", "dynare", "edirect", "elemental", "ensembl-tools", "exabayes", "express", "fastx_toolkit", "fermikit", "fermi-lite", "fermi", "field3d", "flexbar", "flint", "freebayes", "freec", "fwdpp", "galsim", "gatk", "genet", "genometools", "getdp", "gingr", "grabix", "graph-tool", "gtsam", "gubbins", "harry", "harvest-tools", "hmmer", "hotspot", "htsbox", "htslib", "hyphy", "igv", "igvtools", "insighttoolkit", "inter-views", "ipopt", "itensor", "iva", "jellyfish", "jmol", "jsbsim", "k8", "kaiju", "kat", "kent-tools", "kmc", "kmerstream", "kollector", "kraken", "lammps", "last", "lastz", "libbi", "libccd", "libdivsufsort", "libeemd", "libgpkg", "libpll", "libsequence", "libsigrokdecode", "lighter", "lightstringgraph", "littler", "lobstr", "lola", "madlib", "mad-x", "mallet", "mantaflow", "mash", "mathomatic", "matplotlib", "megahit", "methpipe", "mhap", "minced", "miniasm", "minimap", "mir-prefer", "moab", "moose", "mothur", "mrbayes", "mtl", "nanoflann", "nanopolish", "ncbi-c++-toolkit", "nest", "neuron", "newicktools", "newick-utils", "nextflow", "nixio", "nonpareil", "ntcard", "nxtrim", "ococo", "ome-common", "ome-files", "ome-xml", "openalpr", "openbr", "openimageio", "openmeeg", "openni2", "openni", "orocos-kdl", "orthofinder", "osgearth", "p4est", 
"pandaseq", "paraview", "parsnp", "pastix", "petsc", "phlawd", "phyx", "pilon", "plink2", "plr", "pnapi", "poretools", "prank", "prodigal", "prokka", "psmc", "pulseview", "pykep", "pymol", "qr_mumps", "quaff", "quicktree", "racon", "rampart", "raxml", "ray", "rcorrector", "rml-mmc", "rna-star", "ropebwt2", "ropebwt", "rstudio-server", "sally", "salmon", "sambamba", "samblaster", "sara", "scram", "sdsl-lite", "seqan", "seqtk", "sga", "sickle", "sigrok-cli", "simpleitk", "siril", "sisl", "skewer", "snap-aligner", "snp-sites", "soapdenovo", "sollya", "sortmerna", "sratoolkit", "statismo", "stringtie", "swarm", "swrcfit", "symengine", "tabtk", "tamarin-prover", "t-coffee", "tmv-cpp", "trans-abyss", "transdecoder", "transrate-tools", "trilinos", "trimal", "trinity", "ucsc-genome-browser", "unicycler", "uniqtag", "uproc", "vague", "vcfanno", "vcflib", "vcftools", "velvetoptimiser", "velvet", "vigra", "vislcg3", "vsearch", "vt", "wfdb", "wiggletools", "wopr", "xbyak", "xcdf", "yaggo")

formulae2 <- c ("a5", "abacas", "abinit", "abyss-explorer", "abyss", "acado", "ace-corrector", "adam", "adapterremoval", "adol-c", "afra", "alembic", "alglib", "alien-hunter", "allpaths-lg", "alpscore", "amos", "analysis", "andi", "ann", "anvio", "apophenia", "aragorn", "arb", "arcs", "aribas", "arow++", "arrayfire", "artemis", "art", "ascii_plots", "astral", "astrometry-net", "ate-pairing", "atomic-pseudopotential-engine", "atompaw", "atpdec", "atram", "augustus", "bact", "bali-phy", "bam2wig", "bamhash", "bamm", "bam-readcount", "bamtools", "bamutil", "barrnap", "bayestraits", "bbtools", "bcalm", "bcftools", "beagle", "beast2", "beast", "bedops", "bedtools", "beetl", "bfc", "bioawk", "biobloomtools", "biocgal", "biointerchange", "biomake", "biopieces", "biopp", "bitseq", "blasr", "blast", "blat", "blaze-lib", "bless", "blis", "boost-compute", "bowtie2", "bowtie", "bpel2owfn", "bpipe", "breseq", "busco", "butterflow", "bwa", "bwtdisk", "calculix-ccx", "cantera", "canu", "cap3", "ccfits", "cddlib", "cd-hit", "cdo", "cdsclient", "cegma", "celera-assembler", "centrifuge", "cerulean", "cgns", "circlator", "circos", "ckon", "clark", "clips", "clonalframeml", "clonehd", "clustal-omega", "clustal-w", "cmdstan", "cmor", "coinmp", "colpack", "concorde", "corset", "cp2k", "crfsuite", "crlibm", "cryptoverif", "cuba", "cube", "cufflinks", "cusp", "cutadapt", "cvblob", "cytoscape", "dadadodo", "daligner", "data-science-toolbox", "dazz_db", "dealii", "deeplearning4j-cli", "delly", "des", "dextractor", "dgtal", "diamond", "dida", "discovardenovo", "discovar", "dl_poly_classic", "dotwrp", "ds9", "dsdp", "dsk", "dssp", "dwgsim", "dynare", "ea-utils", "edena", "edirect", "einspline", "elemental", "elph", "emboss", "e-mem", "enblend-enfuse", "ensembl-tools", "ess", "etsf_io", "exabayes", "exonerate", "express", "fann")
formulae2 <- c(formulae2, c("fasta", "fastml", "fastqc", "fastq-tools", "fasttree", "fastuniq", "fastx_toolkit", "fcgene", "fermi2", "fermikit", "fermi-lite", "fermi", "fgsl", "field3d", "flash", "flexbar", "flint", "flux-simulator", "fplll", "fqzcomp", "freebayes", "freec", "fsa", "fwdpp", "g2o", "gaemr", "galfit", "galib", "galsim", "gap", "garli", "gatb", "gatk", "gdcm", "geant4", "geda-gaf", "geneid", "genet", "genewise", "genometools", "getdp", "gfan", "ggobi", "giira", "gingr", "glimmer3", "glimmerhmm", "glpk448", "gmap-gsnap", "gmcloser", "gmtk", "gnuastro", "gnudatalanguage", "grabix", "graphlan", "graph-tool", "gtsam", "gubbins", "h5utils", "harry", "harvest-tools", "hdf4", "healpix", "hisat2", "hisat", "hlaminer", "hmmer2", "hmmer", "hopdm", "hotspot", "htsbox", "htslib", "humann2", "hyphy", "idba", "idcoefs", "igv", "igvtools", "impute2", "infernal", "insighttoolkit", "inter-views", "ipopt", "iqtree", "itensor", "itsol", "iva", "jblas", "jellyfish-1.1", "jellyfish", "jmol", "joinx", "jsbsim", "k8", "kaiju", "kalign", "kallisto", "kat", "kent-tools", "kissplice", "kmacs", "kmc", "kmergenie", "kmerstream", "kollector", "kraken", "lammps", "lapack-manpages", "last", "lastz", "libbigwig", "libbi", "libbuddy", "libccd", "libcerf", "libctl", "libdivsufsort", "libeemd", "libfolia", "libgpkg", "liblbfgs", "libminc", "libpll", "libsbml", "libsbol", "libsequence", "libsigrokdecode", "lie", "lighter", "lightstringgraph", "links-scaffolder", "lis", "littler", "lmfit", "lmod", "lobstr", "lola", "lp_solve", "lrsim", "lsd", "lumpy-sv", "m4ri", "macse", "madlib", "mad-x", "mafft", "maker", "mallet", "mantaflow", "mapsembler2", "maq", "mash", "masurca", "mathgl", "mathomatic", "matplotlib", "maude", "mbsystem", "mcl", "med-file", "megahit"))
formulae2 <- c(formulae2, c("megam", "meme", "meraculous", "metaphlan", "methpipe", "metis4", "mfem", "mfusg", "mhap", "minced", "minia", "miniasm", "minimap", "mira", "mir-prefer", "mitofy", "mlpack", "mlst", "moab", "molden", "moose", "mothur", "mpsolve", "mrbayes", "mrfast", "msieve", "mtl", "multi-worm-tracker", "mummer", "mumps", "muscle", "nanoflann", "nanopolish", "nauty", "ncbi-c++-toolkit", "nccmp", "ncl", "nest", "neuron", "newicktools", "newick-utils", "nextflow", "nexusformat", "nfft", "nglib", "niftilib", "nip2", "nixio", "nonpareil", "novoalign", "ntcard", "numdiff", "nusmv", "nxtrim", "oases", "oce", "ococo", "ogdraw", "oma", "omcompiler", "ome-common", "ome-files", "ome-xml", "openalpr", "openbr", "opencascade", "opencollada", "openfst", "opengrm-ngram", "opengrm-thrax", "openimageio", "openmeeg", "openni2", "openni", "orocos-kdl", "orthofinder", "osgearth", "oswitch", "p4est", "paml", "pandaseq", "parallel-netcdf", "paraview", "parmetis", "parsnp", "pastix", "pathd8", "pathvisio", "paxtools", "pbsuite", "pcap", "pear", "perf", "petsc", "phipack", "phlawd", "phylip", "phyml", "phyutility", "phyx", "picard-tools", "pilercr", "piler", "pilon", "plasma", "platypusvar", "plink2", "plink", "plr", "pnapi", "poa", "pocl", "populations", "poretools", "prank", "primer3", "prodigal", "prokka", "prooftree", "proteinortho", "proverif", "psmc", "pspp", "pulseview", "pykep", "pymol", "qcl", "qr_mumps", "qsopt_ex", "qsopt", "quaff", "quake", "qualimap", "quast", "quest", "quicktree", "quip", "quorum", "r8s", "racon", "radx", "rainbow", "rampart", "rapsearch2", "rate4site", "raxml", "ray", "rcorrector", "readseq", "readsim", "reapr", "recon", "repeatmasker"))
formulae2 <- c(formulae2, c("repeatmodeler", "repeatscout", "rmblast", "rml-mmc", "rnammer", "rna-star", "ropebwt2", "ropebwt", "rstudio-server", "sailfish", "sais", "sally", "salmon", "salt", "sambamba", "samblaster", "samtools@0.1", "samtools", "sara", "sbagen", "scamp", "scarpa", "scotch5", "scotch", "scram", "scrm", "sdsl-lite", "seqan", "seqdb", "seq-gen", "seqtk", "sequel", "sextractor", "sfscode", "sga", "shark", "shogun", "shrimp", "sickle", "sigrok-cli", "silo", "simpleitk", "simulate-pcr", "siril", "sisl", "skewer", "slepc", "slicot", "sllib", "smalt", "smrtanalysis", "snap-aligner", "snap", "snid", "snoscan", "snpeff", "snp-sites", "soapdenovo", "sollya", "soplex", "sortmerna", "spaced", "spades", "spatialite-gis", "spici", "squeezambler", "sratoolkit", "ssake", "stacks", "statismo", "stiff", "stringtie", "sumo", "superlu43", "superlu_dist", "superlu_mt", "swarm", "swetest", "swrcfit", "symengine", "symphony", "tabtk", "tagdust", "tamarin-prover", "tasr", "tbl2asn", "t-coffee", "tetgen", "therion", "ticcutils", "timbl", "tisean", "tmv-cpp", "topcat", "tophat", "trans-abyss", "transdecoder", "transpose", "trans_proteomic_pipeline", "transrate-tools", "transtermhp", "trf", "triangle", "trilinos", "trimadap", "trimal", "trimmomatic", "trinity", "trnascan", "ucsc-genome-browser", "ucto", "unafold", "unicycler", "uniqtag", "uproc", "utgb", "vague", "varscan", "vcake", "vcfanno", "vcflib", "vcftools", "velvetoptimiser", "velvet", "viennarna", "vigra", "vislcg3", "visp", "vsearch", "vt", "wcalc", "wcslib", "wcstools", "weblogo", "wfdb", "wiggletools", "wopr", "xbyak", "xcdf", "xfig", "xmgredit", "xmi-msim", "xraylib", "xrmc", "xylib", "yaggo", "yaha", "yass", "yeppp", "yices", "zoltan"))

fhomepages <- c("", "", "", "", "", "", "/Sheikhizadeh/ACE", "/bigdatagenomics/adam", "/MikkelSchubert/adapterremoval", "", "/EvolBioInf/afra", "", "", "", "", "", "", "/molpopgen/analysis", "/EvolBioInf/andi", "", "", "", "", "", "/bcgsc/arcs", "", "", "", "", "", "/dzerbino/ascii_plots", "/smirarab/ASTRAL", "")
fhomepages <- c(fhomepages, c( "", "", "", "", "/juliema/aTRAM", "", "", "", "", "/DecodeGenetics/BamHash", "", "/genome/bam-readcount", "/pezmaster31/bamtools", "", "/tseemann/barrnap", "", "", "/GATB/bcalm", "", "/beagle-dev/beagle-lib", "", "", "/bedops/bedops", "/arq5x/bedtools2", "/BEETL/BEETL", "/lh3/bfc", "/lh3/bioawk", "", "", "", "/evoldoers/biomake", "/maasha/biopieces", "", "", "/PacificBiosciences/blasr"))
fhomepages <- c(fhomepages, c("", "", "", "", "/flame/blis", "", "", "", "", "/ssadedin/bpipe", "", "", "/dthpham/butterflow", "/lh3/bwa", "", "", "/Cantera/cantera", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "/xavierdidelot/ClonalFrameML", "", "", "", "", "", "", "", "", "/Oshlack/Corset/wiki", "", "", "", "", "", "", "", "", "/marcelm/cutadapt", "", "", "", "/thegenemyers/DALIGNER", "/jeroenjanssens/data-science-toolbox", "/thegenemyers/DAZZ_DB", "", "", "/tobiasrausch/delly", "", "/thegenemyers/DEXTRACTOR", "", "", "", "", "", "", "/tenomoto/dotwrp", "", "", "", "", "/nh13/DWGSIM", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""))
fhomepages <- c(fhomepages, c("", "", "", "", "", "", "", "", "/lh3/fermi2", "/lh3/fermikit", "/lh3/fermi-lite", "/lh3/fermi", "", "", "", "/seqan/flexbar", "", "", "/fplll/fplll", "", "/ekg/freebayes", "", "", "", "", "", "", "", "/GalSim-developers/GalSim", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "/marbl/harvest/blob/master/docs/content/gingr.rst", "", "", "", "", "", "", "", "", "/arq5x/grabix", "", "", "", "/sanger-pathogens/gubbins", "", "", "/marbl/harvest-tools", "", "", "", "", "", "", "", "", "/rthurman/hotspot", "/lh3/htsbox", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "/sanger-pathogens/iva", "", "", "", "", "", "", "/attractivechaos/k8", "", "", "", "/TGAC/KAT", "", "", "", "", "", "/pmelsted/KmerStream", "/bcgsc/kollector", "", "", ""))
fhomepages <- c(fhomepages, c("", "", "/dpryan79/libBigWig", "", "", "", "", "", "/y-256/libdivsufsort", "", "", "", "", "", "", "", "", "", "", "", "/mourisl/Lighter", "", "", "", "", "", "", "", "", "", "/aquaskyline/LRSIM", "/tothuhien/lsd-0.3beta", "/arq5x/lumpy-sv", "", "", "", "", "", "", "", "", "", "", "/marbl/Mash", "", "", "/mfillpot/mathomatic", "", "", "", "", "", "/voutcn/megahit", "", "", "", "", "", "", "", "", "/marbl/MHAP", "/ctSkennerton/minced", "", "/lh3/miniasm", "/lh3/minimap", "", "/hangelwen/miR-PREFeR", "", "", "/tseemann/mlst", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "/jts/nanopolish", "", "", "", "", "", "", "/lindenb/newicktools", "", "", "", "", "", "", "", "", "", "", "/bcgsc/ntCard", "", "", "/sequencing/NxTrim", "", "/tpaviot/oce", "/karel-brinda/ococo", "", "", "", ""))
fhomepages <- c(fhomepages, c("", "", "/openalpr/openalpr", "", "", "", "", "", "", "", "", "", "", "", "/davidemms/OrthoFinder", "", "/yeban/oswitch", "", "", "/neufeld/pandaseq", "", "", "", "/marbl/parsnp", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "/FePhyFoFum/phyx", "", "", "", "/broadinstitute/pilon/wiki", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "/lh3/psmc", "", "", "", "", "", "", "", "", "/ihh/quaff", "", "", "", "", "", "", "", "", "/isovic/racon", "", "", "/TGAC/RAMPART", "", "", "", "", "/mourisl/Rcorrector", "", "", "", "", "", "", "", "", "", "", "/alexdobin/STAR", "/lh3/ropebwt2", "/lh3/ropebwt", "", "", ""))
fhomepages <- c(fhomepages, c("", "/COMBINE-lab/salmon", "", "", "/GregoryFaust/samblaster", "", "", "", "", "", "", "", "", "", "", "/simongog/sdsl-lite", "", "", "", "/lh3/seqtk", "", "", "", "/jts/sga", "", "", "", "/najoshi/sickle", "", "", "", "", "", "", "/relipmoc/skewer", "", "", "", "", "", "", "", "", "", "", "/sanger-pathogens/snp-sites", "", "", "", "", "", "", "", "", "", "/ncbi/sra-tools", "", "", "/statismo/statismo", "", "", "", "", "", "", "/torognes/swarm", "", "", "/symengine/symengine", "", "/lh3/tabtk", "", "", "", "", "", "", "", "", "", "", "/rmjarvis/tmv"))
fhomepages <- c(fhomepages, c("", "", "", "", "", "", "/Blahah/transrate-tools", "", "", "", "", "/lh3/trimadap", "", "", "", "", "", "", "", "/rrwick/Unicycler", "/sjackman/uniqtag", "", "", "", "", "", "/brentp/vcfanno", "/ekg/vcflib", "", "", "", "", "", "", "", "/torognes/vsearch", "", "", "", "", "", "", "/Ensembl/WiggleTools", "", "", "/jimbraun/XCDF", "", "", "/tschoonj/xmimsim", "/tschoonj/xraylib", "/golosio/xrmc", "", "/gmarcais/yaggo", "/GregoryFaust/yaha", "", "", "", ""))

# Number of watchers from 'head' URLs
vectorr <- list()

for (i in seq_along(urls)) {
  curlExample <- paste("curl -X GET -u tmozgach:YOURPASSWORD",urls[i], sep = "")
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$subscribers_count))
}

tibt <- as_data_frame(setNames(list(formulae,vectorr), c("Formula","Watchers")))

# Number of watchers from 'homepage' URLs
vectorr <- list()

for (i in seq_along(fhomepages)) {
  curlExample <- paste("curl -X GET -u tmozgach:YOURPASSWORD",fhomepages[i], sep = "")
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$subscribers_count))
}

tibt2 <- as_data_frame(setNames(list(formulae2,vectorr), c("Formula","Watchers")))

# Combine
full_join <- full_join(tibt2, tibt, by = "Formula")
full_join$Watchers.x[full_join$Watchers.x=="NULL"] <- "0"
full_join$Watchers.y[full_join$Watchers.y=="NULL"] <- "0"

full_join$Watchers <- with(full_join, ifelse (full_join$Watchers.x == "0",full_join$Watchers.y, full_join$Watchers.x))

full_join <- subset(full_join, select = -c(Watchers.x,Watchers.y))

# Number of stars from 'head' URLs
# (the legacy "watchers" field of the GitHub API holds the star count)
vectorr <- list()

for (i in seq_along(urls)) {
  curlExample <- paste("curl -X GET -u tmozgach:YOURPASSWORD",urls[i], sep = "")
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$watchers))
}

tibt <- as_data_frame(setNames(list(formulae,vectorr), c("Formula","Stars")))

# Number of stars from 'homepage' URLs
vectorr <- list()

for (i in seq_along(fhomepages)) {
  curlExample <- paste("curl -X GET -u tmozgach:YOURPASSWORD",fhomepages[i], sep = "")
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$watchers))
}

tibt2 <- as_data_frame(setNames(list(formulae2,vectorr), c("Formula","Stars")))

# Combine
full_join2 <- full_join(tibt2, tibt, by = "Formula")
full_join2$Stars.x[full_join2$Stars.x=="NULL"] <- "0"
full_join2$Stars.y[full_join2$Stars.y=="NULL"] <- "0"

full_join2$Stars <- with(full_join2, ifelse (full_join2$Stars.x == "0",full_join2$Stars.y, full_join2$Stars.x))
full_join2 <- subset(full_join2, select = -c(Stars.x,Stars.y))

# Number of forks from 'head' URLs
vectorr <- list()

for (i in seq_along(urls)) {
  curlExample <- paste("curl -X GET -u tmozgach:YOURPASSWORD",urls[i], sep = "")
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$forks))
}

tibt <- as_data_frame(setNames(list(formulae,vectorr), c("Formula","Forks")))

# Number of forks from 'homepage' URLs
vectorr <- list()

for (i in seq_along(fhomepages)) {
  curlExample <- paste("curl -X GET -u tmozgach:YOURPASSWORD",fhomepages[i], sep = "")
  resp <- make_req(straighten(curlExample))
  tJSON <- toJSON(content(resp[[1]](), as="parsed"), pretty=TRUE)
  dJSON <- jsonlite::fromJSON(tJSON, simplifyDataFrame = TRUE)
  vectorr <- c(vectorr, list(dJSON$forks))
}

tibt2 <- as_data_frame(setNames(list(formulae2,vectorr), c("Formula","Forks")))

# Combine
full_join3 <- full_join(tibt2, tibt, by = "Formula")
full_join3$Forks.x[full_join3$Forks.x=="NULL"] <- "0"
full_join3$Forks.y[full_join3$Forks.y=="NULL"] <- "0"

full_join3$Forks <- with(full_join3, ifelse (full_join3$Forks.x == "0",full_join3$Forks.y, full_join3$Forks.x))
full_join3 <- subset(full_join3, select = -c(Forks.x,Forks.y))

# Combine all tables
full_join4 <- full_join(full_join, full_join2, by = "Formula")
full_join5 <- full_join(full_join3, full_join4, by = "Formula")

# whether the formula is notable?
full_join5$Notable <- with(full_join5, ifelse ((full_join5$Forks >= 20 | full_join5$Watchers >= 20 | full_join5$Stars >= 50), "yes", "no"))

# whether the formula has a  tag "bioinformatics"

full_join5$tag <- with(full_join5, ifelse (full_join5$Formula  %in% bio_tag$X1, "yes", "no"))

# Number of Linuxbrew installations in the last year (/home/tmozgacheva/science-20170914.csv), data frame Linux_inst
# Strip the install options (e.g. --with ...) so that only the formula name remains
Linux_inst$`Event Action` <- sub("\\ .*","\\ ",Linux_inst$`Event Action`)
Linux_inst$`Event Action` <- trimws(Linux_inst$`Event Action`)
full_join5$Formula <- trimws(full_join5$Formula)

# Sum the install counts for the same formula
by_formula_linux <- Linux_inst %>% group_by(`Event Action`) %>% 
  summarize(linux_stat = sum(`Total Events`))

by_formula_linux$`Event Action` <- trimws(by_formula_linux$`Event Action`)

# Add the Linux statistics to the table
full_join5 <- merge(x = full_join5, y = by_formula_linux, by.x = "Formula", by.y = "Event Action", all.x = TRUE)

# Mac statistics
# Strip the install options (e.g. --with ...) so that only the formula name remains
science_mac$formula <- sub("\\ .*","\\ ",science_mac$formula)
science_mac$formula <- trimws(science_mac$formula)

# Sum the install counts for the same formula
by_formula_mac <- science_mac %>% group_by(`formula`) %>% 
  summarize(mac_stat = sum(count))

# Add the Mac statistics to the table
full_join5 <- merge(x = full_join5, y = by_formula_mac, by.x = "Formula", by.y = "formula", all.x = TRUE)

# Sort
full_join5 <- arrange(full_join5,full_join5$Formula)

# Export to TSV file
full_join5 = as.matrix(full_join5)
write.table(full_join5, file='homebrew_science_stat.tsv', quote=FALSE, sep='\t', col.names = NA)
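
The notability rule from the script header (forks ≥ 20 or watchers ≥ 20 or stars ≥ 50) can also be spot-checked outside R. A rough awk sketch, assuming tab-separated input with the columns formula, forks, watchers, stars in that order and no header row; the function name `notable_flags` and the sample numbers are invented for illustration.

```shell
# Append a yes/no "notable" column to rows of: formula, forks, watchers, stars.
notable_flags() {
  awk -F '\t' -v OFS='\t' '{
    notable = ($2 >= 20 || $3 >= 20 || $4 >= 50) ? "yes" : "no"
    print $1, $2, $3, $4, notable
  }'
}

# Invented sample rows for illustration only.
flags=$(printf 'alpha\t3\t5\t12\nbeta\t25\t4\t10\ngamma\t0\t0\t50\n' | notable_flags)
printf '%s\n' "$flags"
```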

sjackman commented 6 years ago

I'd like up-to-date GitHub data on forks, stars, and watchers. samtools doesn't have any GitHub information because neither its homepage nor head are GitHub urls, but its url is a GitHub URL. Please check all three, homepage, url, and head. Could you please rerun this analysis for Homebrew/science?
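
One way to cover all three fields is to try each URL in turn and keep the first one that points at github.com. A minimal shell sketch: the function names `github_repo` and `first_github_repo` and the example URLs are hypothetical, and the pattern only handles `http(s)://` and `git://` URLs on github.com.

```shell
# Print "owner/repo" if the URL points at github.com, otherwise print nothing.
github_repo() {
  printf '%s\n' "$1" |
    sed -n -E 's#^(https?|git)://github\.com/([^/]+)/([^/]+).*#\2/\3#p' |
    sed -E 's#\.git$##'
}

# Try the given URLs (for example homepage, url, head) in order and keep the first hit.
first_github_repo() {
  for candidate in "$@"; do
    repo=$(github_repo "$candidate")
    if [ -n "$repo" ]; then
      printf '%s\n' "$repo"
      return 0
    fi
  done
  return 1
}

# Illustrative values: a non-GitHub homepage, a GitHub release tarball, no head.
first_github_repo \
  "http://example.org/tool/" \
  "https://github.com/example-org/example-tool/releases/download/1.0/example-tool-1.0.tar.bz2" \
  ""
```

Because the match is on the first two path components, release download links and `.git` clone URLs both reduce to the same `owner/repo` string.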

tmozgach commented 6 years ago

You are right. My code doesn't handle such cases; it only handles URLs that contain .git. I didn't expect a GitHub download link:

url ""

I will work on that! =)

sjackman commented 6 years ago

Thanks, Tanya.