RevolutionAnalytics / checkpoint

Install R packages from snapshots on checkpoint-server

Create metadata catalog for server #16

Closed: sckott closed this issue 10 years ago

sckott commented 10 years ago

I'll start working on this locally, then we can move to testing on the server.

sckott commented 10 years ago

sorta-spec:

For each package (the records for all packages are wrapped in a [] array):

{
  "package": "reshape2",
  "description": {
    "Package": "reshape2",
    "Version": "1.4",
    "Imports": "plyr (>= 1.8.1), stringr, Rcpp",
    "LinkingTo": "Rcpp",
    "Suggests": "testthat (>= 0.8.0), lattice",
    "License": "MIT + file LICENSE",
    "MD5sum": "4b183da64899d2f44714f4b88c18757a",
    "NeedsCompilation": "yes"
  },
  "snapshotId": 123456,
  "snapshotDate": "2014-06-05",
  "snapshotDiffId": "19293838-12312323",
  "compatibilityCheck": {},
  "source": {
    "1.0": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.0.tar.gz",
    "1.1": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.1.tar.gz",
    "1.2.1": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.1.tar.gz",
    "1.2.2": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.2.tar.gz",
    "1.2": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.tar.gz"
  },
  "windows": {
    "R3.2": "http://cran.r-project.org/bin/windows/contrib/3.2/reshape2_1.4.zip",
    "R3.1": "http://cran.r-project.org/bin/windows/contrib/3.1/reshape2_1.4.zip",
    "R3.0": "http://cran.r-project.org/bin/windows/contrib/3.0/reshape2_1.4.zip"
  },
  "osx": {
    "R3.1": "http://cran.r-project.org/bin/macosx/contrib/3.1/reshape2_1.4.tgz",
    "R3.0": "http://cran.r-project.org/bin/macosx/contrib/3.0/reshape2_1.4.tgz",
    "R3.1_mavericks": "http://cran.r-project.org/bin/macosx/mavericks/contrib/3.1/reshape2_1.4.tgz"
  }
} 
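
In the spec above, fields such as "Imports" and "Suggests" are kept as the raw comma-separated strings from DESCRIPTION. If queries need the individual package names and version constraints, the strings could be split along these lines. This is only a sketch in Python for illustration (the actual workflow is in R); the function name `parse_dependencies` and the regular expression are assumptions, not part of RRT.

```python
import re

# Matches a package name optionally followed by "(operator version)",
# e.g. "plyr (>= 1.8.1)" or just "stringr".
_DEP = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9.]*)\s*(?:\(\s*([<>=!]+)\s*([^)\s]+)\s*\))?\s*$"
)


def parse_dependencies(field):
    """Split a dependency field into (name, operator, version) tuples.

    Hypothetical helper: operator and version are None when no
    constraint is given.
    """
    deps = []
    for item in field.split(","):
        if not item.strip():
            continue  # tolerate trailing commas
        match = _DEP.match(item)
        if match is None:
            raise ValueError(f"unrecognised dependency entry: {item!r}")
        deps.append(match.groups())
    return deps


imports = parse_dependencies("plyr (>= 1.8.1), stringr, Rcpp")
```

Keeping the raw string in the record and parsing on demand keeps the JSON close to DESCRIPTION; storing the parsed tuples instead would be an equally reasonable choice.
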
sckott commented 10 years ago

Addressed in 232a37ea3a3adf55bcd491b98097b2f4330c181c

@revodavid @cmosetick @andrie See the workflow for generating metadata here: https://github.com/RevolutionAnalytics/RRT/blob/master/inst/generate_metadata.R. This is all in R. The example only does 3 packages. With thousands of packages I expect this to take quite a while, but if we only update twice a day that shouldn't be a problem.

sckott commented 10 years ago

Here's what that workflow generates https://gist.github.com/sckott/76713c98b612fddf0a15

andrie commented 10 years ago

Hi, Scott

You'll have to talk me through this process: what it does, and why. For example, the step tools::write_PACKAGES(pkgPath) is rather time-consuming. Since it creates an MD5 hash of the entire package, it takes ~2 seconds per package. The only reason to do this is to be able to install packages from this location, i.e. so that R recognises it as a repository. But for that to work, the folder structure must follow the documented repository layout. See the function makeRepo() in miniCRAN at https://github.com/andrie/miniCRAN/blob/master/R/makeRepo.R. Let's discuss.

Andrie
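
To illustrate why hashing every package adds up: an MD5 digest has to read the whole tarball, so the cost grows with file size and is paid once per package. A minimal sketch in Python, for illustration only (the thread's workflow is in R, and this does not reproduce what tools::write_PACKAGES does internally); `md5_of_file` and its `chunk_size` parameter are hypothetical names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Hypothetical helper: every byte of the file is read, which is why
    hashing thousands of package tarballs takes noticeable time.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
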

sckott commented 10 years ago

@andrie The tools::write_PACKAGES(pkgPath) step is there to extract the data from each DESCRIPTION file. Is there a faster way to get that information? We don't need to install anything for this, so if there is, let's use it.

The goal is to create a lightweight set of metadata that we can query against within RRT. Part of the process is to get the DESCRIPTION file data for each package, and then add more metadata, as you can see in the example.

cmosetick commented 10 years ago

@sckott the spec and the gist that you created look great to me on a quick first pass. Nice job!

andrie commented 10 years ago

It depends on exactly what information you are after. I presume it's Depends, Imports, Suggests, etc. All of this information is available in the result of available.packages().

What else do you need?
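
For context, available.packages() gets these fields from the repository's PACKAGES index, which uses the same "Field: value" (DCF) layout as DESCRIPTION, so no tarball has to be opened or hashed. A rough sketch of reading that layout, in Python for illustration only (the workflow here is in R); `parse_dcf` is a hypothetical helper, and the sample text is made up from the reshape2 fields in the spec plus a placeholder second record.

```python
def parse_dcf(text):
    """Parse DCF text into a list of dicts, one per blank-line-separated record."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Indented lines continue the previous field's value.
            current[last_key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


# Illustrative sample, not real index content.
SAMPLE = """\
Package: reshape2
Version: 1.4
Imports: plyr (>= 1.8.1), stringr,
        Rcpp
Suggests: testthat (>= 0.8.0), lattice

Package: examplepkg
Version: 0.1
"""

records = parse_dcf(SAMPLE)
```
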

sckott commented 10 years ago

@andrie Thanks, that will be faster for sure. I changed the workflow to avoid tools::write_PACKAGES. Updated: https://github.com/RevolutionAnalytics/RRT/blob/master/inst/generate_metadata.R

sckott commented 10 years ago

The JSON file created for the whole CRAN package list is a bit heavy, something like 8 MB, which makes it difficult to view. Perhaps we can show a preview of the file on the marmoset public website: http://marmoset.revolutionanalytics.com/metadata/
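
One possible way to make a file of that size easier to browse is to write one small JSON file per package next to the full catalog. The sketch below is in Python for illustration only (the workflow here is in R); `write_per_package` and the output layout are assumptions, and it relies only on the "package" key from the spec.

```python
import json
from pathlib import Path


def write_per_package(records, out_dir):
    """Write each metadata record to <out_dir>/<package>.json.

    Hypothetical helper: `records` is the list of per-package dicts
    from the catalog; returns the paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for record in records:
        path = out / f"{record['package']}.json"
        path.write_text(
            json.dumps(record, indent=2, sort_keys=True), encoding="utf-8"
        )
        paths.append(path)
    return paths
```
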

sckott commented 10 years ago

See

http://marmoset.revolutionanalytics.com/metadata/
http://marmoset.revolutionanalytics.com/metadata/logs/
http://marmoset.revolutionanalytics.com/metadata/logs/2014-06-17_2300/
http://marmoset.revolutionanalytics.com/metadata/logs/2014-06-17_2300/ggplot2.json

sckott commented 10 years ago

Updates to metadata creation file in 4a95492244d682db474848b65b9e03db1eecb01b

sckott commented 10 years ago

The metadata logs may change, but they seem good for now.