Closed sckott closed 10 years ago
sorta-spec:
For each package (many packages wrapped up in a []):
{
"package": "reshape2",
"description": {
"Package": "reshape2",
"Version": "1.4",
"Imports": "plyr (>= 1.8.1), stringr, Rcpp",
"LinkingTo": "Rcpp",
"Suggests": "testthat (>= 0.8.0), lattice",
"License": "MIT + file LICENSE",
"MD5sum": "4b183da64899d2f44714f4b88c18757a",
"NeedsCompilation": "yes"
},
"snapshotId": 123456,
"snapshotDate": "2014-06-05",
"snapshotDiffId": "19293838-12312323",
"compatibilityCheck": {},
"source": {
"1.0": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.0.tar.gz",
"1.1": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.1.tar.gz",
"1.2.1": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.1.tar.gz",
"1.2.2": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.2.tar.gz",
"1.2": "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.tar.gz"
},
"windows": {
"R3.2": "http://cran.r-project.org/bin/windows/contrib/3.2/reshape2_1.8.1.zip",
"R3.1": "http://cran.r-project.org/bin/windows/contrib/3.1/reshape2_1.8.1.zip",
"R3.0": "http://cran.r-project.org/bin/windows/contrib/3/reshape2_1.8.1.zip"
},
"osx": {
"R3.1": "http://cran.r-project.org/bin/macosx/contrib/3.1/reshape2_1.8.1.zip",
"R3.0": "http://cran.r-project.org/bin/macosx/contrib/3/reshape2_1.8.1.zip",
"R3.1_mavericks": "http://cran.r-project.org/bin/macosx/mavericks/contrib/3.1/reshape2_1.8.1.tgz"
}
}
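As a rough illustration of how a record in this shape might be queried from R, here is a minimal sketch. The `record` list is a hand-trimmed copy of the reshape2 entry above, and `latest_source_version()` and `source_url()` are hypothetical helper names, not part of RRT. Note that the keys under "source" are not in sorted order ("1.2" comes after "1.2.2"), so the sketch compares them as versions rather than as strings.

```r
# Hypothetical record: a trimmed-down copy of the reshape2 entry above.
record <- list(
  package = "reshape2",
  description = list(Package = "reshape2", Version = "1.4"),
  snapshotDate = "2014-06-05",
  source = list(
    "1.0"   = "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.0.tar.gz",
    "1.2.2" = "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.2.tar.gz",
    "1.2"   = "http://cran.r-project.org/src/contrib/Archive/reshape2/reshape2_1.2.tar.gz"
  )
)

# Hypothetical helper: newest archived version, compared as version numbers
# because the keys are not stored in sorted order.
latest_source_version <- function(rec) {
  versions <- numeric_version(names(rec$source))
  as.character(max(versions))
}

# Hypothetical helper: URL of the source tarball for one version.
source_url <- function(rec, version) {
  url <- rec$source[[version]]
  if (is.null(url)) {
    stop("no source tarball recorded for version ", version)
  }
  url
}

v <- latest_source_version(record)
source_url(record, v)
```

Keeping the versions as keys makes a single lookup cheap, at the cost of having to sort them on the client side whenever order matters.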
Addressed here 232a37ea3a3adf55bcd491b98097b2f4330c181c
@revodavid @cmosetick @andrie See the workflow here https://github.com/RevolutionAnalytics/RRT/blob/master/inst/generate_metadata.R for generating metadata. This is all in R. The example only does 3 packages. With thousands of packages I imagine this will take quite a while, but if we're only updating 2x/day, a slow run should not be a problem.
Here's what that workflow generates: https://gist.github.com/sckott/76713c98b612fddf0a15
Hi, Scott
You'll have to talk me through this process: what it does, and why it does this. For example, the step tools::write_PACKAGES(pkgPath) is rather time-consuming. Since it creates an MD5 hash of the entire package, it takes ~2 seconds per package. The only reason one would want to do this is to be able to install packages from this location, i.e. so that R recognises it as a repository. But for this to be the case, the folder structure should be as per the documentation. See the function makeRepo() in miniCRAN at https://github.com/andrie/miniCRAN/blob/master/R/makeRepo.R. Let's discuss.
Andrie
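One way to check how much of that per-package time goes to hashing is to time tools::md5sum() on a single file. The sketch below is only an assumption-laden stand-in: it hashes a temporary file of about 1 MB rather than a real package tarball, so the absolute numbers will differ from what write_PACKAGES sees on a real repository.

```r
# Stand-in for a package tarball: a temporary file of about 1 MB.
# The file name and size are placeholders, not real package data.
tarball <- tempfile(fileext = ".tar.gz")
writeBin(as.raw(rep(0:255, length.out = 1e6)), tarball)

# Time only the hashing step.
timing <- system.time(hash <- tools::md5sum(tarball))

unname(hash)          # 32-character hexadecimal digest
timing[["elapsed"]]   # seconds spent hashing this one file

# Clean up the temporary file.
unlink(tarball)
```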
@andrie The tools::write_PACKAGES(pkgPath) bit is to generate the data from the DESCRIPTION file. Is there a faster way to get that information? We don't need to install anything for this, so if there's a faster way let's do that.
The goal of this is to create a lightweight set of metadata that we can query against within RRT. Part of the process will be to get DESCRIPTION file data from each package, and then add more metadata, as you can see in the example.
@sckott the spec and the gist that you created look great to me on a quick first pass. Nice job!
It depends on exactly what information you are after. I presume it's Depends, Imports, Suggests, etc. All of this information is available in the result of available.packages().
What else do you need?
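For illustration, a minimal sketch of pulling those fields out in R. available.packages() returns a character matrix with one row per package; since the real call needs network access, the sketch uses a hand-built one-row matrix with the reshape2 values from the spec (Depends is set to NA here because the spec lists none). `dep_names()` is a hypothetical helper, not an existing function.

```r
# Stand-in for available.packages(): a character matrix with one row per
# package. The real call needs network access, e.g.
#   ap <- available.packages()
ap <- rbind(
  reshape2 = c(Package = "reshape2", Version = "1.4", Depends = NA,
               Imports = "plyr (>= 1.8.1), stringr, Rcpp",
               Suggests = "testthat (>= 0.8.0), lattice")
)

# Hypothetical helper: split a dependency field into bare package names,
# dropping version constraints such as "(>= 1.8.1)".
dep_names <- function(field) {
  if (is.na(field)) return(character(0))
  parts <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  sub("[[:space:]]*\\(.*\\)$", "", parts)
}

# Collect the dependency fields for one package into a named list.
fields <- c("Depends", "Imports", "Suggests")
deps <- lapply(fields, function(f) dep_names(ap["reshape2", f]))
names(deps) <- fields
deps
```

Because the matrix is already in memory after one download of the index, this avoids touching the package tarballs at all.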
@andrie Thanks, that will be faster for sure. I changed the workflow to avoid tools::write_PACKAGES, updated: https://github.com/RevolutionAnalytics/RRT/blob/master/inst/generate_metadata.R
metadata
The JSON file created for the whole CRAN package list is a bit heavy, something like 8 MB. That makes it difficult to view; perhaps we can show a preview of the file on the marmoset public website http://marmoset.revolutionanalytics.com/metadata/
Updates to metadata creation file in 4a95492244d682db474848b65b9e03db1eecb01b
the metadata logs may change, but they seem good for now
I'll start working on this locally, then we can move to test on server
tools::write_PACKAGES to create metadata for each package
NULL for now