dmbates closed this 10 years ago
It is widespread. When you write the CSV files from R, you should use the optional argument row.names = FALSE.
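To make the difference concrete, here is a small sketch of what write.csv produces with and without that argument. The data frame and the temporary file names are made up for illustration; they are not taken from the repository.

```r
# Minimal sketch: compare the header line written with and without row names.
# `df` and the temporary files are illustrative only.
df <- data.frame(x = c(1.5, 2.5), y = c("a", "b"))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(df, file = with_rn)                        # default: row.names = TRUE
write.csv(df, file = without_rn, row.names = FALSE)  # no index column

header_with    <- readLines(with_rn, n = 1)     # begins with an empty "" field
header_without <- readLines(without_rn, n = 1)  # begins with "x"
print(header_with)
print(header_without)
```

With the default, the first header field is empty and every data row starts with its row number, which is the unnamed column being discussed here.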
This is a problem we inherited from the original repo that provided these files. I was hesitant to get out of sync with this repo, but agree that the row index column is annoying. I'll go through and pull them out.
I wrote an R script to dump the data sets from a package, writing the row names only when they are useful:
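#!/usr/bin/env Rscript
# Dump every data frame in the named packages to <pkg>/<name>.csv,
# writing row names only when they carry information.
dump_pkg_datasets <- function(pkg_nms) {
    for (pnm in as.character(pkg_nms)) {
        # character.only=TRUE: pnm holds the package name as a string
        if (require(pnm, character.only=TRUE)) {
            pos = paste("package", pnm, sep=":")
            suppressWarnings(dir.create(pnm))  # ignore "already exists"
            for (nm in ls(pos=pos)) {
                dd = get(nm, pos=pos)
                if (is.data.frame(dd)) {
                    print(nm)
                    rn = row.names(dd)
                    # default row names are just 1..n, so skip those
                    use_row_names = !(is.null(rn) || all(rn == seq_len(nrow(dd))))
                    write.csv(dd, quote=FALSE,
                              file=file.path(pnm, paste(nm, "csv", sep=".")),
                              row.names=use_row_names)
                }
            }
        }
    }
}
# package names come from the command line
dump_pkg_datasets(commandArgs(trailingOnly=TRUE))
q("no")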
Copy it to a file, make it executable with chmod +x, and run it from the shell with the name(s) of one or more packages as arguments.
Thanks, Doug. I'll get to this in a bit.
The data set should not have the first (and unnamed) column.
I haven't checked other data sets yet. This may be a widespread "infelicity".
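One way to check the other data sets is sketched below. The helper has_index_column is hypothetical (it is not part of R or of this repository), and it assumes the index column was written by write.csv, so its header field is empty and its values are 1, 2, ..., n.

```r
# Hypothetical helper: TRUE when the first column of a CSV file has an empty
# header field and its values simply count 1, 2, ..., n.
has_index_column <- function(path) {
    header <- readLines(path, n = 1)
    # write.csv emits "" (or nothing, with quote=FALSE) as the first header field
    unnamed <- startsWith(header, ",") || startsWith(header, '"",')
    if (!unnamed) return(FALSE)
    dd <- read.csv(path)
    first <- dd[[1]]
    nrow(dd) > 0 && is.numeric(first) &&
        isTRUE(all(first == seq_len(nrow(dd))))
}

# Usage sketch; the "data" directory name is an assumption:
# csvs <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
# Filter(has_index_column, csvs)
```

Files whose first column holds meaningful row names (strings rather than 1..n) are not flagged, which matches the idea of keeping row names only when they are useful.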