Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

Allow setnames to include old columns not in x #1099

Closed: MaxGhenis closed this issue 9 years ago

MaxGhenis commented 9 years ago

An allow.absent.cols option for setnames would help in cases where the user wants to apply a column mapping without knowing in advance whether every column in it exists. I'm currently using the following workaround, from my SO self-answer:

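Setnames <- function(x, old, new, allow.absent.cols = FALSE) {
  if (!allow.absent.cols) {
    setnames(x, old, new)  # default behaviour: errors if any of `old` is missing
  } else {
    old.intersect <- intersect(old, names(x))  # entries of `old` present in x, in `old` order
    common.indices <- old %in% old.intersect   # their positions within `old`
    new.intersect <- new[common.indices]       # the matching entries of `new`
    setnames(x, old.intersect, new.intersect)
  }
}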
eantonya commented 9 years ago

I'm not convinced this is a good idea, and if implemented it would need to come with a plethora of warnings.

arunsrinivasan commented 9 years ago

This doesn't seem like a good idea to me either. If you need to, you can always use column numbers instead of names.

Unless there's a strong, reasonable justification for this, we should close it.

MaxGhenis commented 9 years ago

My use case involves looping through lots of raw datasets and applying a single column-name mapping to clean each of them consistently. Something like:

l <- list(...)  # List of raw data.tables, which don't all have the same set of columns
raw.names <- c(...)  # Names showing up in raw data
clean.names <- c(...)  # Clean names
for (dt in l) setnames(dt, raw.names, clean.names)  # errors when a raw name is missing from dt

Column numbers wouldn't work for me, since the datasets don't share the same columns and so positions differ between them. I agree that a warning for each skipped column makes sense.
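A minimal sketch of the "warn for each skipped column" behaviour discussed here, built on the intersect-style workaround above. The function name `setnames_warn` is hypothetical and not part of data.table; the sketch assumes data.table is installed.

```r
library(data.table)

# Hypothetical helper (not a data.table function): rename the columns listed
# in `old` that exist in `x`, and emit one warning per name that is absent.
setnames_warn <- function(x, old, new) {
  stopifnot(length(old) == length(new))
  present <- old %in% names(x)
  for (nm in old[!present]) {
    warning(sprintf("column '%s' not found in x; skipped", nm), call. = FALSE)
  }
  # guard against calling setnames() with an empty mapping
  if (any(present)) setnames(x, old[present], new[present])  # renames by reference
  invisible(x)
}

dt <- data.table(a = 1:2, b = 3:4)
setnames_warn(dt, c("a", "z"), c("alpha", "zeta"))  # warns about 'z'
names(dt)
```

Because setnames works by reference, `dt` is renamed in the caller without reassigning the result.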

arunsrinivasan commented 9 years ago

You just have to add 1 more line...

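for (dt in l) {
    # position of each column of dt within raw.names (0 if it is not in the mapping)
    ix = match(names(dt), raw.names, 0L)
    # zero indices select nothing, so only mappings for columns present in dt remain
    setnames(dt, raw.names[ix], clean.names[ix])
}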
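To see why this works: `match(names(dt), raw.names, 0L)` returns, for each column of `dt`, its position in `raw.names`, or 0 when there is no match, and indexing a vector with 0 selects nothing. A base-R illustration with made-up names (no data.table needed):

```r
raw.names   <- c("CustID", "Amt", "Dt")           # made-up mapping: raw names
clean.names <- c("customer_id", "amount", "date") # ...and their clean names
cols        <- c("Dt", "Region", "CustID")        # columns of one raw table

ix <- match(cols, raw.names, 0L)
ix               # 3 0 1
raw.names[ix]    # "Dt" "CustID"
clean.names[ix]  # "date" "customer_id"
```

Columns not in the mapping ("Region") are left alone, and mapping entries absent from the table ("Amt") never appear in `ix`, so both sides of the rename stay aligned.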
MaxGhenis commented 9 years ago

Thanks, that's cleaner than my intersect / %in% approach above. I've added it as an answer to the SO question. I'm fine with closing this if the need is unusual.

arunsrinivasan commented 9 years ago

Max, glad that helped. :+1:

MaxGhenis commented 4 years ago

This is now implemented directly as setnames(..., skip_absent=TRUE).
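A sketch of the earlier loop rewritten with this argument, assuming a data.table version recent enough to provide `skip_absent`; the tables and column names are made up for illustration.

```r
library(data.table)

# Made-up raw tables that do not share the same set of columns.
l <- list(
  data.table(CustID = 1:2, Amt = c(10, 20)),
  data.table(CustID = 3L, Dt = "2020-01-01")
)
raw.names   <- c("CustID", "Amt", "Dt")
clean.names <- c("customer_id", "amount", "date")

# skip_absent = TRUE ignores entries of `old` that are not columns of dt
# (together with their counterparts in `new`); setnames() renames by
# reference, so the tables inside `l` are updated in place.
for (dt in l) setnames(dt, raw.names, clean.names, skip_absent = TRUE)

lapply(l, names)
```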

MichaelChirico commented 4 years ago

Closed in https://github.com/Rdatatable/data.table/pull/3111; the other issue was #3030.