MichaelChirico / r-bugs

A ⚠️read-only⚠️mirror of https://bugs.r-project.org/
20 stars 0 forks source link

[BUGZILLA #17890] transform(df, check.names=FALSE, ...) mangles names of df if it replaces a column of df. If it only adds columns it does not mangle df's names. #7063

Open github-actions[bot] opened 4 years ago

github-actions[bot] commented 4 years ago

I don't think the effect of check.names=FALSE should depend on whether transform() replaces a column in or just adds columns to the data.frame.

df <- data.frame(check.names=FALSE, A-1=11:12, B=21:22) names(df)

[1] "A-1" "B"

names(transform(df, check.names=FALSE, B=B+5, C-3=21:22))

[1] "A.1" "B" "C-3"

names(transform(df, check.names=FALSE, C-3=21:22))

[1] "A-1" "B" "C-3"

sessionInfo()

R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached): [1] compiler_4.0.2


METADATA

github-actions[bot] commented 4 years ago

Actually, it is only by accident that check.names does anything in a transform() call - it is not a documented argument, so only "works" because transform() internally calls data.frame().

This in turn means that certain variable names are not usable (check.names, fix.empty.names, etc.) because they have a special meaning.

There is an internal split between "new" and "old" variables, specifically

if (any(matched)) {
    `_data`[inx[matched]] <- e[matched]
    `_data` <- data.frame(`_data`)
}

and the call to data.frame() here is not going to see check.names= so will clean up names.

This is very old code and data frame semantics have changed considerably in the meantime. In particular, I don't know whether the call to data.frame is still needed. The code suggests that the previous assignment might have turned _data into something that wasn't a data frame. Similarly, the split between matched and non-matched computed values might not be required anymore, since it now actually works to do things like

airquality[c("Ozone", "foo", "bar")] <- list(Ozone=log(airquality$Ozone), foo=1, bar=2)

with recycling and all.

It might be worth rewriting transform() at some point, but it is an old tool, mainly used interactively, and you are probably better off using within():

names(within(df, {B=B+5; `C-3`=21:22}))

[1] "A-1" "B" "C-3"


METADATA

github-actions[bot] commented 4 years ago

NA


METADATA

github-actions[bot] commented 4 years ago

Yes, within() is nicer about names than transform(). We noticed this issue because at least one CRAN package relied on passing check.names=FALSE to transform:

./rapportools/R/tables.R: tbl <- transform(tbl, % = N / base::sum(N) * 100, check.names = FALSE) # add percentage ./rapportools/R/tables.R: tbl <- transform(tbl, Cumul. N = cumsum(N), Cumul. % = cumsum(%), check.names = FALSE) # add cumulatives


METADATA

github-actions[bot] commented 4 years ago

NA


METADATA