MEO265 opened this issue 2 years ago
Well, you could always use the option `use.names = FALSE`, but I guess what you had in mind is that binding with `use.names` should work across different encodings.
Exactly, that's what I meant. Otherwise you would always have to keep track of the column order as well.
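For reference, a minimal sketch of the `use.names = FALSE` workaround (the column names here are made up for illustration; it binds strictly by position, which is why the column order then matters):

```r
library(data.table)

dt1 <- data.table(x = 1, y = "a")
dt2 <- data.table(x = 2, y = "b")

# use.names = FALSE ignores the headers entirely and binds by
# column position, so differently-encoded names can't clash
rbindlist(list(dt1, dt2), use.names = FALSE)
```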
Today we encountered a different way to trigger this bug, though I am unsure whether it isn't a bug in R itself. Anyhow, let's assume we have two data.tables, both with a column name containing non-ASCII characters; one is initialised in a call to `data.table()`, the other is set with `setnames()`:
```r
dt1 = data.table(Ähm = 1)
dt2 = data.table(Ähm = 1)
setnames(dt2, "Ähm", "Ähm")
```
The column names now have different declared encodings (though the underlying bytes are UTF-8 in both cases):
```r
> colnames(dt1) |> Encoding()
[1] "unknown"
> colnames(dt2) |> Encoding()
[1] "UTF-8"
```
We get the "expected" result from `rbind`:
```r
> rbind(dt1, dt2)
Error in rbindlist(l, use.names, fill, idcol) :
  Column 1 ['Ähm'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names.
```
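One possible workaround (a sketch, not an official data.table API) is to normalise the declared name encodings on both sides before binding, e.g. with base R's `enc2utf8()`, which re-marks the names as UTF-8 in a UTF-8 locale:

```r
library(data.table)

dt1 <- data.table(Ähm = 1)
dt2 <- data.table(Ähm = 1)
setnames(dt2, "Ähm", "Ähm")   # re-marks dt2's name as declared UTF-8

# force both headers to declared UTF-8 so the names compare equal
setnames(dt1, enc2utf8(names(dt1)))
setnames(dt2, enc2utf8(names(dt2)))

rbind(dt1, dt2)   # binds without the "missing column" error
```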
For a `data.frame` we would also get an "unknown" encoding when we initialise it the same way as `dt1`. Lists show the same behaviour:
```r
> list(Ähm = 1) |> names() |> Encoding()
[1] "unknown"
```
Perhaps this is related to `make.names()`:
```r
> Encoding("Ähm")
[1] "UTF-8"
> make.names("Ähm") |> Encoding()
[1] "unknown"
```
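A small base-R sketch of that observation (assuming a UTF-8 locale; the exact marks can vary with R version and locale). `enc2utf8()` re-declares the encoding that `make.names()` appears to drop:

```r
x <- "Ähm"
Encoding(x)            # "UTF-8" when the source is parsed as UTF-8

y <- make.names(x)
Encoding(y)            # "unknown" here: make.names() seems to round-trip
                       # through the native encoding and lose the mark

Encoding(enc2utf8(y))  # "UTF-8" again after re-declaring it
```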
As I understand `?Encoding`, any character string that contains non-ASCII characters should have an encoding of "UTF-8" (when running in a UTF-8 locale), while character strings that contain only ASCII characters have encoding "unknown" (I don't see what the benefit of "unknown" over "UTF-8" would be).
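That matches the documented behaviour: the encoding mark is only meaningful for non-ASCII strings, so pure-ASCII strings always report "unknown", even after an explicit conversion:

```r
# ASCII-only strings never carry an encoding mark
Encoding("abc")            # "unknown"
Encoding(enc2utf8("abc"))  # still "unknown": ASCII needs no declaration

# non-ASCII strings are marked with the encoding they were declared in
Encoding(enc2utf8("Ähm"))  # "UTF-8"
```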
The question, then, is whether data.table should provide a workaround for this quirk in R. IMO it's more of an R problem.
In the following case, the bind does not work:
Output: