Open drag05 opened 2 years ago
Would you mind sending me the data? I'll probably implement the seq_along fix if everything else works as intended. I foresee several areas that will need to be fixed to handle Unicode characters.
@samFarrellDay I do not own the data, but I could make an artificial set and post it here. So far I have found that Unicode also affects the diagnostic plots.
Unicode support would be really useful for documents and Shiny. Otherwise, column names could be changed for the purpose of imputation and then changed back to Unicode for presentation, although working in Unicode throughout would save the overhead.
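The rename-and-restore workaround mentioned above could look roughly like the sketch below. It uses base R only; the data frame `df` and the placeholder names `V1`, `V2` are made up for illustration, and the imputation step itself is left out. With a data.table, `setnames()` could be used in place of `names<-`.

```r
# Hypothetical data with Unicode column names (Greek alpha and beta).
df <- data.frame(a = c(0.5, NA, -0.2), b = c(NA, 1.1, 2.3))
names(df) <- c("Treat \u03B1", "Treat \u03B2")

# Keep the original names and swap in ASCII placeholders for processing.
unicode_names <- names(df)
ascii_names <- paste0("V", seq_along(unicode_names))
names(df) <- ascii_names

# ... run the imputation on `df` here (not shown) ...

# Restore the Unicode names for presentation.
names(df) <- unicode_names
```

The cost is the extra bookkeeping of two name vectors, which is the overhead the comment above refers to.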
@samFarrellDay The script below generates a data.table with missing values and Unicode characters.
One observation: Unicode characters are converted/displayed correctly only if they are defined inside the data.table environment.
# generate a data table containing NA values
library(data.table)
L = 1000L
x = list(a = sample(c(runif(L, -1L, 1L), rep(NA, L)), L)
, b = sample(c(rnorm(L, 1L, 3L), rep(NA, L %/% 2L)), L)
, c = sample(rep(1:2, each = 2L), L, replace = TRUE))
dt = as.data.table(x)
# convert column "c" to Unicode characters
dt[, c := ifelse(c == 1L, '25 \u03BCL', '50 \u03BCL')]
# rename the columns of dt
setnames(dt, c('Treat \u03B1', 'Treat \u03B2', 'Sample'))
> dt
Treat a Treat ß Sample
1: NA NA 50 µL
2: NA NA 50 µL
3: -0.86576094 1.12 50 µL
4: NA NA 50 µL
# obs: names(dt) reads Greek "alpha" ('\u03B1') as Latin character "a"
The script converted the "c" vector from list x to Unicode inside data.table. If I had done this in list x and then converted the list to a data.table, as.data.table would not have read the characters as Unicode.
Example:
# alternative
# Unicode Greek letters
greek = c('\u03B1', '\u03B2', '\u03B3', '\u03B4', '\u03B5', '\u03B6', '\u03B7', '\u03B8', '\u03B9',
'\u03BA', '\u03BB', '\u03BC', '\u03BD', '\u03BE', '\u03BF', '\u03C0', '\u03C1', '\u03C3',
'\u03C4', '\u03C5', '\u03C6', '\u03C7', '\u03C8', '\u03C9')
# generate list with missing values and Unicode characters
L = 20L
x = list(
a = sample(c(runif(L, -1L, 1L), rep(NA, L)), L)
, b = sample(c(rnorm(L, 1L, 3L), rep(NA, L %/% 2L)), L)
, c = sample(
c(replicate(L, paste0(sample(c(greek, letters, 1:9)
, size = 4L, replace = TRUE), collapse = ''))
, rep(NA, times = L)) , size = L)
)
# convert list to data.table
dt = as.data.table(x)
> dt
a b c
1: 0.706300090 -0.2082637 <NA>
2: NA -1.4747307 <NA>
3: NA NA <U+03B7>o4<U+03C1>9 <--- not read as Greek letters!
4: -0.855431452 -0.8188787 <NA>
5: -0.443747398 2.7301625 <NA>
6: NA NA 2twzz
Thank you!
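One way to check whether a value such as the one printed as `<U+03B7>o4<U+03C1>9` still carries the Greek characters, independent of how the console renders it, is to look at its declared encoding and its code points. This is only a sketch in base R; the string `s` is copied from the output above and the list `x_small` is a made-up stand-in for list x. If the code points survive the conversion, the `<U+....>` form is probably a display matter in the current locale rather than a loss of data.

```r
# The value shown as <U+03B7>o4<U+03C1>9 in the printed table.
s <- "\u03B7o4\u03C19"

# Declared encoding mark and Unicode code points of the string.
enc <- Encoding(s)
code_points <- utf8ToInt(s)

# Round trip through a list and a data frame (hypothetical stand-in for x).
x_small <- list(c = s)
df_small <- as.data.frame(x_small, stringsAsFactors = FALSE)
same_after_conversion <- identical(utf8ToInt(df_small$c), code_points)
```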
I have the following data, associated with these naWhere, varp and varn:
Calculating the left-out columns throws the following error:
Checking varp against colnames(naWhere):
It still seems to work when comparing varp against varn:
The error seems to be caused by the presence of Unicode characters in the names, although they seem to be no challenge for varp and varn, as shown by the last code line above. However, using either the seq_along or the base::enc2native function seems to remove the error:
Please advise, thank you!
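The two workarounds named above could be sketched as follows. The original code is not shown here, so everything in this block is an assumption: `naWhere` is taken to be a logical matrix with named columns, `varp` a character vector of column names, and `left_out`, `keep_idx` and `left_out_idx` are hypothetical helper names. It is not the package's actual implementation.

```r
# Hypothetical stand-ins for the objects named in the issue.
naWhere <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2,
                  dimnames = list(NULL, c("Treat \u03B1", "Treat \u03B2")))
varp <- "Treat \u03B1"

# Option 1: convert both sides to the native encoding before comparing names.
left_out <- setdiff(enc2native(colnames(naWhere)), enc2native(varp))

# Option 2: work with column positions instead of names.
keep_idx <- match(varp, colnames(naWhere))
left_out_idx <- setdiff(seq_along(colnames(naWhere)), keep_idx)
```

Positional indexing avoids name comparison after the initial lookup, which may be why the seq_along variant is less sensitive to how the names are encoded.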