Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

data.table column disappears right after being created by a merge #5076

Closed: fabiocs8 closed this issue 3 years ago.

fabiocs8 commented 3 years ago

# Minimal reproducible example

I tried to create a minimal reproducible example but could not reproduce the problem with it. I can send the code and input files if needed.

Problem: I run the following data.table join (merge), which succeeds and gives the expected result:

```r
CMED_mapped <- mapping_key_droga[dt, on = "droga"]
str(CMED_mapped)
```

```
Classes ‘data.table’ and 'data.frame': 49675 obs. of 8 variables:
 $ key_droga             : chr "ALCOOL/ALDEIDEO/AMINOACIDO/FLAVONOL/ AVERMELHADO CETONA/ ...
 $ N                     : int 1 1 1 2 1 1 1 1 1 1 ...
 $ droga_std             : chr "EXTRATO DE PROPOLIS+QUIMICA ALCOOL/ALDEIDEO/AMINOACIDO/ ...
 $ droga                 : chr "EXTRATO DE PROPOLIS+QUIMICA ALCOOL/ALDEIDEO/AMINOACIDO/...
 $ dosagem_proc          : chr "" "" "" "200MG" ...
 $ f_f_proc_tex_extensive: chr "SPRAY INFLAMACAO GARGANTA/TOSSE/SINUSITE/FARINGITE/ ...
 $ forma_apresentacao    : chr "FRASCO 50,00 ML" "FRASCO 50,00 ML" "LITRO" "UN" ...
 $ source                : chr "catmat" "catmat" "catmat" "catmat" ...
 - attr(*, ".internal.selfref")=<externalptr>
```

Please note that there are 8 variables.

Then I immediately run the next statement:

```r
CMED_mapped[, droga_num_char := nchar(droga_std)]
```

And I get the following error:

```
Error in nchar(droga_std) : object 'droga_std' not found
```

Running `str()` again:

```r
str(CMED_mapped)
```

```
Classes ‘data.table’ and 'data.frame': 49675 obs. of 7 variables:
 $ key_droga             : chr "ALCOOL/ALDEIDEO/AMINOACIDO/FLAVONOL/ AVERMELHADO CETONA ...
 $ N                     : int 1 1 1 2 1 1 1 1 1 1 ...
 $ droga                 : chr "EXTRATO DE PROPOLIS+QUIMICA ALCOOL/ALDEIDEO/AMINOACIDO/ ...
 $ dosagem_proc          : chr "" "" "" "200MG" ...
 $ f_f_proc_tex_extensive: chr "SPRAY INFLAMACAO GARGANTA/TOSSE/SINUSITE/FARINGITE/...
 $ forma_apresentacao    : chr "FRASCO 50,00 ML" "FRASCO 50,00 ML" "LITRO" "UN" ...
 $ source                : chr "catmat" "catmat" "catmat" "catmat" ...
 - attr(*, ".internal.selfref")=<externalptr>
```

The droga_std column disappeared!

What is going on? I have seen this behaviour before; this is not the first time.

Please find below the structure of mapping_key_droga and dt.

Please let me know if I can clarify anything further.

# Output of sessionInfo()

```
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] stringdist_0.9.6.3 fuzzyjoin_0.1.6 gsubfn_0.7 proto_1.0.0 stringi_1.7.3 stringr_1.4.0 magrittr_2.0.1 data.table_1.14.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1 R6_2.5.0 rlang_0.4.11 fansi_0.5.0 dplyr_1.0.7 tcltk_4.1.0 tools_4.1.0 parallel_4.1.0 utf8_1.2.1
[10] DBI_1.1.1 ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.2 lifecycle_1.0.0 crayon_1.4.1 tidyr_1.1.3 purrr_0.3.4 vctrs_0.3.8
[19] glue_1.4.2 compiler_4.1.0 pillar_1.6.1 generics_0.1.0 pkgconfig_2.0.3
```

```r
str(dt)
```

```
Classes ‘data.table’ and 'data.frame': 49675 obs. of 5 variables:
 $ droga                 : chr "EXTRATO DE PROPOLIS+QUIMICA ALCOOL/ALDEIDEO ...
 $ dosagem_proc          : chr "" "" "" "200MG" ...
 $ f_f_proc_tex_extensive: chr "SPRAY INFLAMACAO GARGANTA/TOSSE/SINUSITE ...
 $ forma_apresentacao    : chr "FRASCO 50,00 ML" "FRASCO 50,00 ML" "LITRO" "UN" ...
 $ source                : chr "catmat" "catmat" "catmat" "catmat" ...
 - attr(*, ".internal.selfref")=<externalptr>
```

```r
str(mapping_key_droga)
```

```
Classes ‘data.table’ and 'data.frame': 4692 obs. of 4 variables:
 $ key_droga: chr "ALCOOL ETILICO" "NIMESULIDA" "ANESTESICO ODONTOLOGICO" ...
 $ N        : int 1 1 1 1 1 1 1 2 2 1 ...
 $ droga_std: chr "ALCOOL ETILICO" "NIMESULIDA" ...
 $ droga    : chr "ALCOOL ETILICO" "NIMESULIDA" ...
 - attr(*, ".internal.selfref")=<externalptr>
```
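The two `str()` outputs above are enough to rebuild small toy tables with the same column names and types. The sketch below is only a possible starting point for a copy-pasteable example: the row values are invented (not the reporter's data), and the real rows could be swapped in, for instance from `dput(head(dt))` and `dput(head(mapping_key_droga))`. It repeats the same `X[i, on = ]` join and the same `:=` step. It also shows that rows of `dt` with no match in `mapping_key_droga` get `NA` in `droga_std`, which is one way missing values can end up in that column. On a correctly working setup one would expect `droga_std` to still be present after the `:=` step.

```r
library(data.table)

# Hypothetical toy tables: same column names/types as the str() output above,
# but the values are invented for illustration.
mapping_key_droga <- data.table(
  key_droga = c("ALCOOL ETILICO", "NIMESULIDA"),
  N         = c(1L, 1L),
  droga_std = c("ALCOOL ETILICO", "NIMESULIDA"),
  droga     = c("ALCOOL ETILICO", "NIMESULIDA")
)
dt <- data.table(
  droga                  = c("ALCOOL ETILICO", "NIMESULIDA", "PARACETAMOL"),
  dosagem_proc           = c("", "100MG", "500MG"),
  f_f_proc_tex_extensive = c("", "", ""),
  forma_apresentacao     = c("LITRO", "UN", "UN"),
  source                 = c("catmat", "catmat", "catmat")
)

# Same join as in the report: keeps every row of dt, matched on "droga".
# "PARACETAMOL" has no match, so its key_droga, N and droga_std are NA.
CMED_mapped <- mapping_key_droga[dt, on = "droga"]
str(CMED_mapped)

# Same update-by-reference step as in the report.
CMED_mapped[, droga_num_char := nchar(droga_std)]
str(CMED_mapped)
```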

fabiocs8 commented 3 years ago

I noticed that the problem goes away when I add `keepNA = FALSE` to the second statement:

```r
CMED_mapped[, droga_num_char := nchar(droga_std, keepNA = FALSE)]
```
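For reference, `keepNA` only controls what `nchar()` returns for missing strings. With the default (`keepNA = NA`, which behaves like `TRUE` for `type = "chars"`), a missing string gives `NA`; with `keepNA = FALSE` it is counted as 2 characters, the width of the printed `"NA"`. The vector below is an illustrative assumption, not taken from the report; the sketch only shows the returned values and does not by itself say anything about why a column would disappear.

```r
# Illustrative character vector with one missing string (not the reporter's data).
x <- c("a", NA, "abc")

# Default keepNA = NA acts like keepNA = TRUE for type = "chars":
# the missing string yields NA_integer_.
n_default <- nchar(x)
print(n_default)     # 1 NA 3

# keepNA = FALSE counts a missing string as 2 characters (the width of "NA").
n_keep_false <- nchar(x, keepNA = FALSE)
print(n_keep_false)  # 1 2 3
```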

So I tried the following minimal reproducible example, but it does not reproduce the error:

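```r
library(data.table)

data <- c("a", "ab", "abc")
dt <- data.table(data)
dt$data[2] <- NA                   # introduce a missing string in the character column
dt[, number_chars := nchar(data)]  # add a column by reference, as in the original code
str(dt)
```
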
jangorecki commented 3 years ago

Sorry, but your report lacks the details we need to even see what could be wrong. Ideally we need a copy-pasteable minimal reproducible example that avoids loading packages that do not contribute to the problem. Please provide more details so that we have something to proceed with, and then the issue will be reopened.