IMSMWU / RClickhouse

A 'DBI' Interface to the Yandex Clickhouse Database Providing Basic 'dplyr' Support
GNU General Public License v2.0
91 stars 25 forks source link

error message:count of elements in nested and nulls should be the same #49

Closed giko45 closed 3 years ago

giko45 commented 5 years ago

Hi,

I get the following error:

Error in insert(conn@ptr, qname, value) : count of elements in nested and nulls should be the same

when inserting a data.table with 1 row with some NA values

      id    ds remote_id    dt   nrow      type  name      lastmeasure_at firstmeasure_at
   <dbl> <chr>     <chr> <int>  <int>      <chr> <chr>              <dttm>          <dttm>
 1    NA test    874957    NA 268576       <NA>  <NA> 2019-08-29 22:00:00              NA

All columns that have NA's are Nullable in the table definition.

I do not understand what i need to do to solve this.

Regards

inkrement commented 5 years ago

Thanks for getting in touch. I am not so sure what you problem is. If a column contains NAs then the column in the database handels them as Null values. Therefore, the column is marked as Nullable.

giko45 commented 5 years ago

Hi, Thanks for the response.

Do you have any suggestions on how should i proceed to debug the issue?

               ts <- DBI::dbConnect(RClickhouse::clickhouse(), host='127.0.0.1')
               x= data.frame(id=1,a='asd',b=NA,c=NA,d=NA)
               dbWriteTable(ts, 'test21', x  )

Error in dbDataType(clickhouse(), obj) : Invalid argument to dbDataType: must not be NULL or all NA

seesionInfo() RClickhouse_0.4.1

giko45 commented 5 years ago

some more tests:

Tests with creating table in dbWriteTable: is OK

> dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=c(NA,'ff'))  ,overwrite = T)
OK: │ CREATE TABLE timedb.test (`id` Int32, `a` Nullable(Enum16('ff' = 0))) ENGINE = TinyLog │

Tests pre existing table and updating /adding rows with dbWriteTable fails when one column only has NA

> dbExecute(ts, statement = paste0('DROP TABLE IF EXISTS test'));
> dbExecute(ts, 'CREATE TABLE test (`id` Int32, `a` Nullable(String)) ENGINE = TinyLog ');
> dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=c(NA,'ff'))  ,overwrite = F,append=T)
> dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=c(NA,NA))  ,overwrite = F,append=T)
 Show Traceback

 Rerun with Debug
 Error in insert(conn@ptr, qname, value) : 
  cannot write R type 10 to column of type String 

Update: 
dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=as.character(c(NA,NA)) ) ,overwrite = F,append=T) 
works, so i suppose i do not have an issue any more :)

The last insert of a data.frame with one column with only NA should work when the table already exits since the column type is already known in the table.

Note that

> dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=c(NA,NA))  ,overwrite = T)
 Show Traceback

 Rerun with Debug
 Error in dbDataType(clickhouse(), obj) : 
  Invalid argument to dbDataType: must not be NULL or all NA 

also failes but that's OK since the column type of column cannot be determained

JSchoenbachler commented 3 years ago

@inkrement any update on this? I get the same error whenever attempting to insert a single row with some NA values.

inkrement commented 3 years ago

Hi, I know that the current situation is not really satisfying. But we have to admit that it was not our first priority. Databases such as Clickhouse are built for really big data sets (sometimes so big that you even have to distribute it over multiple machines). In contrast, R is often limited by clear memory restrictions. Therefore, we assumed, that most people use this package to analyse data which already resists in the database and do not use RClickhouse for importing it.

However, sometimes it is quite useful to add small data sets for some in-database operations such as joins. In addition they recently introduced really nice features such as external data sources and we are really eager to look into it. Furthermore, we already planned to revise the interface for creating new tables and inferring data types.

It's always hard to provide a proper time estimate, however, you can expect some enhancements before xmas.

giko45 commented 3 years ago

Note that if you make sure that the column with NA has a datatype it will work

dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=c(NA,NA)) ,overwrite = F,append=T) Failes

dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=as.character(c(NA,NA)) ) ,overwrite = F,append=T) works

tridelt commented 3 years ago

Big thank you to @giko45 for the suggestion of the workaround.

@giko45 @JSchoenbachler After 993d8ae, rch now supports inserting a data.frame of only NA-values into a column of an already created table without explicitly specifying the datatypes.

If the changes don't work for you as intended please feel free to reopen the issue

nachti commented 3 years ago

@giko45 It doesn't work for me as you suggested.

If I create the table first using dbExecute(ts, 'CREATE TABLE test (idInt32,aNullable(String)) ENGINE = TinyLog ') everything is fine, but creating it on the fly is not working: dbWriteTable(ts, 'test', data.frame(id=c(1:2),a=as.character(c(NA,NA)) ) ,overwrite = F,append=T)

I expected dbWriteTable(ts, 'test', data.frame(id = c(1:2), a = c(NA, NA)), overwrite = FALSE, append = TRUE) is not working, because c(NA, NA) is of type logical. You could write c(NA_character_, NA_character_) to tell R, that the NA's should be treated as character here. My task would be to copy data from one DB (Impala) to another one (Clickhouse). Would be fine to keep the scheme. dbDataType(ts, "test") returns a simple string instead of the expected (named) character vector containing the types of the columns (which are the names)

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=de_AT.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=de_AT.UTF-8           LC_COLLATE=de_AT.UTF-8       
 [5] LC_MONETARY=de_AT.UTF-8       LC_MESSAGES=de_AT.UTF-8      
 [7] LC_PAPER=de_AT.UTF-8          LC_NAME=de_AT.UTF-8          
 [9] LC_ADDRESS=de_AT.UTF-8        LC_TELEPHONE=de_AT.UTF-8     
[11] LC_MEASUREMENT=de_AT.UTF-8    LC_IDENTIFICATION=de_AT.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] microbenchmark_1.4-7 RSQLite_2.2.0        nycflights13_1.0.1  
 [4] implyr_0.3.0.9001    RJDBC_0.2-8          rJava_0.9-13        
 [7] dbplyr_2.0.0         dplyr_1.0.2          DBI_1.1.0           
[10] RClickhouse_0.5.2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       rstudioapi_0.11  magrittr_1.5     tidyselect_1.1.0
 [5] bit_4.0.4        R6_2.5.0         rlang_0.4.8      fansi_0.4.1     
 [9] blob_1.2.1       tools_4.0.3      utf8_1.1.4       cli_2.1.0       
[13] withr_2.3.0      ellipsis_0.3.1   digest_0.6.27    yaml_2.2.1      
[17] bit64_4.0.5      assertthat_0.2.1 tibble_3.0.4     lifecycle_0.2.0 
[21] crayon_1.3.4     purrr_0.3.4      vctrs_0.3.4      memoise_1.1.0   
[25] glue_1.4.2       compiler_4.0.3   pillar_1.4.6     generics_0.1.0  
[29] pkgconfig_2.0.3