dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

dfSummary: NAs in the data seem to cause error #113

Closed astraetech closed 3 years ago

astraetech commented 4 years ago

Hi, in larger datasets, NAs in the data seem to cause problems. The only difference between the data that works and the data that fails is the size:

> land_imp_pop_1_test %>% dfSummary() %>% view()
Error in lapply(strsplit(x_pad, ""), as.numeric) : 
  (converted from warning) NAs introduced by coercion
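The "(converted from warning)" prefix is how R reports a warning that has been escalated to an error, which typically means `options(warn = 2)` is set in the session. The failing pattern splits each string in `x_pad` into single characters and coerces each one with `as.numeric()`, so any character that is not a digit turns into NA and triggers the coercion warning. Below is a minimal sketch of that mechanism. The input strings are hypothetical, because the thread does not show what `x_pad` actually contains inside summarytools.

```r
# Hypothetical padded strings: only the first one is made of digits only.
x_pad <- c("0042", "NA", "1e+06")

# Split each string into single characters, then coerce each character to
# numeric. Characters such as "N", "A", "e" and "+" become NA, and R warns
# that NAs were introduced by coercion (silenced here for the first pass).
digits <- suppressWarnings(lapply(strsplit(x_pad, ""), as.numeric))
print(digits)

# With warn = 2, the same warning is escalated to an error. This is what
# produces the "(converted from warning)" prefix seen in the report.
old <- options(warn = 2)
res <- tryCatch(lapply(strsplit(x_pad, ""), as.numeric),
                error = function(e) conditionMessage(e))
options(old)  # restore the previous warning level
print(res)
```

With the default `warn = 0`, the same call would only print a warning, so resetting `options(warn = 0)` may let `dfSummary()` finish and show which values were affected.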

Here is the structure of the data:

> str(land_imp_pop_1_test )
tibble [1,280,986 x 341] (S3: tbl_df/tbl/data.frame)

dfSummary() on smaller subsets seems to work fine. There is no SF data in these, which caused problems in my other ticket. Is there any way to debug this? Unfortunately, this is proprietary grant data, so I can't share the full dataset where the error occurs.
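One way to narrow this down without sharing the data is to summarize one column at a time over all rows and record which columns fail. The helper below, `find_failing_columns()`, is hypothetical and not part of summarytools. It takes the summary function as an argument, so it can be tried out with a stand-in function before being run with `dfSummary()`.

```r
# Hypothetical helper (not part of summarytools): apply `fun` to each column
# separately and collect the error message of every column that fails.
find_failing_columns <- function(df, fun) {
  msgs <- character(0)
  for (col in names(df)) {
    # df[col] keeps a one-column data frame (or tibble), not a bare vector
    res <- tryCatch(fun(df[col]), error = function(e) e)
    if (inherits(res, "error")) {
      msgs[col] <- conditionMessage(res)  # named by the failing column
    }
  }
  msgs
}
```

For example, `find_failing_columns(land_imp_pop_1_test, summarytools::dfSummary)` would return the names of the columns that reproduce the error, along with their messages. On 341 columns of 1.28 million rows this will be slow, but only the failing column names need to be shared, not the data itself.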

dcomtois commented 4 years ago

Thx for reporting this. That bit of code is problematic: it's already the main bottleneck in terms of processing time, and it seems it also causes issues with large data frames. How large is your data frame (in MB or GB)?

maguiiiar commented 4 years ago

I'm facing the same issue when applying the dfSummary function to a data frame of ~10 GB.

dcomtois commented 4 years ago

@maguiiiar @astraetech could you please confirm whether this still happens with the latest 'dev-current' branch, installed using devtools::install_github() or remotes::install_github()?

dcomtois commented 3 years ago

Hi @astraetech and @maguiiiar ,

Have you had the same issue since the latest updates? If not, I'll close this ticket. Thx.

dcomtois commented 3 years ago

Closing for now, if new info is provided I'll reopen.