dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

Error in table(column_data, useNA = "no") : attempt to make a table with >= 2^31 elements #112

Closed astraetech closed 4 years ago

astraetech commented 4 years ago

Hi, I was wondering whether dfSummary is limited by the size of the data. I have a relatively small dataset:

> str(land_imp_pop_2_ctr_1_fhfa_2_bld_d)
tibble [108,655 x 404] (S3: sf/tbl_df/tbl/data.frame)

and when I run

land_imp_pop_2_ctr_1_fhfa_2_bld_d %>%
  dfSummary() %>%
  view(., method = "viewer", file = "land_imp_pop_2_ctr_1_fhfa_2_bld_d_summ.html")

I get this error:

Error in table(column_data, useNA = "no") : attempt to make a table with >= 2^31 elements

On other datasets with fewer variables, even larger ones, it generally works OK. It throws this error on maybe 2 out of 6 datasets. Is there any way to debug this? On normal data, with 1 million+ observations and 300+ variables, it either gives this error or just gets stuck for 20-30 minutes. Other than that, I like the package a lot. Thank you!

Forgot to add: Windows 10, 128 GB of RAM (2733, CL14), and a 1900X CPU, so it should be fine memory-wise.

dcomtois commented 4 years ago

Hi,

Something is going on there, not sure what... Is there anything unusual about the dataset? You could try running dfSummary on an as.data.frame() version of the dataset. If there are variable labels present, I'd also try removing them. Finally, you could check whether the problem occurs with a smaller subset.

Let me know if this helps!

astraetech commented 4 years ago

I sent a Dropbox link to the data to your email. I would really love to understand what is going on with that error.

astraetech commented 4 years ago

I think this was because of the SF data (https://cran.r-project.org/web/packages/sf/index.html) in the dataset. Could you please look into this?

astraetech commented 4 years ago

For now I can remove the SF data from the dataset, and then dfSummary processes it correctly.
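The workaround above can be sketched in base R. This is only an illustration under assumptions: `toy`, its column names, and its values are hypothetical stand-ins for the real dataset, and the `geometry` column is a plain list that only mimics an sf geometry column.

```r
# Toy stand-in for the real data (hypothetical names and values):
# a plain data frame with one list column, similar in shape to an
# sf geometry column.
toy <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
toy$geometry <- list(c(0, 0), c(1, 1), c(2, 2))

# Flag the columns that are lists rather than atomic vectors.
is_list_col <- vapply(toy, is.list, logical(1))
list_cols <- names(toy)[is_list_col]

# Keep only the atomic columns; this is the data frame that would
# then be passed on to dfSummary().
toy_atomic <- toy[, !is_list_col, drop = FALSE]

print(list_cols)
print(names(toy_atomic))
```

On a real sf object, column subsetting with `[` keeps the geometry column attached, so `sf::st_drop_geometry()` (or converting with `as.data.frame()` first) is probably the more reliable way to get rid of it.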

dcomtois commented 4 years ago

Yes, the problem is with columns that are lists. This is linked to another open issue (https://github.com/dcomtois/summarytools/issues/93), so I will close this one. I'm working on a solution. Thx.
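A likely explanation for the error message, sketched in base R only: when `table()` receives a single list, it cross-tabulates the list's components against each other instead of counting values, so the number of cells grows multiplicatively with the number of components. The toy vectors below are made up for illustration, and whether this is exactly the code path hit inside dfSummary is an assumption.

```r
# A single list argument is cross-tabulated component by component:
# two components with two levels each give a 2 x 2 table.
two_way <- table(list(c("a", "b"), c("x", "y")))
print(dim(two_way))

# 31 components with 2 levels each would need 2^31 cells, which is
# more than table() allows, so it stops with an error.
many <- rep(list(c("a", "b")), 31)
msg <- tryCatch(
  {
    table(many, useNA = "no")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

A list column with many rows behaves like `many` here, since each row's element counts as one more dimension. That would explain why the failure depends on the column types and not on the overall size of the dataset.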

dcomtois commented 4 years ago

@astraetech See issue #93 -- should work now :)