dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

detect_barcode warns and converts to error on numeric column #110

Closed: seandavi closed this issue 4 years ago

seandavi commented 4 years ago

I have a data.frame with the values below in one of its columns. Running dfSummary() on the data.frame results in an error ("converted from warning"; see below). Here is a reprex that demonstrates the issue, at least for me. It is a really odd problem: it occurs only when all 100 values are used. Subsets do not trigger the warning/error.

y = c(-170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, -170.132, 
  -170.132, -170.132, -170.132, 144.7937, 144.7937, 144.7937, 144.7937, 
  144.7937, 144.7937, 144.7937, 144.7937, 144.7937, 144.7937, 144.7937, 
  144.7937, 144.7937, 144.7937)
summarytools:::detect_barcode(y)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
#> Error in lapply(strsplit(x_pad, ""), as.numeric): (converted from warning) NAs introduced by coercion
summarytools:::detect_barcode(y[1:99])
#> [1] FALSE

Created on 2020-04-17 by the reprex package (v0.3.0)

dcomtois commented 4 years ago

Interesting... will look into it as soon as I can

seandavi commented 4 years ago

Thanks. Sorry not to do the work myself and send a PR, but I'm also treading water...


-- Sean Davis, MD, PhD Center for Cancer Research National Cancer Institute National Institutes of Health Bethesda, MD 20892 https://seandavi.github.io/ https://twitter.com/seandavis12

dcomtois commented 4 years ago

No worries, I found the problem quickly enough. It had to do with a faulty barcode detection algorithm that would derail when all numbers were 8 characters long (as both -170.132 and 144.7937 are). I corrected and optimized the algorithm, which also improves the overall performance of dfSummary().
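The failure mode can be illustrated with a small R sketch. This is not the actual detect_barcode() code from summarytools, which is not shown in this thread. The function name looks_like_ean8() and the EAN-8 checksum guard are hypothetical, chosen because 8 digits is the length of an EAN-8 code. Both distinct values in the reprex print as 8 characters. Splitting such a string into single characters and passing them to as.numeric() turns "-" and "." into NA with a warning, and "(converted from warning)" is what R prints when options(warn = 2) is in effect. The guard sketched here only lets strings of exactly 8 digits reach the digit conversion.

```r
# Hypothetical illustration of the failure mode; this is NOT the code
# from summarytools' detect_barcode().
y <- c(rep(-170.132, 86), rep(144.7937, 14))

# Both distinct values print as 8 characters, the length of an EAN-8 code,
# so a test based only on string length treats them as barcode candidates.
print(unique(nchar(as.character(y))))

# Splitting a candidate into characters and coercing each one to numeric
# turns "-" and "." into NA with a warning; options(warn = 2) makes it an error.
old <- options(warn = 2)
msg <- tryCatch(
  as.numeric(strsplit("-170.132", "")[[1]]),
  error = function(e) conditionMessage(e)
)
options(old)
print(msg)

# Hypothetical guarded check: only strings of exactly 8 digits go on to the
# EAN-8 checksum, so "-" and "." never reach the digit conversion.
looks_like_ean8 <- function(x) {
  x <- as.character(x[!is.na(x)])
  if (length(x) == 0 || !all(grepl("^[0-9]{8}$", x))) return(FALSE)
  all(vapply(x, function(s) {
    d <- as.integer(strsplit(s, "")[[1]])
    # EAN-8 check digit: weights 3,1,3,1,3,1,3 on the first seven digits
    (10 - sum(d[1:7] * c(3, 1, 3, 1, 3, 1, 3)) %% 10) %% 10 == d[8]
  }, logical(1)))
}

looks_like_ean8(y)         # FALSE, and no warning
looks_like_ean8(96385074)  # TRUE: valid EAN-8 check digit
```

Validating the character pattern before converting digits means non-barcode values are rejected cheaply, without relying on coercion warnings that a user's warn setting may escalate.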

You can test it by installing the dev-current version with install_github() from either the devtools or the remotes package:

remotes::install_github("dcomtois/summarytools", ref = "dev-current")

Let me know how it goes... And good luck in these challenging times.

seandavi commented 4 years ago

Thanks, @dcomtois. That was quick. Your change fixes the problem I was seeing.