dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

Suggestion to add number of unique rows #37

Closed paulfeitsma closed 6 years ago

paulfeitsma commented 6 years ago

The current version of the Data Frame Summary shows the number of rows. In many cases it is very useful to also know how many unique rows there are. For example, the iris dataset contains 150 rows, but one of them is a duplicate (nrow(unique(iris)) gives 149). It would be very helpful to add this to the top of the report.
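For illustration, the check described above can be reproduced in base R. This is only a minimal sketch of the comparison, not the code used in summarytools; the variable names total_rows and unique_rows are chosen here for the example.

```r
# iris ships with base R (datasets package), so nothing needs to be installed.
total_rows  <- nrow(iris)          # all rows, duplicates included
unique_rows <- nrow(unique(iris))  # rows left after dropping exact duplicates

cat("Rows:       ", total_rows, "\n")
cat("Unique rows:", unique_rows, "\n")
```

The two counts differ by one, which is the single duplicated row mentioned above.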

dcomtois commented 6 years ago

It works well, you coded it right. (Congrats!) I'm just thinking through the different options... Should this appear only when there are duplicate cases, for instance? Is it more suitable to report duplicates, as opposed to distinct rows? Should there be an additional parameter to turn this on or off, and so on... Let me know your thoughts!

paulfeitsma commented 6 years ago

I believe this is useful information both when all rows are unique (no duplicates) and when there are duplicates, so it should be shown in both cases. I also thought about showing the number of unique rows, but I believe the number of duplicates is easier to interpret (0 = good in most cases). I don't believe we should introduce a parameter for this; maybe in the future, if someone asks for it.
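To make the "0 = good" reading concrete, the duplicate count can be computed directly with base R's duplicated(). The helper format_duplicates_line() below is hypothetical, written only for this example; it is not a summarytools function and does not reflect how the package renders its header.

```r
# duplicated() flags every row that repeats an earlier row,
# so summing the flags gives the number of duplicate rows.
n_duplicates <- sum(duplicated(iris))

# Hypothetical helper (assumption, not part of summarytools):
# builds a one-line label that could sit at the top of a report.
format_duplicates_line <- function(df) {
  paste0("Duplicates: ", sum(duplicated(df)))
}

cat(format_duplicates_line(iris), "\n")
```

A data frame with no repeated rows yields a count of 0, so the same line is meaningful whether or not duplicates exist.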

dcomtois commented 6 years ago

You make a good case! Merging into the dev-current version. Thanks again!