alexsanjoseph / compareDF

R Tool to compare two data.frames

Make Documentation clearer on Grouping Multiple columns to prevent confusion #8

Closed alexsanjoseph closed 6 years ago

alexsanjoseph commented 6 years ago

Brice -

The compareDF documentation provides good samples in the form of R code and its output, but the output is not explained; it should give some guidance on how to use the compare_df function effectively. The two sample datasets, results_2010 and results_2011, can produce misleading results unless additional data is added to them. The natural approach to a gap analysis on these datasets is to pass the student name as the group_col argument of compare_df. When you do that, however, the output is badly skewed. What we realized is that ONLY a column that uniquely identifies each record should be used as the group_col argument. In the samples, the name Rohit appears 3 times, but those records do not all refer to the same person. The documentation should therefore warn users of the compareDF package that grouping on a non-unique column can lead them to misread the comparison.
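To make the point concrete, here is a minimal sketch of checking a grouping column before comparing. The data frames are made up for illustration and are not the package's sample data, and is_unique_key is a hypothetical helper, not part of compareDF. The compare_df call assumes the package is installed and that the result exposes a comparison_df element, so it is guarded and skipped otherwise.

```r
# Hypothetical data: two different students named Rohit,
# told apart only by their Division.
old_df <- data.frame(
  Division = c("A", "B", "A"),
  Student  = c("Rohit", "Rohit", "Isaac"),
  Maths    = c(60, 80, 70),
  stringsAsFactors = FALSE
)
new_df <- data.frame(
  Division = c("A", "B", "A"),
  Student  = c("Rohit", "Rohit", "Isaac"),
  Maths    = c(65, 80, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the given columns identify
# every row of df exactly once (no duplicated key values).
is_unique_key <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

# Student alone is not a unique identifier in either data frame.
name_only_ok <- is_unique_key(old_df, "Student") &&
  is_unique_key(new_df, "Student")

# Student together with Division is unique in both.
name_div_ok <- is_unique_key(old_df, c("Student", "Division")) &&
  is_unique_key(new_df, c("Student", "Division"))

# Only compare when the grouping columns form a unique key.
if (name_div_ok && requireNamespace("compareDF", quietly = TRUE)) {
  ctable <- compareDF::compare_df(new_df, old_df,
                                  group_col = c("Student", "Division"))
  print(ctable$comparison_df)
}
```

Running a check like this first makes the problem visible before any comparison is made: if the name alone fails the check, either add more columns to group_col or add a proper identifier column to the data.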

alexsanjoseph commented 6 years ago

Fixed in 1.5