HertieDataScience / SyllabusAndLectures

Hertie School of Governance Introduction to Collaborative Social Science Data Analysis
MIT License

Question regarding duplications #68

Open codykoebnick opened 9 years ago

codykoebnick commented 9 years ago

Hi,

Whilst attempting to remove around 3,000 repeated rows from our table (c.3,800 total observations, of which c.800 are unique), we had some real trouble with R's duplicated() function. We have since settled on a workaround, but we would still like to find a solution that uses it. Here is the code:

duplicated(total4)
newtotal4 <- total4[duplicated(total4)=='FALSE', ]

Although we were able to generate a new ‘newtotal4’ table, the duplicates remained. The main guide we were using for this can be found on p.286 of the R for Dummies book. Grateful for any advice.

mcallaghan commented 9 years ago

You want the rows that aren't duplicated, so use !duplicated(total4) as the row index (we just did the same thing).
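
For reference, a minimal sketch of the row-wise case. The data frame below is a made-up stand-in for total4 (its column names and values are assumptions, not from the thread). Base R's duplicated() has a data frame method that compares whole rows and flags every repeat after the first occurrence, so negating it keeps one copy of each row:

```r
# Toy stand-in for total4; the real columns are not given in the thread.
total4 <- data.frame(
  id    = c(1, 1, 2, 3, 3, 3),
  value = c("a", "a", "b", "c", "c", "d"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares entire rows and returns TRUE
# for each row that repeats an earlier one.
dup_flags <- duplicated(total4)

# Keep only the rows that are not flagged as repeats.
newtotal4 <- total4[!dup_flags, ]

# unique() gives the same result for data frames.
same_result <- unique(total4)
```

Note that two rows count as duplicates only if they match in every column, so a stray ID, timestamp, or whitespace difference would leave apparent duplicates in place. That may be worth checking if repeats seem to survive.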

mcallaghan commented 9 years ago

It also might only work vector-wise rather than row-wise (I don't know), but we used !duplicated(dataframe$column).

mcallaghan commented 9 years ago

(and we used dplyr's filter)
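
A small sketch of what that combination might look like, assuming the dplyr package is installed. The data frame df and its columns id and score are hypothetical names chosen for illustration. It contrasts de-duplicating on a single column with de-duplicating on whole rows:

```r
library(dplyr)

# Hypothetical data: id 1 appears twice, but with different scores.
df <- data.frame(
  id    = c(1, 1, 2, 3),
  score = c(10, 11, 20, 30)
)

# Column-wise: keep the first row for each id. The second id == 1 row is
# dropped even though its score differs.
by_column <- filter(df, !duplicated(id))

# Row-wise: distinct() with no columns named drops only rows that are
# identical in every column, so all four rows are kept here.
by_row <- distinct(df)
```

Which of the two is appropriate depends on whether a single column identifies an observation or the full row does.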

christophergandrud commented 9 years ago

Also, check out FindDups in the DataCombine package.