GoogleCodeExporter closed this issue 8 years ago
I think I may have found my own answer:
http://groups.google.com/group/google-refine/browse_thread/thread/18e64a3e4eedfb09/dc94dbc0d3106441?lnk=gst&q=cluster#dc94dbc0d3106441
I was thinking Refine could do this, but I'm probably treating it as a golden
hammer here.
Original comment by matt.mac...@gmail.com
on 15 Mar 2011 at 12:41
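A simple Ruby script solved my problem:

require 'rubygems'
require 'csv'

# Create the output file
CSV.open("courts-deduped.csv", "wb") do |csv|
  # Key each row by its second column (row[1]). A later row with the
  # same key overwrites the earlier one, so the last duplicate wins.
  deduped_courts = Hash.new
  CSV.foreach("tennis-courts.csv") do |row|
    deduped_courts[row[1]] = row
  end

  # Write one row per distinct key
  deduped_courts.each do |key, value|
    csv << value
  end
end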
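
One caveat with the script above: because each duplicate overwrites the hash entry, the last row for a given key is the one that survives. If the first occurrence should be kept instead, a variant along these lines might work. This is only a sketch: the names dedupe_first and dedupe_csv are made up for illustration, and it assumes Ruby 1.9 or later, where hashes preserve insertion order.

```ruby
require 'csv'

# Keep the first row seen for each value in the key column.
# rows: any enumerable of arrays; key_index: 0-based column position.
# (Hypothetical helper, not part of the original script.)
def dedupe_first(rows, key_index)
  seen = {}
  rows.each do |row|
    key = row[key_index]
    # Only store the row if this key has not been seen before
    seen[key] = row unless seen.key?(key)
  end
  # Hash#values returns rows in insertion order (Ruby 1.9+)
  seen.values
end

# Hypothetical wrapper: read input_path and write the deduplicated
# rows to output_path.
def dedupe_csv(input_path, output_path, key_index)
  rows = CSV.read(input_path)
  CSV.open(output_path, "wb") do |csv|
    dedupe_first(rows, key_index).each { |row| csv << row }
  end
end
```

With the file names from the script above, the call would be dedupe_csv("tennis-courts.csv", "courts-deduped.csv", 1).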
Original comment by matt.mac...@gmail.com
on 15 Mar 2011 at 12:59
Refine now has a separate facet that can be used to find identical duplicates.
Original comment by tfmorris
on 8 Oct 2011 at 7:24
Original comment by dfhu...@google.com
on 9 Oct 2011 at 5:30
This was added by the patch in issue 398 and appeared in Refine 2.1.
Original comment by tfmorris
on 12 Dec 2011 at 8:23
Original issue reported on code.google.com by
matt.mac...@gmail.com
on 15 Mar 2011 at 12:00