ghmo / google-refine

Automatically exported from code.google.com/p/google-refine

Reconcile (extension or enhancement) between 2 projects #176

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
It would be useful to reconcile 2 projects against each other.
Currently, the cross() function helps, but it would be even better to use both
ngram() and cross() to look for potential matches. For example, 2 projects may
use different site_ids or site_names for a record but have essentially the same
address, city, state, and zip. A human can easily see that the rows are an
exact match; the only difference is each project's unique identifier (site_id,
site_name, person name, whatever).

The missing piece, I think, is a better interface for accomplishing this, via
an extension or enhancement.

The interface would display ONLY the fields chosen as necessary for applying
human judgement to reconcile the 2 projects' rows/records, and would use
cross(), ngram(), or a similar function to find very close potential matches
for the user to review. A new column would be created in both projects to show
the 'synced' or 'matched' row indexes from the corresponding project.
Essentially, this is a layman's "matchmaker" interface in which the user only
wires up the columns to compare across the 2 projects and names the new column
that will hold the 'syncs' or 'matches'.
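
To make the matching step above concrete, here is a minimal sketch in Python rather than GREL. It is not how Refine implements cross() or ngram(); the function names, the Jaccard scoring over character trigrams, the 0.6 threshold, and the sample rows are all assumptions chosen for illustration.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-collapsed string."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two strings' n-gram sets, from 0.0 to 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def candidate_matches(rows_a, rows_b, fields, threshold=0.6):
    """Pair each row of project A with its best-scoring row of project B.

    Only the chosen comparison fields are used, so differing identifiers
    (site_id, site_name, ...) do not affect the score. Returns a list of
    (index_a, index_b, score) tuples for pairs at or above the threshold.
    """
    def key(row):
        return " ".join(str(row.get(field, "")) for field in fields)

    matches = []
    for i, row_a in enumerate(rows_a):
        scored = [(similarity(key(row_a), key(row_b)), j)
                  for j, row_b in enumerate(rows_b)]
        if not scored:
            continue
        score, j = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            matches.append((i, j, score))
    return matches


# Hypothetical sample data: same places, different identifiers and spellings.
project_a = [
    {"site_id": "A-1", "address": "100 Main St", "city": "Dallas", "state": "TX", "zip": "75201"},
    {"site_id": "A-2", "address": "42 Elm Avenue", "city": "Austin", "state": "TX", "zip": "78701"},
]
project_b = [
    {"site_name": "Elm Office", "address": "42 Elm Ave.", "city": "Austin", "state": "TX", "zip": "78701"},
    {"site_name": "Main Depot", "address": "100 Main Street", "city": "Dallas", "state": "TX", "zip": "75201"},
]

matches = candidate_matches(project_a, project_b, ["address", "city", "state", "zip"])
for index_a, index_b, score in matches:
    print(index_a, index_b, round(score, 2))
```

Each (index_a, index_b) pair is what the proposed 'matched' column would hold in each project once the user has reviewed and accepted the candidate.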

Original issue reported on code.google.com by thadguidry on 3 Nov 2010 at 4:51

GoogleCodeExporter commented 8 years ago
This has some similarity with Issue 90.

Original comment by thadguidry on 3 Nov 2010 at 4:54

GoogleCodeExporter commented 8 years ago
I am working on almost the same issue, though from a different perspective,
and thought it might be helpful to describe it here. I am building an "RDF
reconciliation service registry": a service that users can submit RDF files
to, and which turns that RDF into a standard GRefine reconciliation service.
My next step is to build an extension that exports RDF from a column in a
project and submits that data to the registry, so that other projects can
reconcile against it as well.

Original comment by fadima...@gmail.com on 7 Nov 2010 at 2:39

GoogleCodeExporter commented 8 years ago
Issue 211 has been merged into this issue.

Original comment by iainsproat on 15 Nov 2010 at 3:21

GoogleCodeExporter commented 8 years ago
Iain, do you think your Issue-96 could also be merged into this one, or do you
need other specific functionality? Looking at Issue-96, it seems like you're
asking for the same thing that I am here.

This Issue-176 also adds the need for a better UI to handle reconciling between
2 Refine projects. A high-level overview has been given to James Home, who will
conceptualize this for the community to review later.

Original comment by thadguidry on 19 Nov 2010 at 8:53