NRGI / ResourceProjects.org

MIT License

Entity Reconciliation #113

Open byndcivilization opened 8 years ago

byndcivilization commented 8 years ago

We need a deduplication process to identify possible matches in incoming data.

Solution #1: a simple fuzzy matching algorithm that attempts to match on one or a few fields (name and some additional info).
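A minimal sketch of Solution #1 follows. It assumes entities arrive as dicts with a `name` and a hypothetical `country` field, which stands in for "some additional info". Python's standard-library `difflib.SequenceMatcher` is used here as the fuzzy measure. That choice, the normalization rules and the 0.85 threshold are illustrative assumptions; the issue does not specify any of them.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def similarity(a, b):
    """Name similarity in [0, 1]; 0.0 when the hypothetical 'country' fields disagree."""
    if a.get("country") and b.get("country") and a["country"] != b["country"]:
        return 0.0
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def candidate_duplicates(entities, threshold=0.85):
    """Return (i, j, score) for every pair of entities scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(entities), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical incoming records.
entities = [
    {"name": "Acme Mining Ltd", "country": "GB"},
    {"name": "ACME Mining Ltd.", "country": "GB"},
    {"name": "Acme Mining Ltd", "country": "AU"},
    {"name": "Zenith Gold Corp", "country": "AU"},
]
print(candidate_duplicates(entities))  # [(0, 1, 1.0)]
```

Comparing every pair is quadratic in the number of entities. For large imports, one would probably group candidates by a cheap key, such as country, before scoring.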

Solution #2: an ML-assisted entity reconciliation process. This would use machine-learning methods to derive a matching score that identifies possible duplicates. We would then need a UI that either merges the matched entities or at least displays possible links on an entity page. For merging, the UI would show both matches and partial matches (i.e. an exact match of the first word) for project names and company names, and let the user confirm all or some of the matches.
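The following is a rough sketch of the scoring step in Solution #2. It assumes that reviewers' confirm/reject decisions from the UI are collected as labelled name pairs. Plain logistic regression, trained by batch gradient descent using only the standard library, stands in for whatever ML method is eventually chosen. All of the following are hypothetical:

* the features (first-word exact match, token Jaccard overlap and the `SequenceMatcher` ratio)
* the example names and labels
* the 0.8/0.4 cut-offs between "match", "partial match" and "no match"

```python
import math
from difflib import SequenceMatcher


def features(a, b):
    """Illustrative pairwise features for two entity names."""
    ta, tb = a.lower().split(), b.lower().split()
    first_word = 1.0 if ta and tb and ta[0] == tb[0] else 0.0  # partial-match signal
    union = set(ta) | set(tb)
    jaccard = len(set(ta) & set(tb)) / len(union) if union else 0.0
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [1.0, first_word, jaccard, ratio]  # leading 1.0 is the bias term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    X = [features(a, b) for a, b in pairs]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def match_score(w, a, b):
    """Score in (0, 1) that names a and b refer to the same entity."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))))


def bucket(score, hi=0.8, lo=0.4):
    """Map a score to the review categories a merge UI could display."""
    if score >= hi:
        return "match"
    if score >= lo:
        return "partial match"
    return "no match"


# Hypothetical reviewer decisions: 1 = confirmed duplicate, 0 = rejected.
labelled = [
    (("Acme Mining Ltd", "Acme Mining Limited"), 1),
    (("Acme Mining Ltd", "ACME MINING LTD"), 1),
    (("Blue Ridge Oil Project", "Blue Ridge Oil"), 1),
    (("Acme Mining Ltd", "Zenith Gold Corp"), 0),
    (("Blue Ridge Oil Project", "Red Rock Copper Project"), 0),
    (("Zenith Gold Corp", "Northstar Petroleum"), 0),
]
w = train([p for p, _ in labelled], [y for _, y in labelled])
score = match_score(w, "Acme Mining Ltd", "Acme Mining Ltd.")
print(round(score, 3), bucket(score))
```

A reviewer would be asked to confirm the pairs that land in the middle bucket. Each decision can then be appended to `labelled` and the weights retrained.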

Moved from #6

mattfullerton commented 8 years ago

Leaving this open for now. Once we are done with the initial deduplication and reconciliation processes and have some experience with them, we can decide whether we need to make improvements (ML, etc.). Assigning @davidmihalyi to act as arbiter of whether the process is working well.

mattfullerton commented 8 years ago

See #147 for a good use case showing how entities need to be merged.

davidmihalyi commented 7 years ago

We have a good first workflow for this. Further improvements will require more thought in a later phase.