nikitsaraf opened 11 years ago
Hi, I'm a third-year computer science student pursuing my Bachelor's degree at RCC Institute of Information Technology, India. I am interested in learning about this too, as I have the same doubts as you. Some clarification of what types of data we are looking at would be very helpful, especially if someone could provide example datasets in CSV/XML format to experiment with and get an idea.
Cheers, Nilesh
@nikitsaraf @nilesh-c
Let me help clarify.
Hi Mick!
Thank you so much for your prompt reply and for helping me clarify my doubts.
As I said before, I have dedupe installed and running on my system. I tried a couple of examples on its sample data, and it is fairly easy to use without any complications.
Can you provide some more details on the use case of this tool? Who will use it? (This would help decide whether to build a web-based tool or a Python tool with a simpler user interface.) That way I can start thinking about the user interface and the level of abstraction the tool should offer.
Also, if you can provide me with some of your sample data, I can test it on dedupe and check whether it can serve our use cases.
@nikitsaraf,
The use case we have been discussing is for users to run all of this in their browser, so they don't even need to install a tool. It should be flexible enough to allow the user to select which columns to match on, provide training, and work through matches manually if needed (this might make more sense as a separate tool).
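A minimal sketch of that workflow in Python (the language dedupe is written in), assuming two lists of row dicts. It is not dedupe's API: the function names, the averaged `difflib` similarity, and the accept/review cutoffs are all hypothetical stand-ins. dedupe itself learns from pairs the user labels as matches or non-matches; here "training" is reduced to picking the single cutoff that best separates a few user-labeled scores, and pairs that land between the two cutoffs go to a manual review queue.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def score_pair(left: dict, right: dict, columns: list[str]) -> float:
    """Average similarity over the columns the user chose to match on."""
    return sum(similarity(left[c], right[c]) for c in columns) / len(columns)


def fit_threshold(labeled: list[tuple[float, bool]]) -> float:
    """Hypothetical 'training' step: return the cutoff that classifies the
    most user-labeled (score, is_match) examples correctly."""
    cutoffs = sorted({score for score, _ in labeled})
    return max(cutoffs, key=lambda t: sum((s >= t) == m for s, m in labeled))


def triage(left_rows, right_rows, columns, accept=0.9, review=0.5):
    """Compare every left/right pair: auto-accept scores >= accept, queue
    scores in [review, accept) for manual matching, drop the rest."""
    matched, needs_review = [], []
    for i, left in enumerate(left_rows):
        for j, right in enumerate(right_rows):
            s = score_pair(left, right, columns)
            if s >= accept:
                matched.append((i, j, s))
            elif s >= review:
                needs_review.append((i, j, s))
    return matched, needs_review
```

For example, matching "Joe's Pizza / 123 Main St" against "Joes Pizza / 123 Main St" on `["name", "address"]` scores about 0.98 and is auto-accepted. Comparing every pair is quadratic, so a real tool would need some blocking step before this for city-sized datasets.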
Dedupe has some sample data you can get started with. But if you want something more advanced, I'd suggest grabbing two datasets from https://data.sfgov.org/ (or another city's open data portal) that include addresses that should match, such as all businesses vs. restaurant inspection scores.
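With two such datasets, an easy first pass could be to normalize the address strings so that "123 Main Street." and "123 MAIN ST" compare equal, then join on the result. This is a hypothetical sketch, not something dedupe provides: the `address` column name and the four-entry suffix table are assumptions, and exact matching on a normalized key only catches the easy cases. Rows it misses would still need fuzzy scoring or dedupe's own matching.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny suffix table; real address data needs many more rules.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}


def normalize_address(addr: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and abbreviate the street suffixes listed in SUFFIXES."""
    tokens = re.sub(r"[^\w\s]", " ", addr.upper()).split()
    return " ".join(SUFFIXES.get(t, t) for t in tokens)


def join_on_address(businesses, inspections, left_col="address", right_col="address"):
    """Pair each inspection row with every business whose normalized address is equal."""
    index = defaultdict(list)
    for b in businesses:
        index[normalize_address(b[left_col])].append(b)
    pairs = []
    for row in inspections:
        # .get() avoids inserting empty lists into the defaultdict for misses.
        for b in index.get(normalize_address(row[right_col]), []):
            pairs.append((b, row))
    return pairs
```

With real exports, the rows could come from `csv.DictReader`, which yields one dict per row keyed by the CSV header, so the two readers can be passed straight to `join_on_address` as long as both files have an address column (pass `left_col`/`right_col` if the headers differ).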
@dthompson I have submitted my proposal. Please review it and let me know if anything needs clarification.
Hello
I am Nikit Saraf, a sophomore Computer Science undergraduate at Dhirubhai Ambani Institute of Information and Communication Technology, India.
I was going through the project ideas and found "Automated Data Matching" particularly interesting. I have downloaded and installed dedupe and worked through a couple of examples.
However, I don't have a clear idea of the project and consequently have some doubts about it.
Pardon me if the above questions seem too obvious.