ShelterApp / AddResources

http://shelterapp.org/

Check if Duplicate is already present in tmpDuplicates collection before inserting duplicates #65

Open prabhushrikant opened 3 years ago

prabhushrikant commented 3 years ago

When the scraper is run more than once and duplicates are found, they are blindly added to the tmp duplicates collection, which causes many repeated entries in that collection.

For example, tmpBCFoodBanks has only 100 entries, but the tmpBCFoodBanksDuplicates collection has 500 entries as a result of running the Azure function for it over and over again.

We need to check whether a duplicate already exists in the duplicates collection and add it only if it does not. Note: the combination of the fields (name, address, city, state and zip) should uniquely identify an entry in the duplicates collection; if a matching entry exists, we don't need to add it again.
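
A minimal sketch of the proposed check, not the project's actual code. It assumes entries are plain dicts and uses an in-memory list in place of the real duplicates collection. The names `KEY_FIELDS`, `duplicate_key` and `add_new_duplicates` are hypothetical, and the whitespace and case normalization is an assumption that may or may not suit the real data.

```python
# Fields that together identify an entry, as listed in this issue.
KEY_FIELDS = ("name", "address", "city", "state", "zip")


def duplicate_key(entry):
    """Build a comparison key from the identifying fields.

    Assumption: values are compared after trimming whitespace and
    lowercasing; missing or None values are treated as empty strings.
    """
    return tuple(str(entry.get(field) or "").strip().lower() for field in KEY_FIELDS)


def add_new_duplicates(existing, found):
    """Append entries from `found` to `existing` only if no entry with the
    same key is already present. Returns the number of entries added."""
    seen = {duplicate_key(entry) for entry in existing}
    added = 0
    for entry in found:
        key = duplicate_key(entry)
        if key not in seen:
            existing.append(entry)
            seen.add(key)
            added += 1
    return added
```

With this check in place, a second run over the same scraped duplicates would add nothing, so the duplicates collection stops growing on repeated runs.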