ShelterApp / AddResources

http://shelterapp.org/

Scrape NW Hospitality data. #21

Open prabhushrikant opened 3 years ago

prabhushrikant commented 3 years ago

NW Hospitality data is available at:

The most recent, updated dataset is available here: https://airtable.com/shrkjUwUgqBU8vI1j/tblsz2z5rrnl9fdvG

Have the scraper use the Airtable API (https://airtable.com/api) to extract that data into our MongoDB collection.
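A minimal sketch of the extraction step, assuming the table is read through Airtable's REST endpoint (`GET https://api.airtable.com/v0/{baseId}/{table}` with a bearer token), which returns records in pages and includes an `offset` value when more pages remain. The base ID and token are not given in this issue (the link above is a shared view), so `base_id`, `table`, and `token` are placeholders, and the helper names `http_get_page` and `iter_records` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_ROOT = "https://api.airtable.com/v0"


def http_get_page(base_id: str, table: str, token: str,
                  offset: Optional[str] = None) -> dict:
    """Fetch one page of records. base_id, table and token are placeholders
    that must come from the Airtable account that owns the data."""
    query = {"pageSize": "100"}
    if offset:
        query["offset"] = offset
    url = "{}/{}/{}?{}".format(
        API_ROOT,
        base_id,
        urllib.parse.quote(table, safe=""),
        urllib.parse.urlencode(query),
    )
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_records(get_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, following the `offset` cursor until it is absent.

    get_page takes the current offset (None for the first page) and returns
    the decoded JSON page, so it can be swapped for a fake in tests.
    """
    offset = None
    while True:
        page = get_page(offset)
        yield from page.get("records", [])
        offset = page.get("offset")
        if not offset:
            break
```

In the scraper this could be wired up as `iter_records(lambda off: http_get_page(base_id, table, token, off))`, with each record's `fields` dictionary passed on to the mapping and MongoDB steps.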

We can schedule the scraper to check for newer data every week.

We need the following information from the NW Hospitality table: summary (name), street (address1), city, state, zip (5 digits only), phone, and email (contactEmail). The category can be stored as service_summary in the MongoDB collection.
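The field list above could be applied with a small mapping function like the one below. This is a sketch under one reading of the list: the Airtable columns are assumed to be `summary`, `street`, `email`, and `category`, and the names in parentheses (`name`, `address1`, `contactEmail`) plus `service_summary` are assumed to be the MongoDB field names. `FIELD_MAP`, `normalize_zip`, and `to_service_doc` are hypothetical names.

```python
import re
from typing import Any, Dict, Optional

# Assumed Airtable column -> assumed MongoDB field (see the note above).
FIELD_MAP = {
    "summary": "name",
    "street": "address1",
    "city": "city",
    "state": "state",
    "phone": "phone",
    "email": "contactEmail",
    "category": "service_summary",
}

ZIP_PATTERN = re.compile(r"\s*(\d{5})(?:-\d{4})?\s*$")


def normalize_zip(raw: Any) -> Optional[str]:
    """Return the 5-digit ZIP, dropping a ZIP+4 suffix; None if malformed."""
    if raw is None:
        return None
    match = ZIP_PATTERN.match(str(raw))
    return match.group(1) if match else None


def to_service_doc(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Map one Airtable record's `fields` dict to a service document."""
    doc: Dict[str, Any] = {}
    for source, target in FIELD_MAP.items():
        value = fields.get(source)
        if isinstance(value, str):
            value = value.strip()
        # Missing or empty values are stored as None.
        doc[target] = value or None
    doc["zip"] = normalize_zip(fields.get("zip"))
    return doc
```

Storing missing values as `None` rather than omitting the keys keeps every document in the tmp collection the same shape, which may simplify the later duplicate comparison.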

The scraper should copy the data into the tmpNWHospitality collection. It should also compare the data with the existing service and tmp* collections for duplicates (using fuzzy search) and copy any identified duplicates into the tmpNWHospitalityDuplicates collection.
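One possible shape for the duplicate check, sketched with the standard library's `difflib.SequenceMatcher` as a stand-in, since the issue does not name a fuzzy-search library. The comparison key (name, address1, city), the 0.9 threshold, and the function names are assumptions to tune against real data. `existing_docs` stands for documents already loaded from the service and tmp* collections.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def comparison_key(doc: Doc) -> str:
    """Build a lowercase string from the fields used for matching."""
    parts = (doc.get("name"), doc.get("address1"), doc.get("city"))
    return " ".join(str(p).strip().lower() for p in parts if p)


def similarity(a: Doc, b: Doc) -> float:
    """Similarity ratio in [0, 1] between two documents' comparison keys."""
    return SequenceMatcher(None, comparison_key(a), comparison_key(b)).ratio()


def split_duplicates(new_docs: Iterable[Doc], existing_docs: List[Doc],
                     threshold: float = 0.9) -> Tuple[List[Doc], List[Doc]]:
    """Split new_docs into (unique, duplicates).

    A document counts as a duplicate when it is at least `threshold`
    similar to any document in existing_docs.
    """
    unique: List[Doc] = []
    duplicates: List[Doc] = []
    for doc in new_docs:
        is_duplicate = any(
            similarity(doc, old) >= threshold for old in existing_docs
        )
        (duplicates if is_duplicate else unique).append(doc)
    return unique, duplicates
```

All scraped documents would still go into tmpNWHospitality as described above, and the `duplicates` list would additionally be written to tmpNWHospitalityDuplicates. The pairwise comparison is quadratic, which should be acceptable for a dataset of this size but would need an index or blocking step for much larger collections.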