Have the scraper use the Airtable API (https://airtable.com/api) to extract that data into our MongoDB collection.
If there is a download API that exports the whole table as a CSV, use that.
If not, we need a ShelterApp login that the pipeline can use.
We can schedule the scraper to look for newer data every week.
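As a starting point, here is a minimal sketch of pulling records through the Airtable REST API, which pages results with an `offset` token. The base/table IDs and API key are placeholders; the page getter is injected so the pagination loop can be exercised without network access.

```python
import json
from urllib.request import Request, urlopen

# Standard Airtable REST endpoint shape; base and table IDs come from the shared link.
AIRTABLE_URL = "https://api.airtable.com/v0/{base}/{table}"

def fetch_all_records(get_page):
    """Collect every record by following Airtable's `offset` pagination token.

    `get_page(offset)` must return a parsed JSON page of the form
    {"records": [...], "offset": "..."}; the final page has no "offset" key.
    """
    records, offset = [], None
    while True:
        page = get_page(offset)
        records.extend(page.get("records", []))
        offset = page.get("offset")
        if not offset:
            return records

def airtable_page_getter(base_id, table_name, api_key):
    """Build a page getter that calls the real Airtable API (credentials are assumptions)."""
    def get_page(offset):
        url = AIRTABLE_URL.format(base=base_id, table=table_name)
        if offset:
            url += "?offset=" + offset
        req = Request(url, headers={"Authorization": "Bearer " + api_key})
        with urlopen(req) as resp:
            return json.load(resp)
    return get_page
```

For the weekly refresh, the same `fetch_all_records` call can simply be invoked from a weekly cron job or scheduler.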
We need the following information from the table:
NW Hospitality: for this dataset, we need to pull summary (name), street (address1), city, state, zip (5 digits only), phone, and email (contactEmail); category can be set as service_summary in the MongoDB collection.
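The field mapping above could be implemented roughly as follows. The Airtable column names used here ("Summary", "Street", etc.) are assumptions and should be checked against the actual table headers; the ZIP value is truncated to its first five digits per the requirement.

```python
import re

def map_record(fields):
    """Map one Airtable record's fields dict to our MongoDB service document shape.

    Column names are assumed from the requirements, not confirmed against the table.
    """
    zip_match = re.match(r"\d{5}", str(fields.get("Zip", "")))  # 5 digits only (drops ZIP+4)
    return {
        "name": fields.get("Summary"),
        "address1": fields.get("Street"),
        "city": fields.get("City"),
        "state": fields.get("State"),
        "zip": zip_match.group(0) if zip_match else None,
        "phone": fields.get("Phone"),
        "contactEmail": fields.get("Email"),
        "service_summary": fields.get("Category"),
    }
```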
The scraper should copy the data into the tmpNWHospitality collection.
The scraper should also compare the data against the existing service and tmp* collections for duplicates (using fuzzy matching) and copy identified duplicates into the tmpNWHospitalityDuplicates collection.
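The duplicate check could look something like this sketch, using the standard library's `difflib.SequenceMatcher` on a combined name + address key. The 0.85 threshold is a starting guess, not a tested value; it should be tuned against the real service and tmp* collections.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicates(new_docs, existing_docs, threshold=0.85):
    """Return new docs whose name + address1 fuzzily matches any existing doc.

    In the pipeline, `existing_docs` would be loaded from the service and tmp*
    collections, and the returned docs copied into tmpNWHospitalityDuplicates.
    """
    def key(doc):
        return f"{doc.get('name', '')} {doc.get('address1', '')}"
    return [
        doc for doc in new_docs
        if any(similarity(key(doc), key(old)) >= threshold for old in existing_docs)
    ]
```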
The most recent and updated NW Hospitality dataset is available here: https://airtable.com/shrkjUwUgqBU8vI1j/tblsz2z5rrnl9fdvG