ActoKids / AD440_W19_CloudPracticum


Website crawler: Outdoors for All #55

Closed mrvirus9898 closed 5 years ago

mrvirus9898 commented 5 years ago

The website crawler in #41 works, but additional work is required. It needs to be split into one crawler per website; in this case, https://outdoorsforall.org/events-news/calendar/ is your target URL. Logging also needs to be implemented in this crawler.
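For the logging request, a minimal sketch using Python's standard `logging` module might look like the following. It assumes the crawler is written in Python; the logger name `outdoorsforall_crawler` and the helpers `build_crawler_logger` and `scrape_with_logging` are hypothetical names for illustration, not part of the crawler in #41.

```python
import logging


def build_crawler_logger(name="outdoorsforall_crawler", level=logging.INFO):
    """Return a logger for one crawler (the name is a hypothetical example)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def scrape_with_logging(logger, url, scrape):
    """Run scrape(url), logging start, result count, and any failure.

    `scrape` stands in for the crawler's real scraping function and is
    assumed to return a list of events.
    """
    logger.info("Starting crawl of %s", url)
    try:
        events = scrape(url)
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("Crawl of %s failed", url)
        return []
    logger.info("Crawl of %s returned %d events", url, len(events))
    return events
```

Giving each crawler its own named logger keeps the output of separate crawlers distinguishable once they are split per website.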

Per @MikeJLeon, this crawler also contains a bug: Selenium is having a hard time navigating from the target URL to the next month of activities, so we can only scrape one month. Please find a way to get the crawler to pull in data from other months.
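One way to structure the multi-month crawl is to separate "scrape the month currently shown" from "advance to the next month", as in the sketch below. This is an illustration only, not the fix that was eventually merged. The names `crawl_months`, `scrape_current_month`, and `go_to_next_month` are hypothetical. In a Selenium crawler, `go_to_next_month` would typically click the calendar's next-month control and wait for the new month to render before returning.

```python
def crawl_months(scrape_current_month, go_to_next_month, months=3):
    """Collect events from up to `months` consecutive calendar pages.

    scrape_current_month: callable returning a list of events for the
        month currently displayed (hypothetical helper).
    go_to_next_month: callable that advances the calendar and returns
        True on success, False if navigation failed (hypothetical helper).
    """
    all_events = []
    for index in range(months):
        all_events.extend(scrape_current_month())
        # Only navigate if another month is still wanted.
        if index < months - 1 and not go_to_next_month():
            # Stop early but keep what was already scraped.
            break
    return all_events
```

Keeping navigation behind a single callable means a navigation failure costs only the remaining months, and the loop can be exercised with fake callables without launching a browser.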

Please indicate the time spent on this, any issues you are having, any good references you found on this subject, and credit anyone who helped you out.

daonguyen81 commented 5 years ago

Time estimate: 10 hrs
Time spent: 20+ hrs
outdoorsforall.org is now on a separate crawler. Fixed the bug! The crawler can now crawl up to 3 months of event data.
Pull Request
Wiki Page

daonguyen81 commented 5 years ago

Added logging to the crawler

daonguyen81 commented 5 years ago

Fixed one bug and generated 3 more. Spent another 10 hrs (total 20 hrs) on this crawler fixing bugs.

Errors finally fixed.