Website Crawler: Round two

mrvirus9898 commented 5 years ago

The website crawler in #41 works, but additional work is required. The crawler needs to be broken up into one crawler per website; you mentioned the other URL in class but I forgot it. Logging also needs to be implemented in this crawler.

Please indicate the time spent on this, any issues you are having, any good references you found on this subject, and credit anyone who helped you out.

MikeJLeon commented 5 years ago

Estimated time: 10 hours

I have already split my modules. I also split the crawler from the scraper for my portion, and I will need to tell Dao to do the same.

MikeJLeon commented 5 years ago

I may have set my time estimate far too high. It took me about an hour to separate the code and hand it off to Dao. We decided on the logging culture in class, and it is very simple to implement. Implementing it is next on my agenda.

MikeJLeon commented 5 years ago

Estimated time : 10 hours

As mentioned earlier, I finished this much more quickly than I had anticipated; it was pretty easy. As requested, I have separated my OFAcrawler from my browser crawler. I sat with the team to discuss what should be logged while the scripts run, and we created the nomenclature (see Nick's ticket on this matter).

In addition, I decided to split crawlers and scrapers, since crawling and scraping are two separate tasks and splitting them lets each script specialize. I then renamed "browserCrawler" to "EBCrawler" and "EBScraper", with all the logging changes included. Please find below the actual time spent, pull request, wiki pages, testing feedback, and my own testing feedback.
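To illustrate the crawler/scraper split described above, here is a minimal sketch and not the actual EBCrawler or EBScraper code. It assumes the crawler's only job is to collect event links from a listing page and the scraper's only job is to pull fields out of one event page. The function names `crawl` and `scrape`, the logger name, the `/e/` link filter, and the use of an `<h1>` tag for the title are all hypothetical choices made for the example.

```python
import logging
from html.parser import HTMLParser

logger = logging.getLogger("eb_example")  # hypothetical logger name


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class _TitleCollector(HTMLParser):
    """Collects the first non-empty text found inside an <h1> tag."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and self.title is None and data.strip():
            self.title = data.strip()


def crawl(listing_html, must_contain="/e/"):
    """Crawler step: find event links on a listing page and nothing else.

    must_contain is an assumed filter for event URLs, not a confirmed rule.
    """
    parser = _LinkCollector()
    parser.feed(listing_html)
    links = [link for link in parser.links if must_contain in link]
    logger.info("crawler found %d event link(s)", len(links))
    return links


def scrape(event_html):
    """Scraper step: extract fields from a single event page."""
    parser = _TitleCollector()
    parser.feed(event_html)
    logger.info("scraper extracted title %r", parser.title)
    return {"title": parser.title}
```

Keeping the two steps in separate functions (or modules) means a change to how links are discovered does not touch the field extraction, and each step can be tested against saved HTML without hitting the network.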

Actual time: 2 hours
Pull request: https://github.com/ActoKids/web-crawler/pull/13
Wiki: https://github.com/ActoKids/web-crawler/wiki/Event-Brite-Crawler
Wiki: https://github.com/ActoKids/web-crawler/wiki/Event-Brite-Scraper
Testers: @rberry206 and @daonguyen81
Feedback:
Ryan - Content is readable and easy to use. Works well for me.
Dao - The logging output to the terminal makes it easy to understand how and where we are with the scraping. I suggest Mike add an empty line between each link to make it more readable.
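As a rough sketch of the terminal logging discussed in the feedback above, including Dao's suggestion of a blank line between links, something like the following could work. It uses only Python's standard `logging` module. The format string, the logger name, and the helper names `build_logger` and `log_links` are assumptions for illustration; the team's actual nomenclature is in Nick's ticket and is not reproduced here.

```python
import logging


def build_logger(name="EBScraper", stream=None):
    """Return a logger that writes timestamped lines to the terminal.

    With stream=None the handler writes to stderr, which is the
    StreamHandler default. The format below is a hypothetical example.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep output from also going to the root logger
    return logger


def log_links(logger, links):
    """Log the start and end of work on each link."""
    for link in links:
        logger.info("scraping %s", link)
        # ... the real scraping work would happen here ...
        # The trailing newline leaves one empty line before the next link.
        logger.info("finished %s\n", link)
```

Calling `log_links(build_logger(), links)` prints two lines per link followed by an empty line, which keeps each link's output visually grouped in the terminal.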

What I tested:

@rberry206 - https://github.com/ActoKids/web-crawler/pull/11
"Seems to work, only caveat is the requirement of the credentials.json. we'll need to look for a solution to that, which i'm sure secrets manager on aws will handle."
I noted that the credentials.json file may complicate things and suggested moving it into Secrets Manager when we reach that stage. Outside of that, his code ran as expected and displayed events.

@daonguyen81 - https://github.com/ActoKids/web-crawler/pull/14
Dao's crawler does not seem to be working as of the time of this comment. I plan on working with him to sort out the errors he's getting, as the crawler was originally mine.
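On the credentials.json caveat from pull/11, one possible shape for the Secrets Manager approach is sketched below. This is not what the pull request does. It assumes the contents of credentials.json would be stored as a JSON string in a single secret; the secret name `web-crawler/credentials` and the function name `load_credentials` are hypothetical. The client is passed in so the sketch runs without AWS access; in practice it would be `boto3.client("secretsmanager")`, whose `get_secret_value(SecretId=...)` call returns a dict containing `SecretString` for text secrets.

```python
import json
from pathlib import Path


def load_credentials(secrets_client=None,
                     secret_id="web-crawler/credentials",  # hypothetical name
                     local_path="credentials.json"):
    """Return the credentials as a dict.

    If a Secrets Manager client is given, read the secret from AWS.
    Otherwise fall back to the local credentials.json file, which is
    convenient for local development.
    """
    if secrets_client is not None:
        response = secrets_client.get_secret_value(SecretId=secret_id)
        # Assumes the secret was stored as a JSON string, not as binary.
        return json.loads(response["SecretString"])
    return json.loads(Path(local_path).read_text())
```

With this shape, local runs keep working from the file while a deployed crawler passes in a real client, so the credentials file does not have to ship with the code.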

ActoKids / AD440_W19_CloudPracticum

Website Crawler: Round two #56