ActoKids / AD440_W19_CloudPracticum

Logging style and Nomenclature for crawler team #53

Closed by mrvirus9898 5 years ago

mrvirus9898 commented 5 years ago

In this sprint, the crawlers need to start logging data. Before we implement this, we need to agree on a consistent style for our logs. This issue is where we all decide on that style, and it will link to the wiki page we need to create.

Please answer the following questions, and feel free to ask more questions as you see fit:

- When do we need to log?
- What information should be in the log?
- What information do we want to log in the event of an error?
- If we use a timestamp, what format and what time zone will we use?

mrvirus9898 commented 5 years ago

The crawler should log during the following events:

- when a crawler successfully connects to a site
- when a crawler successfully crawls a site
- when a crawler successfully stores the raw data
- when a processor successfully loads raw data
- when a processor successfully processes raw data
- when a processor successfully stores refined data to the database
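The events listed above could be wired into Python's standard `logging` module roughly as follows. This is only a sketch: the event names, the `make_logger` and `log_event` helpers, and the choice of UTC ISO 8601 timestamps are all assumptions for illustration, not something the team has agreed on.

```python
import logging
import time

# Hypothetical event names, one per milestone in the list above.
EVENTS = (
    "crawler_connected",
    "crawler_crawled",
    "crawler_stored_raw",
    "processor_loaded_raw",
    "processor_processed_raw",
    "processor_stored_refined",
)


def make_logger(name="crawler", handler=None):
    """Build a logger that stamps each record with a UTC ISO 8601 time."""
    formatter = logging.Formatter(
        fmt="%(asctime)sZ %(levelname)s %(name)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    formatter.converter = time.gmtime  # assumption: UTC rather than local time
    handler = handler or logging.StreamHandler()
    handler.setFormatter(formatter)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger


def log_event(logger, event, url):
    """Log one milestone; reject names outside the agreed list."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    logger.info("%s url=%s", event, url)
```

A crawler would call something like `log_event(make_logger(), "crawler_connected", url)` after each successful step. Restricting events to a fixed tuple keeps the log vocabulary consistent across crawlers, which is the point of this issue.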

MikeJLeon commented 5 years ago

I was thinking of just doing something like:

-Start timer-
Crawler - Crawler started
Opening page - URL
Found link, adding to queue
URLs remaining in queue - -num-
-repeat-
Crawler ended
Scraper started
Opening page - URL
Scraping for -event title- finished
URLs remaining in queue - -num-
-repeat-
Scraper ended
Creating JSON output
-end time-
Process finished in -time-
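The crawler half of that outline might look something like the sketch below. The `crawl` function, its `get_links` callable (URL in, linked URLs out), and the `max_pages` limit are hypothetical names for illustration; the log messages follow the outline, with the URL appended to the "Found link" line as an extra.

```python
import logging
import time
from collections import deque

log = logging.getLogger("crawler")


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl that logs each step of the outline above.

    get_links is a hypothetical callable: URL -> iterable of linked URLs.
    """
    started = time.monotonic()  # -Start timer-
    log.info("Crawler - Crawler started")
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        log.info("Opening page - %s", url)
        for link in get_links(url):
            if link not in seen:  # queue each URL only once
                seen.add(link)
                queue.append(link)
                log.info("Found link, adding to queue - %s", link)
        visited.append(url)
        log.info("URLs remaining in queue - %d", len(queue))
    log.info("Crawler ended")
    log.info("Process finished in %.2fs", time.monotonic() - started)
    return visited
```

To see the messages while trying it out, call `logging.basicConfig(level=logging.INFO)` first and pass any function that returns links for a URL, for example a dictionary lookup over a few fake pages.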

mrvirus9898 commented 5 years ago

https://github.com/ActoKids/AD440_W19_CloudPracticum/wiki/Logging-style-guide-for-crawling-and-scraping

2 hours spent. Wiki page made to guide devs on how to standardize their logging data.