mrvirus9898 closed this issue 5 years ago
The crawler should log during the following events:

- when a crawler successfully connects to a site
- when a crawler successfully crawls a site
- when a crawler successfully stores the raw data
- when a processor successfully loads raw data
- when a processor successfully processes raw data
- when a processor successfully stores refined data to the database
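As a sketch of what logging each of those events could look like, here is a minimal example using Python's standard `logging` module. The `crawl` function and the placeholder page content are hypothetical, just to show one log line per event:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logger = logging.getLogger("crawler")

def crawl(url):
    # Hypothetical crawl step: each event from the list above gets a log line.
    logger.info("connected to %s", url)
    raw = "<html>...</html>"  # placeholder for the fetched page
    logger.info("crawled %s (%d bytes)", url, len(raw))
    logger.info("stored raw data for %s", url)
    return raw

crawl("https://example.com")
```

The processor side would mirror this with its own logger name (e.g. `logging.getLogger("processor")`) so the two components are distinguishable in a shared log.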
I was thinking of just doing something like:

    -Start timer-
    Crawler started
    Opening page - URL
    Found link, adding to queue
    URLs remaining in queue - -num-
    -repeat-
    Crawler ended
    Scraper started
    Opening page - URL
    Scraping for -event title- finished
    URLs remaining in queue - -num-
    -repeat-
    Scraper ended
    Creating JSON output
    -end time-
    Process finished in -time-
2 hours spent. Wiki page made to guide devs on how to standardize their logging data.
In this sprint, the crawlers will need to log data. Before we can implement this, we need to agree on a consistent style for our logs. This task is for all of us to decide on that style, and it will also serve as a link to the wiki page we will need to create.
Please provide an answer to the following questions, and feel free to ask more questions as you see fit:

- When do we need to log?
- What information should be in the log?
- What information do we want to log in the event of an error?
- If we use a timestamp, what format will we use, and what time zone will we use?
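On the timestamp question, one common answer is ISO 8601 in UTC, which sorts lexicographically and avoids time zone ambiguity between machines. A sketch with the standard `logging` module (the format string is a suggestion, not a decision):

```python
import logging
import time

# Format timestamps as ISO 8601 in UTC, e.g. 2020-01-15T12:34:56Z.
formatter = logging.Formatter(
    fmt="%(asctime)s %(levelname)s %(name)s: %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%SZ",
)
formatter.converter = time.gmtime  # use UTC rather than the machine's local time

handler = logging.StreamHandler()
handler.setFormatter(formatter)
log = logging.getLogger("crawler")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("processor stored refined data")
```

Whatever format we pick, the key point is that every component uses the same `datefmt` and the same time zone, so logs from different crawlers can be interleaved and compared.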