ActoKids / AD440_W19_CloudPracticum


Google Crawler and Logging #54

Open mrvirus9898 opened 5 years ago

mrvirus9898 commented 5 years ago

The Google crawler in #42 works, but could use additional work. The enhancement, Logging, needs to be implemented. Aggregating the calendars may remove some necessary data; this is a bug.

Please indicate the time spent on this, any issues that you are having, any good references you found for this subject, and credit anyone who helped you out.

rberry206 commented 5 years ago

Estimated time: 8 hours
Time spent so far: 2 hours

Aggregating calendars is indeed removing necessary data. I've tried to find a workaround, but there is nothing online, so we may need to call all of the calendars separately. In theory it shouldn't take too much more time, but it is a bummer that we can't aggregate them. Right now I'm working on calls that structure unstructured data, like searching for a phone number or email within the description. I don't think we'll ever glean as much information from a Google Calendar as from a website. We will certainly fill event title, date, time, email, and location. Some events list cost. I'm working to see what else we can pull from the calendar.
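Searching the description for contact details could look roughly like the sketch below. The function name `extract_contact` and both patterns are my own assumptions, not the crawler's actual code, and the phone pattern only covers common US-style numbers.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: deliberately loose, meant for free-text descriptions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches forms like (206) 555-0100, 206-555-0100, 206.555.0100
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def extract_contact(description: Optional[str]) -> Dict[str, Optional[str]]:
    """Return the first email and phone number found in a description.

    Missing values come back as None so the caller can log them as null.
    """
    text = description or ""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }
```

Only the first match of each kind is kept, which is probably enough for a single event but would miss a second contact listed in the same description.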

rberry206 commented 5 years ago

Estimated time: 8 hours
Time spent: 9 hours

I first found out how to aggregate calendars without removing necessary data. You can call individual calendars you've added to your own by using the email associated with the added calendar. I used Google's API to see how much data I could cleanly log. I found email, event summary, event title, event date, start time, end time, and URL to be consistently working. If the location data was null, I had to handle an error, and I defaulted the location to null in that case.

Updated nomenclature to fit with the rest of the team. Since the calendar isn't calling into a specific URL like the others, a lot of the startup messages seemed redundant. I added a success message if the calendar found was not empty. The rest of the nomenclature should be consistent with the other teams.
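As a rough illustration of calling each calendar by its email and logging those fields, here is a minimal sketch. It assumes `service` is a Calendar API v3 client object already built elsewhere, and `event_to_record` and `collect_events` are hypothetical helper names, not the ones in the pull request. The field names (`summary`, `start`, `end`, `location`, `htmlLink`, `organizer`) come from the Calendar API event resource.

```python
from typing import Any, Dict, Iterable, List, Optional


def event_to_record(event: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Flatten one Calendar API event into the fields we log.

    Uses .get() throughout so a missing location (or any other field)
    becomes None instead of raising a KeyError.
    """
    start = event.get("start", {})
    end = event.get("end", {})
    return {
        "title": event.get("summary"),
        # All-day events carry "date"; timed events carry "dateTime".
        "start": start.get("dateTime") or start.get("date"),
        "end": end.get("dateTime") or end.get("date"),
        "email": event.get("organizer", {}).get("email"),
        "location": event.get("location"),
        "url": event.get("htmlLink"),
    }


def collect_events(service: Any, calendar_ids: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Query each calendar separately by its ID (the calendar's email)."""
    records: List[Dict[str, Optional[str]]] = []
    for calendar_id in calendar_ids:
        # Pagination (nextPageToken) is ignored here to keep the sketch short.
        response = service.events().list(
            calendarId=calendar_id, singleEvents=True, orderBy="startTime"
        ).execute()
        for event in response.get("items", []):
            record = event_to_record(event)
            record["calendar"] = calendar_id
            records.append(record)
    return records
```

Tagging each record with its source calendar keeps the per-calendar origin that aggregation would otherwise lose.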

I created a secret for the credentials.json and token.pickle files that must exist to make the code run. These can be found in the Secrets Manager in 'Google-Calendar-Tokens'.
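For anyone pulling the files back out of the secret, one possible approach is sketched below. The secret layout is purely an assumption on my part: a JSON `SecretString` mapping each filename to its base64-encoded contents. `restore_token_files` is a hypothetical helper, and `client` is expected to be a Secrets Manager client (for example, one created with boto3), whose `get_secret_value(SecretId=...)` call is the only AWS call used.

```python
import base64
import json
from pathlib import Path
from typing import Any, List


def restore_token_files(client: Any, secret_id: str, target_dir: str) -> List[Path]:
    """Write the files stored in a secret into target_dir.

    ASSUMPTION: the secret's SecretString is JSON shaped like
    {"credentials.json": "<base64>", "token.pickle": "<base64>"}.
    The real secret may be laid out differently.
    """
    response = client.get_secret_value(SecretId=secret_id)
    payload = json.loads(response["SecretString"])
    written: List[Path] = []
    for filename, encoded in payload.items():
        # Keep only the base name so a stored key cannot escape target_dir.
        path = Path(target_dir) / Path(filename).name
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written
```

With a boto3 client this would be called as `restore_token_files(client, "Google-Calendar-Tokens", ".")` before the crawler starts.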

Example of code in action:

Testers:
Michael - Code should work but is receiving unicode errors from the newer version of Python. The output is correct on your computer.
Tyler - Does what it's supposed to do.

Both approved my pull request. Michael's issue is a problem with the tokens; to solve it, we just need to start the program from scratch on his PC and copy the code over. The Google API is strange about these tokens, and I've personally never used a pickle file.

Who I tested/approved: I tested Michael's code. It worked well but had a few unnecessary parts like a continuous timer for each event. They won't exist in the lambda function. His code is very readable.

Next to do:

Link to Wiki: https://github.com/ActoKids/web-crawler/wiki/Google-Calendar-Crawler
Link to code: https://github.com/ActoKids/web-crawler/tree/dev/scripts/googleCalendarCrawler
Link to pull request: https://github.com/ActoKids/web-crawler/pull/11

mrvirus9898 commented 5 years ago

Dope GIF