ActoKids / AD440_W19_CloudPracticum


Finish Google Crawler #71

Open mrvirus9898 opened 5 years ago

mrvirus9898 commented 5 years ago

The Google crawler has grown tremendously and offers us a lot of functionality and flexibility. However, some things still need to be resolved. The primary issue is properly using AWS Secrets Manager to store our Google API key and token.
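For reference, retrieving the key and token from Secrets Manager would look roughly like the sketch below. The secret name `ad440/google-calendar` and the `api_key`/`token` field names are illustrative assumptions, not the project's actual values; only `get_secret_value` and `SecretString` are real parts of the boto3 Secrets Manager API.

```python
import json


def parse_google_secret(secret_string: str) -> dict:
    """Turn the SecretString JSON payload into the credentials the crawler needs.

    Assumes the secret stores a JSON object with "api_key" and "token" fields
    (hypothetical field names for illustration).
    """
    secret = json.loads(secret_string)
    return {"api_key": secret["api_key"], "token": secret["token"]}


def get_google_credentials(client, secret_id="ad440/google-calendar"):
    """Fetch and parse the secret.

    `client` is a boto3 Secrets Manager client, i.e.
    boto3.client("secretsmanager"); it is injected here so the parsing
    logic stays testable without AWS access.
    """
    response = client.get_secret_value(SecretId=secret_id)
    return parse_google_secret(response["SecretString"])
```

Keeping the JSON parsing in its own function means the Lambda can be unit-tested without touching AWS at all.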

Please indicate the time spent on this, any issues you are having, any good references you found on this subject, and credit anyone who helped you out.

rberry206 commented 5 years ago

Estimated Time: 10 hours
Total Time Spent So Far: 3 hours

Everything will live locally, and we are using a throwaway Gmail account, so Secrets Manager feels pretty irrelevant now. I was thinking about using EC2 with Michael, but we might not need it for Google since it's not using Selenium. I'm thinking S3 to DynamoDB will be the best way to structure the Lambda function.

Reasons for using S3:
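The S3-to-DynamoDB shape described above could be sketched as a Lambda handler like the following. The bucket, key, and table names are assumptions for illustration; the AWS clients are injected so the pure helpers can be tested offline. `get_object` and `Table(...).batch_writer()` are real boto3 APIs (S3 client and DynamoDB resource, respectively).

```python
import json

BATCH_LIMIT = 25  # DynamoDB batch_write_item accepts at most 25 puts per call


def chunk(items, size=BATCH_LIMIT):
    """Split a list into DynamoDB-sized batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def load_events(s3_body: str):
    """Parse the crawler's JSON dump pulled from S3."""
    return json.loads(s3_body)


def lambda_handler(event, context, s3=None, dynamodb=None):
    # s3 is a boto3 S3 client, dynamodb a boto3 DynamoDB resource;
    # bucket/key/table names below are hypothetical.
    obj = s3.get_object(Bucket="ad440-w19-crawler-output",
                        Key="google/events.json")
    events = load_events(obj["Body"].read().decode("utf-8"))
    table = dynamodb.Table("events")
    with table.batch_writer() as batch:  # batch_writer handles the 25-item limit
        for item in events:
            batch.put_item(Item=item)
    return {"statusCode": 200, "count": len(events)}
```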

rberry206 commented 5 years ago

Estimated Time: 10 hours
Total Time Spent: 15 hours

Lambda function 'ad440-w19-lambda-crawler-googlecrawler' now writes to the 'events' DynamoDB table. The code for S3 functionality exists in lambda_function.py, but it proved unnecessary for this task. The function pulls all events from the Google Calendar, including recurring events.
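The flattening step from a Calendar event to a table item might look like this sketch. The Calendar v3 fields (`id`, `summary`, `start.dateTime`/`start.date`, `location`) are real API fields, and passing `singleEvents=True` to `events().list` is how the API expands recurring events into individual instances; the DynamoDB attribute names here are assumptions, not necessarily what the 'events' table uses.

```python
def to_dynamo_item(cal_event: dict) -> dict:
    """Flatten a Google Calendar v3 event into a DynamoDB item.

    Attribute names (event_id, title, start_time, location) are
    illustrative, not the project's actual schema.
    """
    start = cal_event.get("start", {})
    return {
        "event_id": cal_event["id"],
        "title": cal_event.get("summary", ""),
        # All-day events carry "date"; timed events carry "dateTime".
        "start_time": start.get("dateTime") or start.get("date", ""),
        "location": cal_event.get("location", ""),
    }
```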

Problems I ran into involved access to certain services like Lambda; these tasks have since been resolved: Lambda access for execution #82, Lambda execution time #85.

Example of code in action:

Testers:

Who I tested:

Pull request for Lambda function: https://github.com/ActoKids/web-crawler/pull/18
Link to Lambda: https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ad440-w19-lambda-crawler-googlecrawler?tab=graph

daonguyen81 commented 5 years ago

Just ran a test on your googleLambda and it worked great! The log output and data look good, and I didn't find any errors. Good job Ryan!