ActoKids / AD440_W19_CloudPracticum


Research deploying a crawler in AWS and Travis CI #22

Open jams2018 opened 5 years ago

jams2018 commented 5 years ago

The objective is to research how to deploy the Crawler team's crawler script in AWS and run a unit test in Travis CI.

Actual time spent so far: 10 hours
Estimated completion time: 7 hours

jams2018 commented 5 years ago

Talked to the crawler team and learned that they are looking into AWS Glue. I am in the process of obtaining access to an AWS account administered by Toddy.

jams2018 commented 5 years ago

I've gained access to the AWS account and explored Glue, S3, and Lambda.

mrvirus9898 commented 5 years ago

Outstanding, will you have time to talk about this with the Crawler Crew on Wednesday?

jams2018 commented 5 years ago

Nick, in the cafeteria. When the team has time, we'll meet and discuss ways to deploy the crawler.

Created test storage in AWS and a Glue table for the crawler.
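For reference, a minimal boto3 sketch of that kind of setup: a test bucket plus a Glue database/table pointing at it with a JSON SerDe. The region is the one the project uses, but the bucket, database, table, and column names here are placeholders, not the actual resources created in the team account.

```python
import boto3

# Hypothetical names for the test bucket / Glue database; adjust to the
# resources actually created in the team account.
REGION = "us-west-2"
BUCKET = "ad440-crawler-test"
DATABASE = "crawler_db"

s3 = boto3.client("s3", region_name=REGION)
glue = boto3.client("glue", region_name=REGION)

# Test storage for the crawler's JSON output.
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Glue database + table pointing at the bucket, using a JSON SerDe.
glue.create_database(DatabaseInput={"Name": DATABASE})
glue.create_table(
    DatabaseName=DATABASE,
    TableInput={
        "Name": "crawler_events",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "name", "Type": "string"},
                {"Name": "date", "Type": "string"},
            ],
            "Location": f"s3://{BUCKET}/crawler/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    },
)
```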

jams2018 commented 5 years ago

The crawler team released a crawler script that produces a JSON file. It's pushed to GitHub and ready to be tested in AWS.

jams2018 commented 5 years ago

Integrated the project into Travis CI and ran the first test.
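A minimal pytest-style sketch of the kind of check that first Travis CI build could run, assuming the crawler's output is a JSON-serializable list of events. The crawler entry point is stubbed here because its real module and function names aren't shown in this issue.

```python
import json

def run_crawl():
    # Stand-in for the crawler's real entry point, which isn't named in this
    # issue; the actual test would import it from the web-crawler repo instead.
    return [{"name": "Adaptive Swim", "date": "2019-02-01"}]

def test_crawler_output_is_json_serializable():
    events = run_crawl()
    # The output must round-trip cleanly through JSON before it is pushed to S3/Glue.
    payload = json.dumps(events)
    assert isinstance(json.loads(payload), list)
    assert all("name" in event for event in events)
```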

LyndonP commented 5 years ago

Resources Reviewed:
- https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-flask.html
- https://docs.aws.amazon.com/systems-manager/latest/userguide/integration-remote-scripts.html
- https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html

Need: Deploy the Python crawler to AWS as a Lambda function that can be triggered.
Estimated Time: 4 hours

Deliverable for end of sprint: create the pipeline for S3 deployment of the JSON file from the crawler.
Estimated time: 5-7 hours
- Looking into automating the kickoff job for the crawler (see the Lambda sketch below).
- Looking into automating moving the JSON file into an S3 bucket on a timed basis.
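A rough sketch of what that pipeline could look like as a Lambda handler invoked by a CloudWatch Events scheduled rule (per the with-scheduled-events doc above): run the crawler, then put its JSON output into S3. The bucket name, object key, and crawl function are assumptions, not the team's actual names.

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "ad440-crawler-output"   # hypothetical bucket name
KEY = "crawler/events.json"       # hypothetical object key

def run_crawl():
    # Stand-in for the crawler team's script; the real handler would import
    # and call their entry point instead.
    return [{"name": "sample event"}]

def handler(event, context):
    """Invoked on a schedule by a CloudWatch Events rule; writes crawl output to S3."""
    events = run_crawl()
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=json.dumps(events).encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 200, "records": len(events)}
```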

jams2018 commented 5 years ago

Merged the crawler team's browser and Facebook crawlers into the master branch and tested the scripts on Travis CI (screenshot: https://github.com/ActoKids/devops/blob/dev/scripts/travis_ci_deployment_crawlers.png)

The scripts pass the test.

Sources Reviewed:
- https://vevurka.github.io/dsp17/git/quality/django/python/travis_ci_frisor/
- https://docs.travis-ci.com/user/tutorial/
- https://docs.travis-ci.com/user/languages/python/

Actual Time Spent: 4 hours

LyndonP commented 5 years ago

Deployed an EC2 instance that executes the WebCrawler and dumps its output to an S3 bucket: http://ad440.s3-website-us-west-2.amazonaws.com/
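For the EC2 variant, the upload step after each crawl could be as small as the boto3 sketch below (run from cron or at the end of the crawler script). The local path, bucket, and key are assumptions about the instance layout, not the actual configuration.

```python
import boto3
from pathlib import Path

# Hypothetical locations; the real instance layout and bucket may differ.
OUTPUT_FILE = Path("/home/ec2-user/web-crawler/events.json")
BUCKET = "ad440"
KEY = "events.json"

def upload_latest():
    """Push the crawler's most recent JSON output to the S3 bucket."""
    s3 = boto3.client("s3")
    s3.upload_file(
        str(OUTPUT_FILE), BUCKET, KEY,
        ExtraArgs={"ContentType": "application/json"},
    )

if __name__ == "__main__":
    upload_latest()
```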

jams2018 commented 5 years ago

Merged and tested the crawler team's scripts that crawl websites, Facebook, and Google Calendar.

Travis CI Link: https://travis-ci.org/ActoKids/web-crawler/builds/483694868

jams2018 commented 5 years ago

(Screenshot: travis_ci_deployment_crawlers_sprint1)