Open jams2018 opened 5 years ago
Talked to the crawler team and learned that they are looking into AWS Glue. I am in the process of obtaining access to an AWS account administered by Toddy.
I now have access to the AWS account and have explored Glue, S3, and Lambda.
Outstanding. Will you have time to talk about this with the Crawler Crew on Wednesday?
Nick, in the cafeteria. When the team has time, we'll meet and discuss ways to deploy the crawler.
Created test storage in AWS and a Glue table for the crawler.
The crawler team released a crawler script that produces a JSON file. It has been pushed to GitHub and is ready to be tested in AWS.
Integrated the project into Travis CI and ran the first test.
Resources Reviewed:
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-flask.html
https://docs.aws.amazon.com/systems-manager/latest/userguide/integration-remote-scripts.html
https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
Need: Deploy Python crawler to AWS as a Lambda function to be triggered.
Estimated Time: 4 hours
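A minimal sketch of what the Lambda entry point for the crawler could look like, assuming a scheduled rule invokes it. `run_crawler`, the bucket name, and the object key are hypothetical placeholders, not the crawler team's actual code; the S3 client is an optional argument only so the handler can be exercised locally without AWS credentials.

```python
import json


def run_crawler():
    """Hypothetical stand-in for the crawler team's script.

    Returns a list of crawled items; the real crawler would go here.
    """
    return [{"title": "Example event", "source": "placeholder"}]


def lambda_handler(event, context, s3_client=None):
    """Entry point that Lambda calls with (event, context) when triggered."""
    if s3_client is None:
        # boto3 ships with the Lambda Python runtime; imported lazily so the
        # handler can be tested locally with a fake client.
        import boto3
        s3_client = boto3.client("s3")

    results = run_crawler()

    # Write the crawler output to S3 as a single JSON object.
    s3_client.put_object(
        Bucket="example-crawler-output",  # assumed bucket name
        Key="crawler/latest.json",        # assumed object key
        Body=json.dumps(results).encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 200, "itemCount": len(results)}
```

The function's execution role would also need permission to write to the target bucket, which is outside this sketch.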
Deliverable for end of sprint: Create the pipeline for S3 deployment of the JSON file from the crawler.
Estimated time: 5-7 hours
Looking into automating the kickoff job for the crawler, and into automating moving the JSON file into an S3 bucket on a timed basis.
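One possible shape for the "move the JSON file into S3" step, as a rough sketch: each run uploads the crawler's output under a timestamped key so earlier runs are not overwritten. The key layout, function names, and default prefix are assumptions made for illustration; the timed trigger itself (cron, a scheduled rule, etc.) is not shown.

```python
from datetime import datetime, timezone


def build_key(prefix, now=None):
    """Build a timestamped S3 key such as crawler/2019/01/24/events-083000.json."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{now:%Y/%m/%d}/events-{now:%H%M%S}.json"


def upload_crawl_output(local_path, bucket, prefix="crawler", s3_client=None, now=None):
    """Upload the crawler's JSON file to S3 and return the key that was used."""
    if s3_client is None:
        # Imported lazily so the helper can be tested without boto3 installed.
        import boto3
        s3_client = boto3.client("s3")

    key = build_key(prefix, now)
    s3_client.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"ContentType": "application/json"},
    )
    return key
```

A scheduled job could then call something like `upload_crawl_output("events.json", "my-bucket")` after each crawl, where both arguments are placeholders.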
Merged the crawler team's browser and Facebook crawlers into the master branch and tested the scripts on Travis CI (screenshot: https://github.com/ActoKids/devops/blob/dev/scripts/travis_ci_deployment_crawlers.png)
The scripts pass the test.
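For illustration, a unit test that Travis CI could run against the crawler's JSON output might look like the sketch below. The required field names and the `validate_events` helper are assumptions for this example and are not taken from the crawler team's repository.

```python
import json
import unittest

# Assumed schema: fields every crawled event is expected to carry.
REQUIRED_FIELDS = {"title", "date", "url"}


def validate_events(raw_json):
    """Return a list of problems found in the crawler's JSON output."""
    events = json.loads(raw_json)
    if not isinstance(events, list):
        return ["top-level JSON value is not a list"]

    problems = []
    for index, event in enumerate(events):
        if not isinstance(event, dict):
            problems.append(f"event {index} is not an object")
            continue
        missing = REQUIRED_FIELDS - event.keys()
        if missing:
            problems.append(f"event {index} is missing {sorted(missing)}")
    return problems


class CrawlerOutputTest(unittest.TestCase):
    def test_complete_event_has_no_problems(self):
        raw = json.dumps([{"title": "Swim", "date": "2019-01-24", "url": "http://example.com"}])
        self.assertEqual(validate_events(raw), [])

    def test_missing_fields_are_reported(self):
        raw = json.dumps([{"title": "Swim"}])
        self.assertEqual(validate_events(raw), ["event 0 is missing ['date', 'url']"])
```

Saved in a test module, this could be run with `python -m unittest` as the `script` step of a Travis CI build.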
Sources Reviewed:
https://vevurka.github.io/dsp17/git/quality/django/python/travis_ci_frisor/
https://docs.travis-ci.com/user/tutorial/
https://docs.travis-ci.com/user/languages/python/
Actual Time Spent: 4 hours
Deployed an EC2 instance that runs the WebCrawler and dumps its output to an S3 bucket: http://ad440.s3-website-us-west-2.amazonaws.com/
Merged and tested the crawler team's script that crawls websites, Facebook, and Google Calendar.
Travis CI Link: https://travis-ci.org/ActoKids/web-crawler/builds/483694868
The objective is to research how to deploy the Crawler team's crawler script in AWS and to unit test it in Travis CI.
Actual time spent so far: 10 hours
Estimated completion time: 7 hours