What do I do?
Install
Setup
Local Usage
Deployed Usage
More things you should know
How to contribute
This project scrapes Amazon listings for recent reviews of specified products (identified by ASIN) and stores them in a Postgres database.
This project uses pipenv to manage its virtual environment.
Navigate to the directory where you want to install the project, then run:
git clone <project clone url>
pip install pipenv
cd amz-review-scraper
pipenv shell
pipenv sync
This project uses Postgres as its database. There are three environments, and each one needs its own Postgres database: development and testing run locally, and production runs in your deployment. Use the credentials from each of your new databases to fill in the .env file below. My production database is set up on Amazon RDS, but you could choose to do things differently.
#environment setups
#development
export DEV_SQLALCHEMY_DATABASE_URI="postgresql://your_development_postgres_url_connection"
export DEV_DEBUG=True
export DEV_LOGIN_BASE_URL="http://127.0.0.1:5000"
#testing
export TESTING_SQLALCHEMY_DATABASE_URI="postgresql://postgres:@localhost:5432/travis_ci_test"
export TESTING_DEBUG=True
export TESTING_TESTING=True
export TESTING_LOGIN_BASE_URL="http://127.0.0.1:5000"
#production
export PROD_SQLALCHEMY_DATABASE_URI="postgresql://your_production_postgres_url_connection"
export PROD_LOGIN_BASE_URL="https://your_production_url.com"
#proxy variables
export http="http://proxyservice"
export https="https://proxyservice"
#ensuring UTF-8 to make sure Black works correctly
export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8
#Flask variables
export FLASK_APP=run.py
export FLASK_SECRET_KEY="a_super secret_key"
#Flask-s3 variables for storing static folder in s3 behind cloudfront
export FLASKS3_BUCKET_NAME="static_s3_bucket_name"
export FLASKS3_CDN_DOMAIN="where_s3bucket_is_on.cloudfront.net"
#aws credentials
export AWS_ACCESS_KEY_ID="your_aws_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_aws_secret_access_key"
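The DEV_, TESTING_ and PROD_ prefixes suggest that the app picks one set of settings per environment. A minimal sketch of how such prefixed variables could be read, assuming flags are stored as the string "True"; EnvConfig and load_env_config are hypothetical names for illustration, not the code in this project's config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class EnvConfig:
    """Settings for one environment (hypothetical; DEV, TESTING or PROD)."""

    database_uri: str
    debug: bool
    testing: bool
    login_base_url: str


def _as_bool(value: str) -> bool:
    # Flags arrive as strings such as "True"; anything other than "true" (any case) is False.
    return value.strip().lower() == "true"


def load_env_config(prefix: str, environ: Optional[Mapping[str, str]] = None) -> EnvConfig:
    """Read <prefix>_SQLALCHEMY_DATABASE_URI, <prefix>_DEBUG and friends.

    Missing flags default to False and a missing login URL to "";
    a missing database URI raises KeyError so misconfiguration fails loudly.
    """
    env = os.environ if environ is None else environ
    return EnvConfig(
        database_uri=env[f"{prefix}_SQLALCHEMY_DATABASE_URI"],
        debug=_as_bool(env.get(f"{prefix}_DEBUG", "False")),
        testing=_as_bool(env.get(f"{prefix}_TESTING", "False")),
        login_base_url=env.get(f"{prefix}_LOGIN_BASE_URL", ""),
    )
```

For example, load_env_config("TESTING") would pick up the four TESTING_ variables above, while the production environment, which sets no DEBUG or TESTING flag, would get False for both.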
To initialize the tables in your database, run:
python create_db.py
This creates the database tables from scratch so they are ready when you run the app.
To run the app locally, use:
flask run
Enter an ASIN, and the corresponding Amazon listing will be scraped and its review data added to the Postgres database.
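Before scraping, it can help to reject input that cannot possibly be an ASIN, which is a 10-character alphanumeric identifier. A minimal sketch, assuming the ASIN arrives as a plain string; normalize_asin is a hypothetical helper, not part of this project, and it checks only the format, not whether the product exists.

```python
import re

# An ASIN is 10 characters, upper-case letters and digits only.
_ASIN_RE = re.compile(r"[A-Z0-9]{10}")


def normalize_asin(raw: str) -> str:
    """Strip whitespace, upper-case the input, and check its shape.

    Raises ValueError if the result is not exactly 10 letters/digits.
    """
    asin = raw.strip().upper()
    if not _ASIN_RE.fullmatch(asin):
        raise ValueError(f"not a valid ASIN: {raw!r}")
    return asin
```

Normalizing first means " b07xjrk4m3 " and "B07XJRK4M3" are treated as the same product, so the same listing is not stored twice under different spellings.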
To run the tests, use:
pytest
In amz_review_scraper/config.py there are two variables:
To switch on the output, set either or both of them to "y".
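The switches are plain strings compared against "y". A minimal sketch of how such a switch might gate printed output; SHOW_SCRAPE_OUTPUT, SHOW_DB_OUTPUT, is_switched_on and log are hypothetical names, since the actual variable names in config.py are not listed here.

```python
# Hypothetical stand-ins for the two switches in amz_review_scraper/config.py.
SHOW_SCRAPE_OUTPUT = "n"
SHOW_DB_OUTPUT = "y"


def is_switched_on(flag: str) -> bool:
    """Treat "y" (any case, surrounding spaces ignored) as on; anything else as off."""
    return flag.strip().lower() == "y"


def log(message: str, flag: str) -> None:
    # Print only when the corresponding switch is set to "y".
    if is_switched_on(flag):
        print(message)
```

With these values, log("saved 10 reviews", SHOW_DB_OUTPUT) would print, while log("fetched page", SHOW_SCRAPE_OUTPUT) would stay silent.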
After installing new packages, regenerate Pipfile.lock with:
pipenv lock --pre
The --pre flag is needed because Black is a pre-release package. If you decide to stop using Black as your formatter, you will not need this flag when installing new packages.
Make sure your AWS credentials are configured with the AWS CLI (awscli), for example by running aws configure.
You may want to install the AWS CLI outside the project's virtual environment. For help setting this up, see the AWS CLI documentation.
To deploy, first initialize Zappa:
zappa init
Edit your zappa_settings.json file to add the exclude setting shown below, since the static files are served from the S3 bucket behind CloudFront. Also add your environment variables here: the .env file only works locally, so the same variables need to be in zappa_settings.json for your Lambda function to have access to them in the cloud.
{
    "dev": {
        "app_function": "run.app",
        "aws_region": "us-east-2",
        "profile_name": "default",
        "project_name": "amz-review-scra",
        "runtime": "python3.6",
        "s3_bucket": "zappa-123443-zappa",
        "exclude": ["static", "test"],
        "aws_environment_variables": {
            "FLASK_APP": "run.py",
            "FLASK_SECRET_KEY": "your_flask_secret_key",
            "FLASKS3_BUCKET_NAME": "static_s3_bucket_name",
            "FLASKS3_CDN_DOMAIN": "where_s3bucket_is_on.cloudfront.net",
            "http": "http://your_proxyservice",
            "https": "https://your_proxyservice",
            "DEV_SQLALCHEMY_DATABASE_URI": "postgresql://your_development_postgres_url_connection",
            "DEV_DEBUG": "True",
            "DEV_LOGIN_BASE_URL": "http://127.0.0.1:5000",
            "TESTING_SQLALCHEMY_DATABASE_URI": "postgresql://postgres:@localhost:5432/travis_ci_test",
            "TESTING_DEBUG": "True",
            "TESTING_TESTING": "True",
            "TESTING_LOGIN_BASE_URL": "http://127.0.0.1:5000",
            "PROD_SQLALCHEMY_DATABASE_URI": "postgresql://your_production_postgres_url_connection",
            "PROD_LOGIN_BASE_URL": "https://your_custom_url.com",
            "ZAPPA": "True"
        },
        "use_precompiled_packages": true,
        "cors": true,
        "binary_support": false
    }
}
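Because the same variables have to be kept in two places, the .env file and zappa_settings.json can drift apart. A minimal sketch of a consistency check, assuming the .env file contains lines like those above (with or without export, spaces around = tolerated); env_keys, missing_in_zappa and the local-only list are hypothetical, not part of this project or of Zappa. The local-only list assumes the AWS credentials and locale variables stay out of the Lambda settings, as in the example zappa_settings.json.

```python
import json
import re
from typing import Iterable, Set

# Matches `export KEY="value"`, `KEY=value` or `KEY = value` and captures KEY.
_ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")

# Assumption: these variables are only needed locally, not in Lambda.
DEFAULT_LOCAL_ONLY = frozenset(
    {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "LC_ALL", "LANG"}
)


def env_keys(env_text: str) -> Set[str]:
    """Return the variable names defined in the text of a .env file."""
    keys = set()
    for line in env_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines such as "#development"
        match = _ENV_LINE.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_in_zappa(
    env_text: str,
    zappa_json: str,
    stage: str = "dev",
    local_only: Iterable[str] = DEFAULT_LOCAL_ONLY,
) -> Set[str]:
    """Names set in .env but absent from the stage's aws_environment_variables."""
    settings = json.loads(zappa_json)
    deployed = set(settings[stage].get("aws_environment_variables", {}))
    return env_keys(env_text) - deployed - set(local_only)
```

For example, missing_in_zappa(open(".env").read(), open("zappa_settings.json").read()) would return the names you still need to copy into the dev stage; an empty set means nothing is missing.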
zappa deploy
You can also initialize the database directly from the deployed Lambda function with Zappa (tip: don't forget the quotes):
zappa invoke <stage name> "create_db.db_init"
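The quoted argument is a dotted path: the part before the last dot names the module (create_db.py) and the part after it names the function (db_init). As a rough illustration of how such a path maps to a callable, and not a description of Zappa's internals, here is a minimal sketch using the standard library; resolve_dotted is a hypothetical helper.

```python
import importlib
from typing import Any, Callable


def resolve_dotted(path: str) -> Callable[..., Any]:
    """Split "module.function" at the last dot, import the module, return the function."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

Locally, resolve_dotted("create_db.db_init") would import create_db.py and return db_init, assuming the project root is on the import path.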
I am using this project to learn and practice a number of skills. If you see places where my code could be better, comments would be much appreciated.
The project uses Travis CI to automate testing and Black for formatting and automatic format checking.