helios

Scrapy spider that crawls real estate listings and collects information about them

This project uses pipenv for package/dependency and virtualenv management. To learn more about how to use pipenv, see the pipenv documentation.

Requirements:

Setup

workspace

brew install pipenv
pipenv --two
pipenv lock
pipenv install

Setup attached resources

Redis cluster for local dev (Celery):

brew install redis
brew services start redis

Postgres cluster for local dev (Backend database):

brew install postgres
brew services start postgres
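
Once both services are started, one quick way to confirm they accept connections is to probe their TCP ports. The sketch below assumes the default ports (6379 for Redis, 5432 for Postgres) on localhost and is written for Python 3; `port_is_open` is a hypothetical helper, not part of this project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Default ports for a local install (assumption; adjust if configured otherwise).
    for name, port in (("redis", 6379), ("postgres", 5432)):
        status = "up" if port_is_open("localhost", port) else "down"
        print(f"{name}: {status}")
```

An open port only shows that something is listening; it does not verify credentials or that the helios database exists.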

Run app

Run the following commands from the root directory

run Craigslist bot spider only

cd services/craigslist
scrapy crawl craigbot_all -o craigslist_result.csv
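
After the crawl finishes, the CSV written by `-o craigslist_result.csv` can be inspected with a few lines of Python. This is a minimal sketch for Python 3 that makes no assumption about which columns the spider emits; `summarize_csv` is a hypothetical helper, not part of this project.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def summarize_csv(path: str) -> Tuple[List[str], int]:
    """Return (column names, number of data rows) for a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Iterating consumes the header row first, then counts data rows.
        rows = sum(1 for _ in reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    result = Path("craigslist_result.csv")
    if result.exists():
        columns, count = summarize_csv(str(result))
        print(f"{count} rows, columns: {columns}")
    else:
        print("craigslist_result.csv not found; run the spider first")
```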

run scheduled bot spider

cd services
export POSTGRES_DB_URI="postgres://postgres@localhost:5432/helios"; python bots.py

run app with bot manual trigger

export FLASK_ENV=development
export POSTGRES_DB_URI="postgres://postgres@localhost:5432/helios"
FLASK_APP=app.py flask run --debugger
# to trigger the bot, run
curl http://localhost:5000

run app with bot autostart

export POSTGRES_DB_URI="postgres://postgres@localhost:5432/helios"
python app.py
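
Each run mode above depends on `POSTGRES_DB_URI`. How `bots.py` and `app.py` consume it is not shown here, so the following is only a sketch (Python 3, standard library) of how such a URI breaks down into connection parameters; `parse_db_uri` is a hypothetical helper.

```python
import os
from urllib.parse import urlparse


def parse_db_uri(uri: str) -> dict:
    """Split a postgres:// URI into its connection components."""
    parts = urlparse(uri)
    return {
        "user": parts.username,
        "password": parts.password,  # None when the URI carries no password
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Fall back to the local-dev URI used in this README when the variable is unset.
    uri = os.environ.get(
        "POSTGRES_DB_URI", "postgres://postgres@localhost:5432/helios"
    )
    info = parse_db_uri(uri)
    print(f"{info['user']}@{info['host']}:{info['port']}/{info['dbname']}")
```

Printing the parsed pieces before starting the app can help catch a mistyped host, port, or database name early.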

Development

Database

local development

# connect to database
psql postgres -U postgres
\c helios
# execute create_initial_schema.sql script

# To test postgres database CRUD operations
cd services/postgres
export POSTGRES_DB_URI="postgres://postgres@localhost:5432/helios"
python postgres.py

Deployment

The helios system is set up for continuous deployment to the Heroku platform at https://dashboard.heroku.com/apps/beast-helios, tracking the master branch.

# to deploy to Heroku, just check in or merge into the master branch

# restart app
heroku restart

# set/unset environment variables
heroku config:set <env_var>=<value>
heroku config:unset <env_var>

# check logs
heroku logs --tail

Note: Commits are associated with a GPG signing key.
