Old notes dump

From previous projects
* Data Topics
** Incoming
*** projects
**** metrics lib
***** timing metrics
****** Timing("some.key.thing") { body }
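The `Timing("some.key.thing") { body }` shape above reads like a Scala by-name block. A rough Java analogue, as a sketch only: the class `Timing`, the method `time`, and the in-memory maps are hypothetical names, not the original metrics lib. A real version would more likely report to a metrics registry than keep its own maps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the original metrics lib.
final class Timing {
    // Per-key call counts and total elapsed nanoseconds.
    static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    // Runs body, records how long it took under key, and returns body's result.
    // The finally block means failed calls are timed too.
    static <T> T time(String key, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            COUNTS.computeIfAbsent(key, k -> new LongAdder()).increment();
            TOTAL_NANOS.computeIfAbsent(key, k -> new LongAdder()).add(elapsed);
        }
    }
}
```

Usage would look like `Timing.time("some.key.thing", () -> doWork())`, where `doWork` stands in for the timed body.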
**** graphite, grafana deployable
***** store dashboards
**** logstash, kibana, elasticsearch
**** http layer
***** store results in s3 and postgres
***** batches req/resp
***** non blocking
***** hash http request / response ids when stored as s3 keys
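One possible way to hash request / response ids before using them as S3 keys, assuming SHA-256 and a lowercase hex encoding. The notes do not name a hash function, so both choices, and the names `S3Keys` and `hashedKey`, are assumptions made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; SHA-256 and hex output are assumptions, not from the notes.
final class S3Keys {
    // Returns a 64-character lowercase hex digest of the id, usable as an S3 key.
    static String hashedKey(String id) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(id.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JVM ships SHA-256, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Hex output keeps the key free of characters that need escaping, and the same id always maps to the same key.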
**** s3 interface
**** extraction layer(s)
***** xpath, regex
***** email, phone numbers, addresses, names, etc
***** postgres tables for scraped data (e.g. scraped_emails)
***** cleaner tables for post-processed / reconciled data
****** graph problem? to find connected data
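A minimal sketch of the regex side of the extraction layer, for emails only. The pattern is deliberately loose and is an assumption of this example; it will miss some valid addresses and accept some invalid ones. `EmailExtractor` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor; the pattern is a loose approximation, not a full RFC 5322 matcher.
final class EmailExtractor {
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns every substring of text that looks like an email address, in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Rows destined for a table like scraped_emails could come straight from `extract`, with deduplication and validation left to the cleaner, post-processed tables.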
**** actors lib
***** w/ common patterns
**** pre-populating url layer
**** api for viewing some basic stats / search?
***** will need SSL in front of it, if it's public in any way.
**** encryption lib
***** health check on app start to ensure we can encrypt and decrypt
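A sketch of what a startup crypto health check might look like: encrypt a fixed message, decrypt it, and compare. It assumes AES-GCM with a throwaway key so that it stays self-contained. A real check would presumably use the app's configured keys, which these notes do not describe. `CryptoHealthCheck` and `canCrypt` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical check; AES-GCM and the throwaway key are assumptions for illustration.
final class CryptoHealthCheck {
    // Returns true if an encrypt/decrypt round trip reproduces the plaintext.
    static boolean canCrypt() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            SecretKey key = gen.generateKey();

            byte[] iv = new byte[12]; // 96-bit nonce, the usual size for GCM
            new SecureRandom().nextBytes(iv);
            byte[] plain = "health-check".getBytes(StandardCharsets.UTF_8);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] sealed = enc.doFinal(plain);

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return Arrays.equals(plain, dec.doFinal(sealed));
        } catch (GeneralSecurityException e) {
            // Missing algorithm, bad key, or failed authentication tag.
            return false;
        }
    }
}
```

On app start the caller could refuse to boot when `canCrypt()` returns false.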
**** compression lib
**** basebox setup for aws nodes
**** aws nodes, RDS setup
***** put it all inside a VPC?
**** deploy project
***** w/ scripts to set up and start dbs and apps
***** check that the correct number are running
***** holds rsa keys etc
***** holds the "production.conf" files that are the primary config for each app
**** shared logging volumes
**** private docker hub
*** http://www.whitehouse.gov/open
** People
*** Names (name_id)
*** Aliases (alias_id, name_id)
*** Stats (gender: m, f, o; sex: m, f, o; age)
*** Locations (street, street 2, city, state, zip, country, territory, etc.)
*** Relatives
*** Friends
*** Work History
** Places
** Crimes
*** Locations (place_id)
** Writings
** Animals
** Logins / Profile
*** profile_id, Username, email, password
** Images
*** profile_id (optional)
** Companies
** Meta-Project
*** codahale metrics setup
*** health checks setup
*** layers
**** http
**** encryption
**** compression
*** deployables
**** website crawler
***** canonical urls
***** versioning of request / response
***** storage of request / response in S3
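For the URL normalization item under the website crawler, one possible canonicalization, sketched with `java.net.URI`. The rules chosen here (lowercase scheme and host, drop default ports, drop fragments and user info, resolve `.` and `..`, empty path becomes `/`) are assumptions; the notes do not say which rules the crawler should apply. `CanonicalUrls` and `canonicalize` are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper; the normalization rules are assumptions, not taken from the notes.
final class CanonicalUrls {
    // Returns a canonical form so that equivalent URLs are crawled and stored once.
    static String canonicalize(String url) {
        URI u = URI.create(url).normalize(); // resolves "." and ".." path segments
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("not an absolute hierarchical URL: " + url);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();
        boolean defaultPort = (port == 80 && scheme.equals("http"))
                || (port == 443 && scheme.equals("https"));

        StringBuilder out = new StringBuilder(scheme).append("://").append(host);
        if (port != -1 && !defaultPort) {
            out.append(':').append(port);
        }
        String path = u.getRawPath();
        out.append(path == null || path.isEmpty() ? "/" : path);
        if (u.getRawQuery() != null) {
            out.append('?').append(u.getRawQuery());
        }
        // The fragment and user info are intentionally dropped.
        return out.toString();
    }
}
```

The canonical string could then be the input to whatever id or hash the storage layer uses for request / response versions.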
**** api crawler
***** versioning of request / response
***** storage of request / response in S3
**** html extraction (xpath, regex)
***** js, css file extraction
***** link extraction
****** feeds back into api / web crawlers
* Services / Tools
** https://github.com/begriffs/postgrest
* Deployment
** Java
*** Roll gc logs: https://jyates.github.io/2012/11/05/rolling-java-gc-logs.html
* Databases
** CANCELLED sort out differences between
   CLOSED: [2015-01-17 Sat 12:27]
*** foundationdb
**** https://foundationdb.com/
*** postgres
**** http://www.postgresql.org/
*** neo4j
**** http://neo4j.com/
*** titan
**** https://github.com/thinkaurelius/titan
**** https://thinkaurelius.github.io/titan/
*** rocksdb
**** https://github.com/facebook/rocksdb/wiki
*** redis
**** http://redis.io
*** flockdb
**** https://github.com/twitter/flockdb
*** MapDB
**** https://github.com/jankotek/MapDB