CodeForPhilly / jawn

'Git for Tabular Data'
http://datjawn.com
BSD 3-Clause "New" or "Revised" License
44 stars 9 forks source link

Database tracker is able to detect changed values #43

Open flyingzumwalt opened 8 years ago

flyingzumwalt commented 8 years ago

For more info & context, read the Use Case in #44.

Story

When user tells the database tracker to create a commit (see #41) Then it detects the changes and converts them to blocks in a jawn commit And writes the commit to the feed in jawn

Additional Info

What kind of Database?

For now we will assume sqlite.

How does it figure out what changes have occurred?

The database tracker will need to be able to compare what's in the jawn feed and what's in the database.

Either way, the feed should have info about when data was last pulled from the database.

Option 1: Parse the database log Option 2: If the database tables keep a "last updated" timestamp then we can just query for things that have been updated since the last time the feed was updated. -- we can either ask the Open Data team whether their data has this info, or we can read up on ArcGiS ourselves.

flyingzumwalt commented 8 years ago

Standardized SQL functions in ArcGIS Online

benjaminettori commented 8 years ago

Start with one table toy example.

Then move on to several tables. How do you represent multiple tables of data in one feed? Or do we separate tables into multiple feeds?

flyingzumwalt commented 8 years ago

My current thinking on how to do this: Storing Tabular Data in a DAG. If you need context, this is from the larger Replication Patterns deck, which outlines how Git structures its DAG.

I do have some new tweaks to the ideas in that deck, based on the way "live" feeds work in hypercore, but the slides get at the essential idea.

Especially look at slides 18 and 19, plus Mapping Git Patterns to Hypercore, which starts on slide 22

benjaminettori commented 8 years ago

@flyingzumwalt @CameronSima

Just wanted to give you an update on what I'm going. Here is my current thinking for how to detect updates, assuming we have a lastUpdate column in each table in the db.


var sqlite3 = require('sqlite3').verbose()
 var fs = require('fs')

  module.exports = DatabaseTracker

  function DatabaseTracker (dbfile) {
    var exists = fs.existsSync(dbfile)
    this.db = new sqlite3.Database(dbfile)
  }

  DatabaseTracker.prototype.getAllTables = function (callback) {
    var query = "Select name FROM sqlite_master WHERE type = 'table'"
    var dbTracker  = this
    this.db.serialize(function () {
        dbTracker.db.all(query, function (err, tables) {
            if (err) {
                throw new Error(err)
            }

            callback(tables)
        })
    })
  }

  DatabaseTracker.prototype.getChanges = function (date, callback) {
    var currentDb = this.db

    function findAllChanges(tables) {
        var changes = {}
        var tableNumber = tables.length
        tables.forEach(function (item, index) {
            //TODO: might be better to use db.All() instead of statement
            var statement = currentDb.prepare('SELECT * FROM (?) t WHERE t.lastUpdate > ' + date)
            statement.all(item.name, function (err, rows) {

                if (err) {
                    console.log(err)
                }

                changes[item.name] = rows
                statement.finalize()
                if(changes.length = tableNumber) {
                    callback(changes)
                }
            })
        })
    }

    this.getAllTables(findAllChanges)
  }

This code does not fully run yet, but that is how far I've gotten.

flyingzumwalt commented 8 years ago

We should put the database tracker code into a separate repository. I set up a github repo here: https://github.com/CfABrigadePhiladelphia/jawn-db-tracker (cc @mels065)