
dyngodb2

An experiment (alpha) to get a MongoDB-like interface in front of DynamoDB and CloudSearch. It now supports transactions, as described by the DynamoDB Transactions protocol.

In dyngodb2 we dropped the $ sign in favor of _, and $version is now called _rev. The old branch is available here. Fixes to the old version will be released under the dyngodb npm package, while new releases are published under the dyngodb2 npm package.

Why?

DynamoDB is elastic, cheap and well integrated with many AWS products (e.g. Elastic MapReduce, Redshift, Data Pipeline, S3), while MongoDB has a wonderful interface. Using node.js on Elastic Beanstalk with DynamoDB as your backend, you could end up with a very scalable, cheap and highly available webapp architecture. The main obstacle for many developers is using DynamoDB productively, hence this project.

Getting started

Playing around:

$ npm install -g dyngodb2
$ export AWS_ACCESS_KEY_ID=......
$ export AWS_SECRET_ACCESS_KEY=......
$ export AWS_REGION=eu-west-1
$ dyngodb2
> db.createCollection('test')
> db.test.create({ _id: 'john', name: 'John', lname: 'Smith' }) // if the _id exists create will throw an error
> db.test.save({ name: 'Jane', lname: 'Burden' })
> db.test.findOne({ name: 'John' })
> john= last // last always holds the result of the previous command
> john.city= 'London'
> db.test.save(john)
> db.test.update({ _id: 'john' },{ $inc: { wives: 1, childs: 3 } }) // uses DynamoDB updateItem ADD op
> db.test.find({ name: 'John' })
> db.test.ensureIndex({ name: 'S' })
> db.test.findOne({ name: 'John' })
> db.test.ensureIndex({ $search: { domain: 'mycstestdomain', lang: 'en' } }); /* some CloudSearch */
> db.test.update({ name: 'John' },{ $set: { city: 'Boston' } });
> db.test.find({ $search: { q: 'Boston' } });
> db.test.findOne({ name: 'Jane' }) /* some graphs */
> jane= last
> jane.husband= john
> john.wife= jane
> john.himself= john
> db.test.save(john);
> db.test.save(jane);
> db.ensureTransactionTable(/*name*/) /* some transactions :) */
> db.transaction()
> tx.test.save({ name: 'i\'ll be rolled back :( ' })
> tx.rollback(); /* your index is rolled back too */
> db.transaction()
> tx.test.save({ name: 'i\'ll be committed together with something else' })
> tx.test.save({ name: 'something else' })
> tx.commit(); /* your index is committed too */
> db.test.remove()
> db.test.drop()

Goals

What dyngodb actually does

Finders

There are currently 3 types of finders (used in this order):

Indexing

Indexes in dyngodb are DynamoDB tables that have a different KeySchema and contain the data needed to look up items based on some attributes. This means that typically an index will be used with a Query operation, as sketched below.
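To make the lookup concrete, here is a minimal sketch of the kind of Query such an index table enables, written with the AWS SDK for JavaScript (v2); the table name test-name-index and its KeySchema are illustrative assumptions, not dyngodb's actual naming scheme:

var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB({ region: 'eu-west-1' });

// Look up items by 'name' through a hypothetical index table
// whose hash key is the indexed attribute.
dynamo.query({
    TableName: 'test-name-index',
    KeyConditionExpression: '#n = :name',
    ExpressionAttributeNames: { '#n': 'name' },
    ExpressionAttributeValues: { ':name': { S: 'John' } }
}, function (err, data) {
    if (err) throw err;
    console.log(data.Items); // index items pointing back at the main table
});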

There are currently 2 indexes in use (4 exist, but only 2 are used):

Lost update prevention

Suppose you have two sessions going on.

Session 1 connects and reads John

$ dyngodb2
> db.test.find({ name: 'John' })

Session 2 connects and reads John

$ dyngodb2
> db.test.find({ name: 'John' })

Session 1 modifies and saves John

> last.city= 'San Francisco'
> db.test.save(last)
done!

Session 2 modifies John, tries to save it, and gets an error

> last.country= 'France'
> db.test.save(last)
The item was changed since you read it

This is accomplished by a _rev attribute, which is incremented at save time if changes are detected in the object since it was read (the _old attribute contains a clone of the item at read time). So when Session 2 tries to save the object, the write expects the item to still have _old._rev in the table, and it fails because Session 1 already incremented it.

note: when you get the above error you should re-read the object you were trying to save and, if appropriate, retry your updates; any other save operation on this object will result in bogus responses.
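Under the hood this maps to a DynamoDB conditional write. A minimal sketch, assuming the AWS SDK for JavaScript (v2) and illustrative attribute values (dyngodb's actual item layout may differ):

var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB({ region: 'eu-west-1' });

// Save an item that was read with _rev = 3: write the incremented
// revision, but only if the stored revision is still the one we read.
dynamo.putItem({
    TableName: 'test',
    Item: {
        _id: { S: 'john' },
        _rev: { N: '4' },
        city: { S: 'San Francisco' }
    },
    ConditionExpression: '#rev = :readRev',
    ExpressionAttributeNames: { '#rev': '_rev' },
    ExpressionAttributeValues: { ':readRev': { N: '3' } }
}, function (err) {
    if (err && err.code === 'ConditionalCheckFailedException')
        console.log('The item was changed since you read it');
});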

Arrays

Currently dyngodb is pretty incoherent about arrays; in fact it has two kinds of array persistence:

Schema

In dyngodb there are 3 DynamoDB table KeySchemas:

Some automatically generated attributes:

Transactions

In dyngodb2 there is basic support for transactions: take a look at the tests (https://github.com/aaaristo/dyngodb/blob/master/test/transaction.test.js). It is an initial implementation of the protocol described here. All the db.* APIs are still non-transactional, while the tx.* APIs, which mirror db.*, behave transactionally. Once you get a transaction by calling db.transaction(), you can operate on any number of tables/items, and any modification you make is committed or rolled back together with the others performed in the same transaction (this also holds for items generated by indexes like fat.js, while cloud-search.js fulltext search is still non-transactional).
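For example, a single transaction can span multiple collections; the collection names below (users, accounts) are hypothetical:

$ dyngodb2
> db.ensureTransactionTable()
> db.transaction()
> tx.users.save({ name: 'John' })       /* two different tables... */
> tx.accounts.save({ owner: 'john' })   /* ...written in the same transaction */
> tx.commit()                           /* both saves commit (or roll back) together */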

Keep in mind that this is completely experimental at this stage.

Local

It is possible to use DynamoDB Local by adding --local to the commandline:

dyngodb2 --local

.dyngorc

Using the .dyngorc file you can issue some commands before the console starts (e.g. ensureIndex).
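For example, a .dyngorc that makes sure an index exists every time the console starts (collection and attribute are illustrative):

db.test.ensureIndex({ name: 'S' })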

standard input (argv by optimist)

You can also pipe commands to dyngodb2 through standard input; command-line arguments are parsed by optimist and exposed to those commands as argv.

commands.txt:

db.test.save([{ name: argv.somename },{ name: 'Jane' }])
db.test.save([{ name: 'John' },{ name: 'Jane' }])

dyngodb2 --somename Jake < commands.txt

Streams (for raw dynamodb items)

Example of moving items between tables with streams (10 at a time):

dyngodb2
> t1= db._dyn.stream('table1')
> t2= db._dyn.stream('table2')
> t1.scan({ limit: 10 }).pipe(t2.mput('put')).on('finish',function () { console.log('done'); })

basic CSV (todo: stream)

Example of loading a CSV file (see node-csv for options):

dyngodb2
> csv('my/path/to.csv',{ delimiter: ';', escape: '"' },['id','name','mail'])
> last
> db.mytbl.save(last)

basic XLSX

Example of loading an XLSX file:

dyngodb2
> workbook= xlsx('my/path/to.xlsx') 
> contacts= workbook.sheet('Contacts').toJSON(['id','name','mail'])
> db.mytbl.save(contacts)

Provisioned Throughput

You can increase the throughput automatically (on tables and indexes): dyngodb will go through the required steps (DynamoDB allows at most a doubling of provisioned throughput per update) until it reaches the requested value.

dyngodb2
> db.mytbl.modify(1024,1024)
> db.mytbl.indexes[0].modify(1024,1024)

Export / import (todo: stream)

Export:

dyngodb2
> db.mytbl.find()
> db.cleanup(last).clean(function (d) { gson('export.gson',d); });

Import:

dyngodb2
> db.mytbl.save(gson('export.gson'));

You can use either the json or the gson function; the only difference is that gson is able to serialize circular object graphs in a non-recursive way.
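For instance, a graph containing a cycle (the object below is illustrative) round-trips through gson, while the json function cannot handle the cycle:

dyngodb2
> a= { name: 'John' }
> a.himself= a               /* a circular reference */
> gson('cycle.gson', a)      /* serializes the cycle */
> gson('cycle.gson')         /* reads the graph back, cycle intact */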

S3 Backup / Restore (todo: better streaming)

Backup:

dyngodb2
> db.mytbl.backup({ bucket: 'mybucket' })

This will create an S3 object named mybucket/table/time.bck

Restore:

dyngodb2
> db.createCollection('mytbl')
> db.mytbl.restore({ bucket: 'mybucket', file: 'mybucket/table/time.bck' });

Q&D migration from dyngodb to dyngodb2

dyngodb
> db.mytbl.find()
> db.cleanup(last).clean(function (d) { gson('export.gson',d); });
cat export.gson | sed 's/"$id"\:/"_id":/g' > export2.gson
dyngodb2
> db.mytbl.save(gson('export2.gson'));

Things you may need to update:

AngularJS and Express integration

Check: https://github.com/aaaristo/angular-gson-express-dyngodb

Help wanted!

Your help is highly appreciated: we need to test, discuss and fix the code, the performance and the roadmap.
