dresende / node-orm2

Object Relational Mapping
http://github.com/dresende/node-orm2
MIT License

Is it possible to support connection pooling transaction based on current "node-orm2" architecture? #284

Open taoyuan opened 11 years ago

taoyuan commented 11 years ago

node-orm-transaction has an issue: Transactions do not work together with connection pooling.

It looks like a big problem that is difficult to solve.

node-orm2 and node-persist are too different, and the solution JugglingDB has taken may have performance issues.

Maybe exposing connect and adding a connection parameter to the functions involved in SQL execution would be a complete and optional solution. The problem is that too much code would have to be changed.
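That "pass a connection parameter" idea could be sketched roughly like this. All names here (makeDriver, execQuery, getConnection) are hypothetical, not node-orm2's actual API: every function that executes SQL takes an optional connection, and a transaction pins one connection and passes it to each call.

```javascript
// Hypothetical sketch of the "optional connection parameter" idea.
var nextId = 1;
var pool = {
  getConnection: function () { return { id: nextId++ }; }
};

function makeDriver(pool) {
  return {
    // conn is optional: when given, the query runs on it; otherwise a
    // fresh connection is checked out of the pool (the current behaviour).
    execQuery: function (sql, conn, cb) {
      var c = conn || pool.getConnection();
      cb(null, { sql: sql, connectionId: c.id });
    }
  };
}

var driver = makeDriver(pool);
var txConn = pool.getConnection(); // pinned for the whole transaction
var ids = [];
driver.execQuery("UPDATE person SET age = 1", txConn, function (err, r) {
  ids.push(r.connectionId);
});
driver.execQuery("UPDATE person SET age = 2", txConn, function (err, r) {
  ids.push(r.connectionId);
});
```

Every query that receives txConn runs on the same connection; the cost, as noted above, is threading that parameter through every code path that executes SQL.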

taoyuan commented 11 years ago

The Sequelize transaction issue was raised two years ago and has still not been solved.

dxg commented 11 years ago

I'm intrigued by this issue. I'm going to look into it from the postgres side of things. If it works for pg, it shouldn't be hard to port to mysql as well.

cc https://github.com/dresende/node-orm-transaction/issues/2

taoyuan commented 11 years ago

OK, Thanks. I'll keep paying attention.

dresende commented 11 years ago

The problem is that when having a transaction, pooling should not be used, or... it should be used to get a connection to start the transaction, and future queries should hold that particular connection. Something has to be done in the driver(s) to hold a connection (if asked) and dispose of it (when asked). This way the plugin could start the transaction and ask the driver to hold that connection, and when committing (or rolling back) the plugin could tell the driver to dispose of it.
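A minimal sketch of that hold/dispose contract (the method names and pool shape here are assumptions for illustration, not the actual driver API):

```javascript
// Sketch of a driver that can "hold" one pooled connection for a
// transaction and "dispose" of it afterwards. All names are illustrative.
function Driver(pool) {
  this.pool = pool;
  this.held = null; // connection pinned by an open transaction, if any
}
Driver.prototype.hold = function () {
  this.held = this.pool.getConnection();
  return this.held;
};
Driver.prototype.dispose = function () {
  this.pool.release(this.held);
  this.held = null;
};
Driver.prototype.query = function (sql) {
  // While a connection is held, every query is routed through it.
  var conn = this.held || this.pool.getConnection();
  return conn.id;
};

// Toy pool handing out numbered connections.
var next = 1;
var pool = {
  getConnection: function () { return { id: next++ }; },
  release: function (conn) { /* return conn to the pool */ }
};

var driver = new Driver(pool);
var before  = driver.query("SELECT 1");    // pooled connection
driver.hold();                             // plugin: BEGIN
var during1 = driver.query("UPDATE ...");  // same held connection
var during2 = driver.query("UPDATE ...");
driver.dispose();                          // plugin: COMMIT / ROLLBACK
var after   = driver.query("SELECT 1");    // pooled again
```

The plugin drives hold/dispose around BEGIN and COMMIT/ROLLBACK; only the driver knows which connection is pinned.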

Either we do this or forget the plugin and pass the transaction code to inside the driver.

dxg commented 11 years ago

The problem with that solution is that it will affect queries running outside of the transaction at the same time (i.e. the webserver serving other requests) right until the transaction is complete. If in the middle of the transaction there is a call to a slow third-party API, this could be quite noticeable.

Things get even worse when, whilst one transaction is outstanding [waiting for the API], another request triggers a second transaction block. What do we do then?

Ideally the connection holding behaviour would somehow be scoped to the transaction block only.

@taoyuan 's suggestion of explicitly passing the connection to all queries within the transaction seems like the logical choice at first, however things will break unexpectedly when there are before* hooks which run other queries, or when saving a model results in associations also being saved. We'd have to make sure to pass the connection to all sub queries.

When we call orm.connect a connection is created and passed into an ORM instance which holds a list of models. Right now I'm wondering if this can somehow be turned on its head..

notheotherben commented 11 years ago

Another alternative would be to have connection identifiers within ORM, so creating a transaction would return a chainable object which holds some kind of transaction ID. This could then be used by ORM to identify the connection on which to execute the transactions.

I'm thinking something along the lines of

var t = db.transaction();
Model.transaction(t).get(id, function(err, model) { ... });
OtherModel.transaction(t).create({ ... }, function(err, model) { ... });
t.commit();

In this form I think it would be possible to implement it as a simple extension to node-sql-query, which would obviously simplify our lives a lot, since we would only have to link a db.transaction() object to a specific connection instance on which all Model.transaction(t) and Instance.transaction(t) calls would act. It would also allow it (if we use a node-sql-query extension) to support all future chaining function updates.

dxg commented 11 years ago

Is there any benefit to passing the transaction around over passing the connection?

I'm curious about how we could make it work for hooks.

beforeSave: function (t) {
  otherModel.transaction(t).save( ... );
}

This syntax would be opt-in if you want transaction support. It would be confusing, though.

Internally, we'd have to pass t around to hooks and things like saveAssociations, which sort of points to including transactions in ORM itself.

notheotherben commented 11 years ago

I guess if we're willing to add a commit() method or something similar to the connection object then it would be possible to do that as well. As far as I could reason, all that a transaction object's public API would need is the following:

{
    commit: function() { },
    rollback: function() { }
}

and then obviously we would need a reference to the connection it would operate on.
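That public API could look something like this minimal sketch. The pool and connection shapes are assumed for illustration; only the commit/rollback surface comes from the comment above:

```javascript
// Minimal sketch of the proposed transaction object: just commit() and
// rollback(), plus an internal reference to the pinned connection.
function beginTransaction(pool, cb) {
  var conn = pool.getConnection();
  conn.query("BEGIN", function (err) {
    if (err) return cb(err);
    function finish(sql, done) {
      conn.query(sql, function (err) {
        pool.release(conn); // the connection goes back to the pool
        done(err || null);
      });
    }
    cb(null, {
      commit:   function (done) { finish("COMMIT", done); },
      rollback: function (done) { finish("ROLLBACK", done); }
    });
  });
}

// Toy pool/connection that records the SQL it receives.
var log = [];
var pool = {
  released: 0,
  getConnection: function () {
    return { query: function (sql, cb) { log.push(sql); cb(null); } };
  },
  release: function () { this.released++; }
};

beginTransaction(pool, function (err, t) {
  t.commit(function () {});
});
```

The connection reference stays private to the closure; callers only ever see commit and rollback.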

As for hooks, I'd assume that it would make more sense to transparently handle it behind the scenes, so any operations within a transaction remain within that transaction. (Not sure how possible it would be to do this, maybe we just set this to model.transaction(t) behind the scenes if the operation is run through a transaction).

dxg commented 11 years ago

Yeah transactions make more sense from that perspective. With connections, the commit would need to be attached to something like ORM itself: ORM.commit(connection). t.commit() is easier since t is already there.

As for handling hooks transparently, I don't think it's possible. Consider this example:

orm.connect("...", function (err, db) {
  var History = db.define("history", {
    event  : String,
    info   : String,
    userId : Number
  });

  var Person = db.define("person", {
      name    : String,
      surname : String,
      age     : Number
    }, {
      hooks: {
        beforeSave: function (t, next) {
          History.create({
            event: "save",
            info:  "...",
            userId: this.id
          }, function (err, item) {
            next(err);
          });
        }
      }
    }
  );
});

If there is a way to do it that's better than History.transaction(t).create({ then I'd love to hear about it! Slightly worried that since closures would catch us out here, they might also catch us out somewhere else too..

notheotherben commented 11 years ago

Okay, let's do this then. When you call .transaction(t) on a Model/Instance it wraps it in a transaction-compatible version; it should inherit all the properties, methods etc. from the standard one but overload our hook caller to ensure that all hooks operate on the transaction wrapper instead of the original one. We can also have a .transaction property on that wrapper which represents the transaction object.

That would allow us to do the following:

beforeSave: function(next) {
    //Would automatically be part of the current transaction, since it's done within ORM
    this.age++;

    //Would automatically be part of the transaction since 'this' is a transaction aware wrapper
    this.addHistory({
        event: 'save',
        info: '...'
    }, function(err, item) {
        next(item);
    });

    //Would need to be explicitly added to the current transaction
    History.transaction(this.transaction).create({
        event: "save",
        info: "...",
        userId: this.id
    }, function(err, item) {
        next(item);
    });
}

(Excuse the parallel calls, just proving a point - it wouldn't actually work correctly)

It would then be up to the user to 'opt-in' to using transactions in hooks for anything outside of modifications to the current model, which probably isn't a bad idea since it favours explicit requests from the user over transparent behaviour which may be unexpected.
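That wrapper could be sketched with plain prototype inheritance. Everything here (withTransaction, the toy History model) is illustrative, not node-orm2's internals:

```javascript
// Sketch: Model.transaction(t) returns a wrapper that inherits every
// property/method from the model but carries the transaction, so code
// running through the wrapper (including hooks) can find it on `this`.
function withTransaction(model, t) {
  var wrapper = Object.create(model); // prototype chain: wrapper -> model
  wrapper.transaction = t;
  return wrapper;
}

// Toy model whose create() reports which transaction it ran under.
var History = {
  name: "history",
  create: function (data) {
    return { model: this.name, tx: this.transaction || null };
  }
};

var t = { id: "tx-1" };
var wrapped = withTransaction(History, t);

var plain = History.create({ event: "save" }); // no transaction
var inTx  = wrapped.create({ event: "save" }); // sees the transaction
```

Because the wrapper only shadows .transaction, the original model is untouched and can keep serving non-transactional queries concurrently.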

notheotherben commented 11 years ago

Something we're also missing is that once a connection has been checked out of the pool for a transaction, it should not be available to ORM for general requests until the transaction has completed. I'm sure that shouldn't be a massive issue, but it's something to keep in mind.

dxg commented 11 years ago

Yeah, that shouldn't be a huge issue; Once you check a connection out of the pool you have it exclusively.

Even with all the above, we still have an issue with closures; methods might have calls to History as well. Other models might too. This basically means that if you'd like to use transactions, you have to always use History.transaction(transaction).create in hooks and methods for all models.

It's becoming clear why sequelize has had the pooled transaction issue open for 2 years.

I've been thinking about a different solution, but have to run some benchmarks to see if it's plausible. It involves reconstructing an entire copy of ORM but with a single connection checked out from the main ORM pool. It would mean other than the cost of reconstruction, we wouldn't really have to change much code, nor pass transactions around. It would effectively "just work". If it takes more than a millisecond or two of CPU time though, it may be off the table.

EDIT I did some benchmarks, and in my linux VM it takes between 0.9 and 1.9ms. This is with 6 models and a bunch of associations. It means an absolute maximum of 500 req/s but I don't think anyone would get anywhere near that anyway with a single process talking to a DB server.
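The "reconstruct a copy of ORM" idea can be sketched like this, assuming the model definitions live in a replayable function. All names here are illustrative, not node-orm2's API:

```javascript
// Sketch: model definitions are a function that can be replayed against
// any connection. A transaction checks a connection out of the pool and
// rebuilds the models against it; nothing else needs to change.
function buildOrm(connection, defineModels) {
  var orm = {
    connection: connection,
    models: {},
    define: function (name, props) {
      this.models[name] = { name: name, props: props, connection: connection };
      return this.models[name];
    }
  };
  defineModels(orm);
  return orm;
}

function defineModels(orm) {
  orm.define("person",  { name: String, age: Number });
  orm.define("history", { event: String });
}

// Main instance: shared connection used by normal requests.
var mainOrm = buildOrm({ id: "main" }, defineModels);

// Transaction: replay the definitions against a freshly checked-out
// connection, so transactional queries are isolated automatically.
var txOrm = buildOrm({ id: "tx" }, defineModels);
```

The replay is the reconstruction cost measured in the benchmark above; the upside is that no transaction object has to be passed through hooks or associations.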

notheotherben commented 11 years ago

Only problem I can think of with that approach is that we will no longer have a unified cache, so objects created/retrieved/modified in one transaction will not be updated on the original ORM instance. Other than that, a rather elegant solution (even if it bugs the perfectionist in me).

dresende commented 11 years ago

I'm sorry for my delayed comments but I'm on vacation time and trying to step away from code for a while. Feel free to make decisions about the module.

About the transactions, I didn't think about concurrency. It's another big problem, but maybe it can be reduced if we consider that transactions only make sense for queries that change the database, namely insert, update and remove. Things done in hooks should probably be inside the transaction automatically (if they change the database).

So, we should make it as transparent as possible to the user. I'm thinking about a way of having transaction support in the drivers but controlled by the plugin. I'm going to run some tests and see if it's possible (making sure multiple/nested transactions are possible).

dxg commented 11 years ago

It bugs the perfectionist in me too..

Nested transactions you say :-\

dresende commented 11 years ago

Ok, so nesting transactions is not possible in any of the supported drivers. The only workaround is using savepoints, so let's forget that for now. I'm going to concentrate on making 2 things possible:

dresende commented 11 years ago

I think I found a cool way to make this transparent, but it involves domains, which are unstable right now (although for what we need I don't think they will break, but..). Here's an example:

var domain = require("domain");

begin(function () {
    query();
});

begin(function () {
    query();
});

query();

function begin(cb) {
    var d = domain.create();
    d.uid = Math.random();

    d.run(cb);
}

function query() {
    var intervalId = setInterval(function () {
        console.log("query id:", domain.active ? domain.active.uid : "none");
    }, 500);
    setTimeout(function () {
        clearInterval(intervalId);
    }, 2000);
}

As you can see, there's a scope in the domain where we can gather info (like a connection, a transaction indicator, ..).

dxg commented 11 years ago

This looks very interesting.

It "marks" the execution chain with an ID, which could be the connection ID to use (for example) hence all subsequent queries spawned from that line of code have the same ID. Once we commit or rollback, we can call .dispose() and the "mark" is removed. Subsequent queries will use any connection.

Does that sound right?

dresende commented 11 years ago

Yes, I don't know the exact specifics you're thinking of, but I was thinking about steps like these.

  1. Drivers that support pooling should have 2 methods: one to return a connection handle and another to dispose of it (passing the connection handle back);
  2. Drivers should check if there's an active domain and look for a specific key in the domain scope (like ORM_Connection, or something that uniquely identifies it as being part of ORM, to avoid clashing with people using domains with their own keys). If there is one, try to use it as the connection instead of getting one from the pool;

Then, in the plugin it would be easy to get a connection handle, create a domain and save the handle there. In the end we could just pass the handle back (so the driver can dispose of it correctly) and then destroy the domain (after the real COMMIT or ROLLBACK).

When there is no pool and more than 1 transaction is created, I was thinking we could have 2 scenarios for the second transaction (that people could choose in the global settings or when creating the transaction):

  1. Throw;
  2. Wait for the first to finish.

But these are next steps; first the plugin needs to know whether the driver supports pooling or not.

dxg commented 11 years ago

Sounds good!

dresende commented 11 years ago

I'm going to wait for @SPARTAN563 to merge the changes he has and then work on this.

notheotherben commented 11 years ago

Okay, I'll see what I can do about getting that working ASAP. In the meantime, I've added some early TypeScript interfaces for the module; they're missing some things like enforce etc. but are generally usable. Hopefully we can add to them as we progress with the project, and use them as a kind of API layout.

dirkmc commented 10 years ago

Hi, any progress on this? Thanks, Dirk

dirkmc commented 10 years ago

Hi, not sure if I'm missing something here, but it seems like the solution is quite simple. At the moment, when pool=true, a new db connection is taken from the pool for every query. If instead you were to just reuse the current connection, the problem would go away. The pooling logic should happen in the orm.connect() method, not inside the execute query method.

dirkmc commented 10 years ago

Sorry, I didn't think of the case in which unrelated queries are executed in parallel with queries which must happen inside the transaction.

I took a look at implementing the domain idea that Diogo suggested. In the transaction plugin I create a domain when the transaction is created. However, domain.active does not persist until commit:

    db.transaction = function (cb) {
        var txnDomain = domain.create();
        txnDomain.run(function() {

            // Outputs an object
            console.log(domain.active);

            db.driver.execQuery("BEGIN", function (err) {
                //...

                return cb(null, {
                    commit: function (cb) {

                        // Outputs undefined
                        console.log(domain.active);

I tracked the domain object down through the code, and found that domain.active disappears somewhere in the mysql driver:

Driver.prototype.poolQuery = function (query, cb) {
    this.db.pool.getConnection(function (err, con) {
        // Outputs an object
        console.log(require('domain').active);
        // ...
        con.query(query, function (err, data) {
            // Outputs undefined
            console.log(require('domain').active);

So the only way to make this solution work would be to create another domain when the query returns, which seems pretty hacky and would end up creating domains for every query inside a transaction.

dirkmc commented 10 years ago

It's also worth mentioning that the problem of concurrent, unrelated queries occurring within the transaction already exists in this module when connection pooling is disabled, which kind of defeats the purpose of having transactions at all. E.g.

function updateMyModel() {
  db.transaction(function(e, t) {
    models.MyModel.find(function() {
      models.MyModel.update(function() {
        t.commit();
      });
    });
  });
}

function updateSameField() {
  models.MyModel.update(...)
}

If updateSameField() is called while updateMyModel() is still running, then updateSameField()'s query will happen inside updateMyModel()'s transaction.
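That interleaving can be reconstructed with a toy shared connection (the queries and names follow the example above and are purely illustrative):

```javascript
// Toy reconstruction: with a single shared connection (pooling disabled),
// SQL from an unrelated caller lands between BEGIN and COMMIT and is
// silently swept into the transaction.
var log = [];
var sharedConnection = {
  query: function (sql, cb) { log.push(sql); if (cb) cb(null); }
};

// updateMyModel() starts a transaction on the shared connection...
sharedConnection.query("BEGIN");
sharedConnection.query("UPDATE my_model SET some_field = 1 WHERE id = 7");

// ...meanwhile updateSameField() fires on the very same connection:
sharedConnection.query("UPDATE my_model SET some_field = 99");

sharedConnection.query("COMMIT");
```

The second UPDATE was never meant to be transactional, yet it commits or rolls back with the transaction because there is only one connection to go around.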

dirkmc commented 10 years ago

I found a solution to the problem which is similar to what @SPARTAN563 and @dxg discussed above.

I create a connection pool and take a single connection from it (let's call it the "main connection"), which is shared between all requests in the process. When a transaction is started, I take a new connection from the pool (let's call it the "transaction connection"), and when there's a commit or rollback I release the transaction connection. I also create a new instance of ORM with this new transaction connection so that all queries in the transaction occur on the transaction connection (i.e., separately from the main connection).

I've included the most interesting part of the code below. You may want to incorporate some of it into the node-orm2 module. Note that there is a memory leak in the ORM class; I've hacked around it, see the comment below:

var define = function (db, models, next) {
    // Define models here ...
}

var pool = require('mysql').createPool(Config.db.uri);
var poolConnect = function(cb) {
    pool.getConnection(function(err, poolConn) {
        if (err) return cb(err);

        orm.use(poolConn, 'mysql', function(e, db) {
            if(e) return cb(e);

            // Really bad hack: ORM introduces a memory leak so we need to take
            // care of it here
            // Note: This function is defined elsewhere in my code, it just
            // removes the last listener with the given event name on the
            // given EventEmitter
            removeLastListener(poolConn, 'unhandledError');
            removeLastListener(poolConn, 'error');

            db.release = poolConn.release.bind(poolConn);
            mixinTxn(poolConnect, db);

            var models = {};
            define(db, models, function() {
                cb(null, { db: db, models: models });
            });
        });
    });
};

function mixinTxn(createConnection, db) {
    db.transaction = function(callback) {
        createConnection(function(err, txnConnection) {
            if(err) return callback(err);

            txnConnection.db.driver.execQuery("BEGIN", function (err) {
                if (err) return callback(err);

                txnConnection.commit = function (cb) {
                    txnConnection.db.driver.execQuery("COMMIT", function (err) {
                        txnConnection.db.release();
                        return cb(err || null);
                    });
                };
                txnConnection.rollback = function (cb) {
                    txnConnection.db.driver.execQuery("ROLLBACK", function (err) {
                        txnConnection.db.release();
                        // cb is optional so rollback can be fire-and-forget
                        return cb && cb(err || null);
                    });
                };
                return callback(null, txnConnection);
            });
        });
    };
};

// This is an example of how to use this code. You would probably do this
// in a separate file where you define your models
// Here db and models refer to the "main connection"
module.exports = function(db, models) {
    models.MyModel = db.define("my_model", {
        // ...
    });

    // Define a function on the model that uses a transaction
    models.MyModel.updateSomeField = function(id, cb) {
        db.transaction(function(err, txn) {
            // Here models refers to the "transaction connection"
            txn.models.MyModel.get(id, function(err, myModel) {
                if(err) { txn.rollback(); return cb(err) }

                // This will also occur on the "transaction connection"
                myModel.save({ someField: 2 }, function(err, mySavedModel) {
                    if(err) { txn.rollback(); return cb(err) }

                    txn.commit(cb);
                });
            });
        });
    }
}

mspisars commented 10 years ago

any decision/progress on this?

dresende commented 10 years ago

This is for the mysql module: now that pooling and transactions are more mature in the driver, perhaps we could change the driver implementation to use a pool, get a new connection for each transaction, and track this in the driver. I'm not sure if this is good or not; I've been off for a while and need to dive deep into the code again to get a better picture.

gfhuertac commented 9 years ago

Related question: are the changes going to be implemented inside node-orm2, or will transactions still be provided as a plugin module? We are trying to implement transactions properly at our company now, and if we do implement them we want to contribute to the project, but we are not sure which way to go.

dxg commented 9 years ago

I think the changes are fundamental enough that it may make sense to make them inside the main code base. If they could be made in a plugin then great, but I'm not sure that's possible.

cjnqt commented 8 years ago

Any progress on this?

shadow88sky commented 8 years ago

Any progress on this?

dxg commented 8 years ago

Nope.

Drazke commented 8 years ago

Hi, I suggest my solution in the #742 PR above. I just added the possibility to create a pooled connection via some new driver functions, and the possibility to use this connection when creating/modifying/removing an instance/model. It works fine with mysql, but I didn't test it with postgresql; it would be great if someone could test it. Thanks

I also create a PR for the plugin node-orm-transaction that work with it: https://github.com/dresende/node-orm-transaction/pull/6

mishuagopian commented 8 years ago

Nice @Drazke