Removed auto-commit to make async query sequences work better

myndzi commented 12 years ago

You can use this if you like, please check my logic :)

The auto-commit was committing when it shouldn't, something like: add to queue in callback of a query executed after .commit was called. It was messy, but upon consideration it seems that the auto-commit functionality is unnecessary anyway. Consider: 1) Auto-commit is only enabled when the user calls .commit 2) Auto-commit is only enacted when the user creates a new Queue 3) New Queues do not become active when they are created 4) New Queues will not become active until the active queue is completed 5) The active queue does not complete until you call .commit or .rollback

The only case where auto-commit makes sense is if a user calls .commit and then tries to begin a new transaction on the old Queue object... but this is not supported since the Queue object does not have a .startTransaction method.

This fix enabled some code that otherwise failed, consisting of parallel multiple inserts, all tests pass.

bminer commented 12 years ago

Thanks for taking the time to improve mysql-queues.

Kris, I apologize for my stupidity, but I don't fully understand the purpose of this change. Is there any way you can help me out? Do you have any examples of code that were previously broken and now work due to this change? Please feel free to comment back on this pull request, email me, chat offline, or whatever. This page has all of my contact info Thanks!

myndzi commented 12 years ago

The code worked when called singly, but problems arose when I tried to loop it to do 10 or 100 at a time. I've included the relevant section; there's some unrelated code above it and the whole thing is in a method that does a lot of sequential async stuff to generate some data.

I was unable to determine exactly what was going on; If we call each stage below by "a, b, c" and number the calls "1, 2, 3", here is what happened:

1 a 2 a 3 a 1 b 1 c 2 b 3 b <no .commit method error>

What happens now is: 1 a 2 a 3 a 1 b 1 c 2 b 2 c 3 b 3 c

The first one worked because the START TRANSACTION was executed immediately since it was the first in the sequence; for the rest of them, start transaction got queued, followed by the first insert command. When two items were queued, the loop that send the queries to node-mysql would only call the first callback, and since that was start transaction there was no insertId available. I tried to reproduce the behavior in node-mysql and couldn't, but after some hours I couldn't really see what was going wrong. I settled for messing with the code until it worked; what I've sent you is the "least modification" version - the orignial was changed quite a lot more, but I took care to find the solution that altered the least.

It is quite possible that I am "doing it wrong", but this seems a useful mechanism and unless you can spot a flaw in my logic I see no reason not to make the change.

Thanks :)

var trans = db.startTransaction(), moar = this;

first(function () { trans.query( "INSERT INTO Cookies (cookieText) VALUES (?)", [ cookie ], this ); trans.execute(); }). then(function (err, res) { if (err) { trans.rollback(); callback(err); return; } var lastinsert = res.insertId; trans.query( "INSERT INTO Personas ( some stuff )", [ ... lastinsert ], this ); }). then(function (err, res) { if (err) { trans.rollback(); callback(err); return; } trans.commit(); moar(res.insertId); });

bminer commented 12 years ago

Hmmm... I still don't quite understand what you're talking about. But, I have a little bit more insight. I'd suggest checking out the Don't do this section of the documentation to see if you are falling into that "trap." That's pretty much the only way that an auto-commit would occur. Code that causes an auto-commit is, well... loosely speaking, "wrong." That's why a warning message is generated, etc.

bminer commented 12 years ago

By the way, I still think that "auto-commit" fallback is a bit of a hack, and I lot of people don't really like it much (including myself). But, I think it's necessary. Here is an except from issue #6:

BTW... The pause() method is admittedly a bit of a hack... but it works pretty well. All you do is add it right before your async operation, and... then resume() again when your asynchronous operation completes. Or, actually, if you are calling rollback() or commit(), resume() is called for you. Especially if your async operation executes quickly (i.e. a quick call to Redis), the extra pause doesn't cause much of an issue.

The problem is that without pause() mysql-queues has no way of knowing that you are about to do something asynchronous in a query callback. When all callbacks have completed, mysql-queues has no choice but to return control back to the main queue.

The only other way to do it is to get rid of pause() / resume(), and force the programmer to call something like done() after all work is done in the query callback. That would tell mysql-queues to return control back to the main queue. But what if you forget to call done()... part of your program could "hang" because no more queries are getting executed anymore.... they all get queued.... until you run out of memory.

myndzi commented 12 years ago

In this case, the async call it is waiting on is... a query within the transaction. The use-case seems pretty straightforward: insert something, get the insert id, insert something with the previous insert id as a foreign key, roll back if it fails. So, the callback is calling the next item in the transaction - and judging by the comments in the source code, this is expected behavior.

myndzi commented 12 years ago

Derp, I think I've answered my own question and when I compiled that list I fell into a trap; you define some of those functions in the parameters to the new Queue function, but call them in other places IIRC. I read the last comment more closely and I understand what you are trying to avoid now. When I was compiling my final edit, I went over the code and tried to track exactly when and where auto-commit was used and came up with the list I posted in my original message; the apparent conclusion is the only reason I sent you a pull rather than just use the code for my own purposes -- but that conclusion appears to be incorrect.

On the other hand, if a programmer doesn't .commit his transaction, that's a bug in my opinion. It may be hard to track down without help, but you could easily add a warning without affecting the actual queue behavior.

Edit: Further consideration of: "The problem is that without pause() mysql-queues has no way of knowing that you are about to do something asynchronous in a query callback. When all callbacks have completed, mysql-queues has no choice but to return control back to the main queue." Brings me to the question: why does mysql-queues have no choice but to return control to the main queue? It's in the middle of a transaction that hasn't been committed, it should remain there. I suppose this is the same as implying the pause/resume thing, but there's no way around it - if you are doing a transaction where one query depends on another, you have to lock the table and wait the time. The same thing applies to reading a file, I guess. Async calls between the transaction start and the commit should come with the expectation of locking up the database...?

bminer commented 12 years ago

OK. I'm confused. Should I close the pull request?

Also, as you probably know, to run multiple queries within a transaction, you can do this...

var trans = client.startTransaction();
trans.query("INSERT...", [x, y, z], function(err, info) {
    if(err)
        trans.rollback();
    else
        trans.query("UPDATE...", [a, b, c, info.insertId], function(err) {
            if(err)
                trans.rollback();
            else
                trans.commit();
        });
});

Or, perhaps, with an async call between them:

var trans = client.startTransaction();
trans.query("INSERT...", [x, y, z], function(err, info) {
    if(err)
        trans.rollback();
    else
    {
        trans.pause();
        fs.readFile("foo.txt", function(err, data) {
            if(err) return trans.rollback();
            trans.query("UPDATE...", [a, b, c, info.insertId], function(err) {
                if(err)
                    trans.rollback();
                else
                    trans.commit();
            });
            //We call resume() after we add the prior query to the Queue, not before
            trans.resume();
        });
    }
});

Hopefully, that clears everything up.

myndzi commented 12 years ago

Well, I only provided the code thinking you might want it; I'll continue to use my modification since it alleviates my problem.

The code you posted is essentially what I am doing; first is just a flow control thing so I don't have to have stacks of nested callbacks.

As I mentioned at the start, the code works for a single instance, but not for many. If you do that code with three queues such that two are waiting while the first runs, the second will fail, assuming it truly is equivalent.

(This is because the other queues wind up with two queries: START TRANSACTION and their first query in them, but only the first callback is called [as far as I can tell, by node-mysql] - breaking the callback-query chain.)

If you want to pursue the issue I can try to distill it down to a simple working counterexample...

bminer commented 12 years ago

To answer your most recent post... here is an example of multiple instances where you need to pause() and resume() multiple times for multiple async calls... how is this different from what you need to do? What do you mean by, "If you do that code with three queues such that two are waiting while the first runs, the second will fail, assuming it truly is equivalent?" Do you mean that you have 3 separate transactions at the same time?

var trans = client.startTransaction();
trans.query("INSERT...", [x, y, z], function(err, info) {
    if(err) return trans.rollback();
    //First async call
    trans.pause();
    fs.readFile("foo.txt", function(err, data) {
        if(err) return trans.rollback();
        trans.query("UPDATE...", [a, b, c, info.insertId], function(err) {
            if(err) return trans.rollback();
            //Second async call
            trans.pause();
            fs.readFile("bar.txt", function(err, data) {
                if(err) return trans.rollback();
                trans.query("UPDATE...", [a, b, c, info.insertId], function(err) {
                    if(err)
                        trans.rollback();
                    else
                        trans.commit();
                });
                //We call resume() after we add the prior query to the Queue, not before
                trans.resume();
            });
        });
        //We call resume() after we add the prior query to the Queue, not before
        trans.resume();
    });
});

And, to answer your other prior post...

The problem is that without pause() mysql-queues has no way of knowing that you are about to do something asynchronous in a query callback. When all callbacks have completed, mysql-queues has no choice but to return control back to the main queue.

Brings me to the question: why does mysql-queues have no choice but to return control to the main queue? It's in the middle of a transaction that hasn't been committed, it should remain there.

Well... if it doesn't return control back to the main queue, any other queries that are executed against the database (perhaps elsewhere in your program) would block. All queries will block just because you forgot a commit() call somewhere in your program. Let's suppose you have a database-driven web server. Currently, if you have one particular request where you forget the commit() call, mysql-queues will auto-commit, print a warning, and then everything still runs normally. If mysql-queues did not return control back to the main queue, ALL future requests that query the database would block... until the server runs out of memory.

I dunno... it's been a while since I've looked at this project, but I think that was my rationale.

myndzi commented 12 years ago

Yes, three transactions at once. For example, for (var i = 0; i < 10; i++) { async query sequence }

Yes, other calls would block. The developer should expect this since he's left a transaction hanging.

"Just because you forgot the most important part of a database transaction" isn't a good reason to enable dummy-mode to me. Programmers aren't in the business of writing code that allows them to forget things. Preventing data corruption, sure, but I don't see how this does that.

bminer commented 12 years ago

@myndzi - I apologize. I completely misunderstood you. OK... 3 transactions at once is not supported. You can't run 3 transactions concurrently; MySQL doesn't let you do that. However, you can run multiple transactions, one at a time, in a particular order. To do this, create 3 separate queues by calling db.startTransaction() 3 different times. Then, queue the queries up for each transaction. Then, call execute() on each Queue in the order in which you want them executed. Each transaction will run in order with no overlap. And, furthermore, the second transaction will block until the first transaction has completed and all callbacks for that first transaction have also completed.

To run the same transaction 3 times:

for(var i = 0; i < 10; i++)
{
    var trans = db.startTransaction();
    trans.query("INSERT...", [x, y, z], function(err, info) {
        if(err) return trans.rollback();
        //execute more queries for this transaction maybe?  Or, maybe do nothing?
        trans.commit(); //Then, commit this transaction
    }).execute();
}

This loop still works with async calls inserted in, too.

And, to address your other comments... Yes, forgetting a "commit()" call is certainly a bug. And, you're right... maybe dummy-mode isn't such a great idea. The alternative is pretty scary, though.

I feel like I'm failing at communicating here. :) Are we on the same page yet?

myndzi commented 12 years ago

However, you can run multiple transactions, one at a time, in a particular order. To do this, create 3 separate queues by calling db.startTransaction() 3 different times.

This is what I am doing

Then, queue the queries up for each transaction. Then, call execute() on each Queue in the order in which you want them executed.

This is not possible, however. Since the transaction NEEDS the result of a previous call, it has to be executing to complete itself.

Without the autocommit behavior it behaves as expected. I'm not sure what this 'alternative' you are afraid of is, yet. It appears to be failed queries or hung programs, neither of which corrupt data...

I think the part that you're not following is the fact that I'm queueing multiple transactions that run sequentially via callbacks - that's the part that ruins it.

bminer commented 12 years ago

I think I see what you mean now. You're right. This is a bug that needs to be fixed.

You're doing something like this:

var trans = db.startTransaction();
trans.query("QUERY 1", function(err, info) {
    if(err) return trans.rollback();
    trans.commit();

    var trans2 = db.startTransaction();
    trans2.query("QUERY 2", function(err, info) {
        if(err) return trans2.rollback();
        trans2.commit();
    }).execute();
}).execute();

The alternative that I am afraid of is that all database calls made to the MySQL database connection will block if you forget a commit() call just once. The result is a hung program. True, it does not corrupt data, but IMHO, a hung server is worse than a server that can crash and reboot.

As far as fixing this issue is concerned, let me look further into your solution now that I finally understand the problem. Thanks for your help!

myndzi commented 12 years ago

The alternative that I am afraid of is that all database calls made to the MySQL database connection will block if you forget a commit() call just once. The result is a hung program. True, it does not corrupt data, but IMHO, a hung server is worse than a server that can crash and reboot.

It's not like it will happen unexpectedly. Any admin that puts code live without even running it ONCE to see that it works deserves what he gets. :)

Edit: I didn't quite understand the purpose of your code either, so I might fix it differently now that I do understand what it was supposed to do. I do not think the two behaviors are mutually exclusive, but I don't really see the one as necessary either way.

bminer commented 12 years ago

Haha! I tend to agree, BUT... realistically... it's a tough call. For example, in PHP, if you write a query and forget the COMMIT. Oh well... it only affects that specific HTTP request or a specific page. We're talking about pretty much deadlocking the entire server... not something you want to happen very often due to a slighly careless mistake. :)

Anyway, I'll look into it. This bug has to be fixed. Thanks again.

myndzi commented 12 years ago

You're welcome.

Also, the answer is to simply make it optional. Default it to enabled if you so desire. I don't have any sympathy for this error case though ;)

bminer commented 12 years ago

The latest commit of version 0.3.5 should fix your problem. Can you test with your code before I publish to NPM?

bminer commented 12 years ago

By the way, thanks again for finding this problem. I had a hell of a time trying to figure out what was actually causing this, so I understand why you started to remove the "auto-commit" functionality. Anyway, when I finally realized what the problem was, I just through in a little patch to fix (see the commit above).

Thanks very much for the pull request! If version 0.3.5 does not fix your problem, please let me know.

myndzi commented 12 years ago

Haha, I feel a bit silly not spotting that - but this conversation is what would have helped me find it I guess. I can test it tomorrow to verify with my code, I don't imagine you'll need this patch judging by the new change. (Knock on wood)

myndzi commented 12 years ago

Appears to work fine. Thanks.

bminer / node-mysql-queues

Removed auto-commit to make async query sequences work better #12