Closed mattmccutchen closed 7 years ago
Thanks, Matt. You're absolutely right. Massive design flaw in this package. I've observed this happening in a production app and it's not pretty.
I'm wide open to suggestions about how to fix this, but freely admit I don't have the depth of technical knowledge to implement this properly. (A package like this, dealing with db mutations and race conditions, really shouldn't be written and maintained by a high school math teacher hobby coding in evenings and weekends.)
In reality, a from-the-ground-up re-think and re-write is probably required.
This issue by itself shouldn't take much re-writing to fix (although I don't know what other problems might lurk that we might run into over time). One would take all the properties that make up the "transaction manager state", which are currently directly on tx
, and move them to a separate class, perhaps called TxState
. On the client, the tx
object would have just one TxState
, while on the server, it would have a map from Connection
to TxState
. tx
would have a helper method to get the correct TxState
for the currently executing code: on the client, the single one, or on the server, looked up based on Meteor._CurrentInvocation.get().connection
(equivalent to this.connection
in the Meteor method, but our code won't have access to this
of the Meteor method). My team is planning to use this package to add undo to our app, so I'll probably implement this fix at least for our own use, but I'm not sure how much additional work might be needed to meet your standards (e.g., tests).
Well, that's interesting. That's exactly where my mind went once I'd thought about the issue for a bit. I was planning to start separating out the code into two classes later today, do the map from connection to TxState instance and everything! However, if you're planning on doing this anyway ... :-) I imagine you'll make better design choices. Like I said, I'm not a full-time developer and you seem to have grokked the workings of the package pretty well already. The only standard I set for the package before each release is that it passes the (fairly limited) set of integration tests set up via the test app included in the package.
Hi Matt, Were you planning to go ahead with this refactor? If not, I'll probably start work on it in a week or two's time. If you are going to do it, there's an open pull request I should probably merge before you start. Cheers Brent
Sorry for the slow response. I'm not going to make the change unless and until I get our app to work with the transactions package in all other respects. I'm running into many issues, and it could be several more days before I know. You can start on the change now if you like. I'll check with you before starting on it myself.
Okay. I'll have a crack at it over the next week or two. I'm interested in hearing about the other issues with the package, as I'm using it in production at the moment and these will, no doubt, come back to bite me at some stage.
Honestly, having had a quick look at your spreadsheet app, I don't know if this package has the goods to support what you're implementing. It's pretty limited in its set of capabilities (only supports a handful of mongo operators, no multi: true
, no upserts, etc.).
I am very grateful that you've highlighted this issue. It's something that does need immediate attention.
Honestly, having had a quick look at your spreadsheet app, I don't know if this package has the goods to support what you're implementing. It's pretty limited in its set of capabilities (only supports a handful of mongo operators, no multi: true, no upserts, etc.).
Indeed. My assessment was that I would probably be better off adding the missing features than starting from scratch and getting bogged down in tons of design decisions (this is a weakness of mine). If I could upstream the enhancements and benefit from any future improvements to the package that are relevant to us (I don't know if that's likely), that's a bonus. However, there are a few downsides:
Unfortunately, it's impossible to know in advance whether I made the right decision. I'll keep you posted.
OK, I believe I have all the other issues worked out for the purpose of our app (I'll be filing issue reports soon), so I just need the concurrency fix before I'll be ready to merge the babrahams:transactions adoption to master of our app. @JackAdams, have you made any progress yet? If not, I'll start on the concurrency fix myself. I'm not worried about conflicts with #66; it doesn't look like they'll be hard to resolve.
Just yesterday I got an internet connection working at the remote place I'm staying. I'll get to work on this package today. Let me have a go at doing the concurrency fix and, if I get stuck, I'll reach out for help.
Hi Matt,
I've been working on this and can pretty much see my way through, but I'm having trouble with the key part: Meteor._CurrentInvocation.get().connection
-- I just can't seem to find it anywhere. I'll take a look at the Meteor
source in a minute, but just thought I'd see if you have any pointers about how to get the unique identifier for the current connection when in an arbitrary context on the server.
Cheers
Brent
Sorry, ignore that. Should have just looked at the source without bothering you. It's DDP._CurrentInvocation
not Meteor._CurrentInvocation
.
Closing in on a solution. Just have to make sure transient transaction instances are cleaned up properly so we don't get memory leaks.
I hope this (0.8.0) works. It's passing all tests and I've given it a run in a pretty demanding app (in terms of complex transactions). I haven't published to Atmosphere yet, but will soon.
Use tx.showConnection = true;
somewhere in common code to see which connection is being used (shown alongside each message being logged to the console from the transient, per-connection transaction objects).
Okay. This (0.8.0) is published on atmosphere. Thanks again for bringing this issue up and giving me the necessary push (and idea -- i.e. DDP._CurrentInvocation
) to get it fixed. It's of pretty fundamental importance to the package working reliably.
Belated confirmation that the fix is working in our app. Thanks! Since I got the initial integration working on December 22, I've been working intensely on our app: I built an extra feature ("transactional publishing"; I'll write about it on forums.meteor.com soon) and did major reworking to reduce the performance hit of transactions and make the related code understandable (it had become unmanageably complex). I finally finished today, so I'll file the rest of the issues.
Hi Matt, Thanks for the confirmation. I'm really interested in your work -- looking forward to the forum post and the issues you'll file. Is the reduction of the performance hit specific to your app, or is it something that could be generalized and made a part of this package? Cheers Brent
Here is the promised writeup on transactional publishing.
Is the reduction of the performance hit specific to your app, or is it something that could be generalized and made a part of this package?
In essence, our app was making repeated writes to the same documents during a transaction, which were already slow but became even slower with this package. Now our app works on temporary collections and then sends the diff down to the original collections (implementation). This technique is potentially useful for any app with a notion of a transaction that makes repeated writes to the same documents during a transaction, whether or not the app uses this package for transaction history and recoverability. Of course, if the app uses this package and uses nonlinear history, the diffing may change the semantics of the saved transaction. I'm not sure whether it would make sense to offer the diffing technique in this package. See also #81.
In a real app that uses server-side transactions, different clients may concurrently call server methods that each try to perform a transaction. These calls run in different fibers, which can interleave when they perform blocking operations. But there's only one
tx._transaction_id
variable on the entire server, so if the second call starts before the first call finishes, it will try to use the (unrelated) transaction from the first call instead of starting a separate one. I did a simple test and the second call continued to add items to the transaction while the first call was committing it, which corrupted the transaction in such a way that it couldn't be undone. My test hadtx.requireUser = false
, but as far as I can tell, the same would happen even withtx.requireUser = true
and different user IDs.At a minimum, this package should have a separate notion of the "current server-side transaction" for each client, mirroring the behavior of client-side transactions. For apps where the server uses
unblock
to process multiple method calls from the same client in parallel, it might make sense for these to have independent transactions too, though I'm not sure what the analogous functionality for client-side transactions would be.The output of my test: