Open hakanw opened 13 years ago
A patch is fine, but what's wrong with just using the http API directly? I mean there is nothing forcing you to use my save wrapper. I never intended the abstractions I wrote to be the only way to access the API and I purposely didn't allow for every use case through them. You should be able to use the lower level request helper provided.
But still, if you can think of a good API and implementation to make saves safer, then I'll gladly accept a patch.
I totally agree that this library is not the end-all be all; it's a great thin wrapper. My intention here is to improve its default behavior to be better and safer for all. :)
How about changing save() to be safe by default (giving an error on overwriting without _rev) and providing an optional boolean for explicitly overriding it (effectively making sure the potential consequences are understood)?
Changing the signature to: save(doc, [overwriteExisting=false], callback).
Sounds good to me if you're willing to write the patch.
Actually though, I thought it was safe already. Doesn't it only request a new revision if there isn't one in the local cache? This means if someone changes the value behind your back, it will still result in a conflict and not override it? (Honestly, it's been a while since I used this code, so I don't remember very well)
That's true. It protects unless a user simply omits to send the _rev for some reason in which case it assumes it means the "latest version". As a result there's a race condition with different users trying to create a new document (hence no previous _rev) that will overwrite each other — whoever is last wins.
So it's actually pretty strict already, except for the "no _rev" / "new document" case. Maybe we should just make it consistent and not assume "overwrite latest seen revision" when there's no _rev? I.e. remove the revCache.
Well, the thing is, I also want to optimize for the case where you know you're the only client to the database. In that case revisions only get in the way and it's trivial for my library to handle them for you. For the cases where you have concurrent access, it's still safe because others don't have access to your rev cache and you don't have access to theirs. What I don't want my library to do is to try to update, get a conflict, and then simply grab a new revision and save again. That would be bad. But trying to save with the last known local revision should be safe. That's what any client is going to do anyway on their own.
Let's say you're writing a wiki, and there's a bunch of people who want to create a new page by the same name. It doesn't exist when they check, and they all start writing on the initial revision. In this case there's no _rev so anyone saving will overwrite the other. Whoever saves last overwrites everyone before them.
This case makes me still want a boolean to disable the revCache assumption, making this stuff safe in all cases, also the first revision.
I guess if you want to optimize for the "no _rev" case then the boolean could be true by default for unchanged behavior. How does that sound?
So what I'm proposing is: save(doc, [overwriteExisting=true,] callback); to avoid the ambiguity between explicit no _rev (first revision) and "please overwrite anything that is already there".
I see the case of a new write is dangerous. Could you patch the code to only use the cached revision if both the user's session has already done one insert and has a rev in the cache and the user doesn't supply an explicit rev. I'd rather not have a flag in the arguments if at all possible. It should never overwrite existing documents if there is a conflict. The automatic rev should only work for safe writes where you were the last author.
That sounds like a reasonable heuristic. However, how do you get the current user's session (I assume that would be remoteAddress?) when the save method doesn't get the connection supplied to it? :(
By session, I meant your local node process. You can always assume you're in the same session. But I guess if several users are using the same node server, then you would need to break it out into actual user sessions.
By safe, I meant that one node process (client to couch) couldn't overwrite another client's data on accident.
If you have several users within a single node process (probably quite common with web servers) then you would need to make sure different users don't nuke each others data.
I'm not sure that level of abstraction is something I want to bake into the library. It (the couch client) shouldn't have to know about it's own (the http server in process) client's sessions.
I don't know what kind of applications you write, but most that I've ever dealt with involves more than one physical user using a web application. :-)
Given that's probably what 90% do with this library or node at all, I think it's pretty weird not to take care of that case in the library. But it's your call, feel free to close the ticket then. Thanks anyway.
Even if all my programs were multi-user servers connected to a single couch database, I still would like clean separation between my db client and my application logic. While I think there is very little value in OOP ideas like inheritance, I very much believe in separation of concerns. Otherwise code becomes unmaintainable real fast.
A simple solution would be to check all inserts as they leave the application later and require a _rev. Maybe a flag to disable the rev cache would be a clean way to enforce this.
Also, if there isn't too much overhead, you could have a separate couch client instance per user session. But that may be too expensive.
I'd be fine simply removing the rev cache completely from the API. I prefer that to a flag. It's not hard at all to implement externally if needed.
"I'd be fine simply removing the rev cache completely from the API. I prefer that to a flag. It's not hard at all to implement externally if needed."
Sorry, I'm confused now, what does that sentence mean in practive? :) You'd accept the chance removing the rev cache from the lib, enabling it with a global boolean flag?
After discussing with you, I see the best solution that will fit both of our constraints would be to simple remove the rev cache completely from my driver. I'll accept a patch for that as long as the package.json has a major version change.
I've just pushed this to my fork. I'm happy for feedback.
I wasn't sure how to best handle the error; it will hand an Error() instance to the callback along with an empty document with the _id of the document that conflicted.
https://github.com/hakanw/couch-client/commit/60a1a02d0925f3bd0106befe099d7e8465eeab8b
Did a bunch of updates today that fixes this, among other things.
Have a look: https://github.com/michael/couch-client/commit/011698b10a5d21ab43d86aa277884f92b48e24ed
The current behavior of save() in couch-client is risky. The couchdb way is to require a _rev property to save over a previous version, to avoid conflicts, but couch-client "automagically" retrieves the current version and simply overwrites it.
This is not what I expected and requires the client to do a separate HEAD/GET lookup just to find out whether a save is safe. Actually, between that get() and the new save(), the doc could have been updated, so save() is never safe to use in couch-client. :-(
I think save() should be changed to never overwrite a previous document, unless a _rev is specified, or some kind of parameter option for this should be provided. Unfortunately this requires me to fork couch-client just to get a safe save(), which is a necessity in most non-toy applications.
I'm willing to help provide a patch for any of these options.