Open ondra-novak opened 6 years ago
Hello,
We are facing a similar issue as the one described by @ondra-novak on the _users
database.
Let A
and B
be 2 CouchDB servers.
_users
database (if it does not exist yet)
Request
curl -X PUT http://$A:5984/_users
Response
{
"ok": true
}
curl -X PUT http://$A:5984/_users/org.couchdb.user:email@example.com -d '{"name": "email@example.com", "type": "user", "roles": [], "password": "password"}'
Response
{
"ok": true,
"id": "org.couchdb.user:email@example.com",
"rev": "1-19fa61446d76bcb9131a1b65c03b4c1c"
}
_users
from A
to B
Request
curl -X POST http://$A:5984/_replicate \
-d '{"source":"http://$A:5984/_users","target":"http://$B:5984/_users","create_target":true}' \
-H "Content-Type: application/json"
Response
{
"ok": true,
"session_id": "...",
"..."
}
A
Request
curl -X DELETE http://$A:5984/_users/org.couchdb.user:email@example.com?rev=1-19fa61446d76bcb9131a1b65c03b4c1c
Response
{
"ok": true,
"id": "org.couchdb.user:email@example.com",
"rev": "2-43b4189bc429587a529beba79a037cad"
}
_users
from A
to B
once again
Request
curl -X POST http://$A:5984/_replicate \
-d '{"source":"http://$A:5984/_users","target":"http://$B:5984/_users","create_target":true}' \
-H "Content-Type: application/json"
Response
{
"ok": true,
"session_id": "...",
"..."
}
A
Request
curl -X PUT http://$A:5984/_users/org.couchdb.user:email@example.com -d '{"name": "email@example.com", "type": "user", "roles": [], "password": "password"}'
Response
{
"ok": true,
"id": "org.couchdb.user:email@example.com",
"rev": "3-1a2c0e3876f314e08d80b1d236481a2d"
}
_users
from A
to B
Request
curl -X POST http://$A:5984/_replicate \
-d '{"source":"http://$A:5984/_users","target":"http://$B:5984/_users","create_target":true}' \
-H "Content-Type: application/json"
Response
{
"ok": true,
"session_id": "...",
"..."
}
The user document might be replicated from A
to B
and the number of documents on A/_users
and B/_users
might be equal to 2 (_design/_auth
document + our user).
The document is not replicated and the number of documents on A/_users
is equal to 2 but the number of documents on B/_users
is equal to 1 (_design/_auth
only).
A
B
This issue is quite impacting for us as it silently breaks replication and can lead to data loss.
Thanks in advance for helping us on this bug.
Do not hesitate to ask me for more infos if needed.
So I know this was filed a long time ago, but - what a fascinating bug report!
I understand what's happening in @ondra-novak 's initial report, but it's not clear to me that @bravier is experiencing the same issue. In Ondra's case the validation failure happens because the replicator skips over the deletion, which means the target sees the old version (before the deletion) and the new version, and their owner
fields don't match.
In @bravier 's case there's a replication after every edit, so it's definitely a different situation. I'm wondering if in that situation oldDoc
really is present in the replicated edit path, and because the deletion did not keep the name
property in the tombstone it fails to validate when the re-created doc shows up as an extension of the edit path. It's almost the converse of @ondra-novak 's report. Worth double-checking.
At a higher level ... I'm concerned a bit about the reliance on the presence of oldDoc
in the VDU when a document was created without specifying any revision. Yes, in that case CouchDB will look to extend an existing edit branch, but I always thought of this as a performance optimization that was not intended to have user-facing consequences. Clearly it does, and folks are relying on it for reasonable use cases.
Separately, at some point we blocked the ability to supply the _rev
of a deleted leaf edit when every leaf of the document is deleted. It's possible that this happened in 39df1d5e78a3ffd. Oddly enough if some other branch of the document is not deleted we will let the operation against the deleted leaf go through. That seems like an unintentional inconsistency.
I don't have a great answer about how to proceed here, but figured I should write down a few details that I picked up because I happened to be looking at this part of the codebase for #1957 and @wohali pointed me to this ticket as an interesting edge case.
This is an interesting edge case indeed!
I also experience the bug on CouchDB 2. My understanding is that it happens in this particular case:
first rev latest rev on source
↘ ↙
SOURCE SERVER o----o----o
TARGET SERVER o----o----o
↖
latest rev on target
first rev tombstone rev (latest rev on source)
↘ ↙
SOURCE SERVER o----o----o----x
TARGET SERVER o----o----o----x
↖
tombstone rev (latest rev on target)
tombstone rev
first rev (last common ancestor) latest rev on source
↘ ↓ ↙
SOURCE SERVER o----o----o----x----o----o----o----o
TARGET SERVER o----o----o----x
↖
tombstone rev (latest rev on target)
I have got the same issue as well for the database.
Currently, i dont allow to delete but only remarked as "deleted". Haha
It is possible to recreate (i mean: to update a deleted document with new content which hasn't the flag _deleted set to true) a document without need to specify revision field (_rev). However this behaviour has been changed somehow in CouchDB 2+.
In CouchDB 1.X, one can recreate document with the correct revision. It is also possible to recreate document without the revision and in this case, the CouchDB 1.X uses the document's recent revision. The rest of the operation is similar to updating existing document, including how the VDU (validate_doc_update) is called.
Consider following declaration (formatted to achieve better reading)
In normal operation,
oldDoc
contains old document,newDoc
contains new document. If the document is being created, the function is called witholdDoc == null
. For all other updates the function is called witholdDoc
equal to the body of the document being updated.In
CouchDB 1+
this is also applied for deleted documents, when the document is recreated, regardless on whether it is recreated with _rev or without the oldDoc contains the body of the deleted document.In CouchDB 2+ this is no longer true. When the document is being recreated, oldDoc is always null. This behaviour can break replication in situation, when the replication is not continuous and when the document is deleted and recreated between each replication round.
Expected Behavior
If the document is recreated, the oldDoc of VDU should contain the body of previously deleted document
Current Behavior
If the document is recreated, the oldDoc of VDU is set to null
Possible Solution
The recreating of the document is mistakenly handled as a creation of the new document. It should be handled as an update.
Steps to Reproduce (for bugs)
This bug is dangerous in a one way. There is no way to detect, that the document is being recreated so the creator can create a document which suddenly cannot be replicated. There is also no way how to prevent such situation, because the VDU cannot distinguish between creation and recreation
creation:
recreation:
Context
The main goal of the checking the owner in the VDU is to prevent user to steal a document owned by other user.
Your Environment