Open arifcse019 opened 6 years ago
@arifcse019 Can you provide a minimal example for us with a script that uses cURL? I've personally performed the steps you mention above (1-4) many times and have never run across this.
What technique are you using to move the shards? There is newly updated documentation on the approved approach online, can you follow that?
@wohali I am using the following two Ruby scripts to move shards: the first updates the cluster metadata to add the shards to the new node, and the second updates it so the cluster stops looking for those shards on the old one.
https://gist.github.com/arifcse019/43a638e4ce837b029d62d59fd0b9a20f (move_shards_in.rb) https://gist.github.com/arifcse019/c8a7096275e16d344f6c53ad884716ea (move_shards_out.rb)
And the steps are as I described in my issue. These two scripts are run as part of step 2
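For comparison with the Ruby scripts, here is a minimal cURL sketch of reading and writing a shard map. It assumes CouchDB 3.x, where the node-local shard map document is exposed at `/_node/_local/_dbs/{db}` (on 2.x the same document was served on the node-local port 5986 at `/_dbs/{db}`). The `COUCH_URL` variable, the function names, and the credentials are placeholders for illustration, not part of the original scripts.

```shell
#!/bin/sh
# Sketch only: base URL, credentials and function names are placeholders.
base_url=${COUCH_URL:-http://login:pass@127.0.0.1:5984}

# Build the URL of the node-local shard map document for one database.
# Database names containing "/" must be percent-encoded by the caller.
shard_map_url() {
    printf '%s/_node/_local/_dbs/%s\n' "$1" "$2"
}

# Fetch the current shard map (by_node, by_range, changelog) for review.
fetch_shard_map() {
    curl -s "$(shard_map_url "$base_url" "$1")"
}

# Write an edited shard map back. The file must still contain the _rev
# returned by the fetch, otherwise the update is rejected as a conflict.
put_shard_map() {
    curl -s -X PUT -H 'content-type: application/json' \
        --data-binary @"$2" "$(shard_map_url "$base_url" "$1")"
}
```

A typical round trip would be `fetch_shard_map db_name > map.json`, editing `map.json` by hand, then `put_shard_map db_name map.json`.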
Hello, I'm facing the same issue on CouchDB 3.1:
[error] 2020-07-30T10:47:23.718632Z couchdb@10.133.136.126 <0.23319.995> -------- Bad security object in <<"_users">>: [{{[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},8},{{[{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},8}]
This is the path I followed:
The shards correctly appeared on the other node but the security object got lost and the error started to appear in the log.
I opened the db in Fauxton and the permissions were back to the basic _admin/_admin.
I modified them and the error has now gone away.
So my guess is that there's something missing in the _sync_shards code when it comes to copying database permissions.
Same problem here.
couchdb-3.1.1-1.el7.x86_64 running on CentOS 7
More info about the error message. A normal database security object should look like this:
[root@esrp-0a ~]# curl -s http://login:pass@127.0.0.1:5984/db_name/_security| jq
{
  "members": {
    "roles": [
      "_admin"
    ]
  },
  "admins": {
    "roles": [
      "_admin"
    ]
  }
}
For a database where the error is present, the same command produces:
[root@esrp-0a ~]# curl -s http://login:pass@127.0.0.1:5984/db_name/_security| jq
{}
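The two responses above can also be told apart without jq. The following is a rough sketch, assuming the response body is passed in as a string; the function name `classify_security` is hypothetical. It also separates out error responses (for example a missing database), which a plain "is it `{}`" test would count as a populated security object.

```shell
#!/bin/sh
# Classify the body returned by GET /{db}/_security.
# Prints "empty" for {}, "error" for a CouchDB error document,
# "set" for any other JSON object, and "unknown" for anything else.
classify_security() {
    # Drop all whitespace so "{ }" and pretty-printed output compare equally.
    body=$(printf '%s' "$1" | tr -d '[:space:]')
    case $body in
        '{}')          echo empty ;;
        *'"error":'*)  echo error ;;
        '{'*)          echo set ;;
        *)             echo unknown ;;
    esac
}
```

It could be used as `classify_security "$(curl -s http://login:pass@127.0.0.1:5984/db_name/_security)"`. The `"error":` match is a heuristic and would misfire on a security object that happens to contain a key named `error`.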
To display the status of the security objects on my server, I created a script. You need to edit login and pass in the script before running it.
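#!/bin/bash
# Reports databases whose security object is empty and optionally resets it.
# bash is required because the functions use "local".
db_url=http://login:pass@127.0.0.1:5984
# false: only report affected databases; true: rewrite their security object.
fix_db=false

# Percent-encode every "/" and "+" in a database name for use in a URL path.
escape_dbname() {
    local dbname=$1
    printf '%s\n' "$dbname" | sed -e 's:/:%2f:g' -e 's:+:%2B:g'
}

# Security object written to databases that have lost theirs.
security_json() {
    cat << EOF
{"members":{"roles":["_admin"]},"admins":{"roles":["_admin"]}}
EOF
}

# Print every database name, one per line.
get_db_list() {
    curl -s "${db_url}/_all_dbs" | jq -r '.[]'
}

# Print "false" if the security object is empty ({}), "true" otherwise.
check_db_security() {
    local dbname=$1
    local esc_dbname
    esc_dbname=$(escape_dbname "${dbname}")
    curl -s "${db_url}/${esc_dbname}/_security" | jq 'if . == {} then false else true end'
}

# Report the database, and reset its security object when fix_db is true.
maybe_fix_db_security() {
    local dbname=$1
    local esc_dbname
    esc_dbname=$(escape_dbname "${dbname}")
    if [ "${fix_db}" = "false" ]; then
        echo "need to fix database: ${dbname}"
        return
    fi
    echo "fixing database: ${dbname}"
    security_json | curl -X PUT -H 'content-type: application/json' -H 'accept: application/json' -d@- -s "${db_url}/${esc_dbname}/_security"
}

for i in $(get_db_list)
do
    sec_status=$(check_db_security "$i")
    if [ "${sec_status}" = "false" ]; then
        maybe_fix_db_security "$i"
    fi
done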
To fix the security objects, set the "fix_db" variable to "true".
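The `escape_dbname` helper in the script only rewrites `/` and `+`. If other characters ever need escaping, a generic percent-encoder could take its place. This is a sketch in plain POSIX shell, assuming ASCII database names; the function name `urlencode` is hypothetical and not part of the script above.

```shell
#!/bin/sh
# Percent-encode every character outside the RFC 3986 unreserved set.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9._~-]) out="$out$c" ;;
            # "'$c" makes printf use the character's numeric code.
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}
```

Inside the script it would be called the same way as the existing helper, for example `esc_dbname=$(urlencode "${dbname}")`. It emits uppercase hex (`%2F` rather than `%2f`), which is equivalent in a URL.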
Security objects for some databases fail to sync properly in a new node after all shards are moved from an old node.
Expected Behavior
Security objects for all databases should sync when shards are moved to a new node.
Current Behavior
Security objects for some databases fail to sync properly in a new node after all shards are moved from an old node. The log says things like:
" [error] 2018-09-19T17:24:05.388202Z couchdb@first.couchcluster.internal <0.19944.2> -------- Bad security object in <<"db-name">>: [{{[{<<"_id">>,<<"_security">>},{<<"admins">>,{[{<<"names">>,[]},{<<"roles">>,[]}]}},{<<"members">>,{[{<<"names">>,[<<"user-name">>]},{<<"roles">>,[]}]}}]},13},{{[]},7}] "
Steps to Reproduce (for bugs)
Context
We are trying to replace CouchDB cluster instances with new instances as part of preparing for a scenario where one instance can go away abruptly.
Your Environment