apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

Bad Security Object Error After Moving Shards #1611

Open arifcse019 opened 6 years ago

arifcse019 commented 6 years ago

Security objects for some databases fail to sync properly in a new node after all shards are moved from an old node.

Expected Behavior

Security objects for all databases should sync when shards are moved to a new node.

Current Behavior

Security objects for some databases fail to sync properly in a new node after all shards are moved from an old node. The log says things like:

" [error] 2018-09-19T17:24:05.388202Z couchdb@first.couchcluster.internal <0.19944.2> -------- Bad security object in <<"db-name">>: [{{[{<<"_id">>,<<"_security">>},{<<"admins">>,{[{<<"names">>,[]},{<<"roles">>,[]}]}},{<<"members">>,{[{<<"names">>,[<<"user-name">>]},{<<"roles">>,[]}]}}]},13},{{[]},7}] "
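The logged term is a list of pairs, each a security object followed by an integer. Assuming the integer is the number of shard copies that reported that version of the object (an interpretation, not something the log states), the line above says 13 copies carry the expected object and 7 carry an empty one. A minimal sketch for pulling those numbers out of such a log line, where `security_counts` is a hypothetical helper name:

```shell
#!/bin/sh
# security_counts: read "Bad security object" log lines on stdin and print
# the trailing integer of each {SecurityObject, N} pair, one per line.
# Assumption: N counts the shard copies that reported that object.
security_counts() {
    # Each pair ends in "},N}"; keep only the digits of every such match.
    grep -oE '[}],[0-9]+[}]' | tr -cd '0-9\n'
}
```

Piping the relevant lines of the CouchDB log into `security_counts` gives a quick view of how evenly the shard copies disagree.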

Steps to Reproduce (for bugs)

  1. Add a new node to the cluster
  2. Move all shards from an old node to this new one
  3. Shut down and delete the old node
  4. Verify Security Objects on all databases

Context

We are trying to replace CouchDB cluster instances with new instances, as part of preparing for a scenario where one instance can go away abruptly.

Your Environment

wohali commented 6 years ago

@arifcse019 Can you provide a minimal example for us with a script that uses cURL? I've personally performed the steps you mention above (1-4) many times and have never run across this.

What technique are you using to move the shards? There is newly updated documentation online on the approved approach. Can you follow that?

http://docs.couchdb.org/en/stable/cluster/sharding.html

arifcse019 commented 6 years ago

@wohali I am using the following two Ruby scripts to move shards. The first updates the cluster metadata to add the shards to the new node; the second updates the cluster metadata to stop looking for those shards on the old one.

https://gist.github.com/arifcse019/43a638e4ce837b029d62d59fd0b9a20f (move_shards_in.rb)
https://gist.github.com/arifcse019/c8a7096275e16d344f6c53ad884716ea (move_shards_out.rb)

The steps are as I described in my issue. These two scripts are run as part of step 2.
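For readers unfamiliar with what updating the cluster metadata involves: the sharding documentation linked earlier describes a per-database shard map document with `by_node`, `by_range`, and `changelog` fields. The sketch below is not taken from the gists above. It is only an illustration, assuming `jq` is installed, of how such a document could be rewritten so that a new node hosts every range. The function name `add_node_to_shard_map` is hypothetical.

```shell
#!/bin/sh
# add_node_to_shard_map NODE
# Read a shard map document (JSON) on stdin and print a copy in which NODE
# is listed for every range. Hypothetical helper, for illustration only.
add_node_to_shard_map() {
    jq --arg node "$1" '
        (.by_range | keys) as $ranges
        # List every range under the new node.
        | .by_node[$node] = $ranges
        # Add the new node to each range, without duplicates.
        | .by_range |= map_values((. + [$node]) | unique)
        # Record one "add" entry per range in the changelog.
        | .changelog += [$ranges[] | ["add", ., $node]]
    '
}
```

The output would still have to be written back to the shard map, and the linked documentation also describes copying the shard files before that step. Neither is shown here.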

skeyby commented 4 years ago

Hello, I'm facing the same issue on CouchDB 3.1:

[error] 2020-07-30T10:47:23.718632Z couchdb@10.133.136.126 <0.23319.995> -------- Bad security object in <<"_users">>: [{{[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},8},{{[{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},8}]

This is the path I followed:

The shards correctly appeared on the other node but the security object got lost and the error started to appear in the log.

I opened the database in Fauxton and the permissions were back to the basic _admin/_admin.

I modified them and the error has now gone away.

So my guess is that there's something missing in the _sync_shards code when it comes to copying database permissions.

kripper commented 4 years ago

Same problem here.

Steps to reproduce:

Current Behavior:

Expected Behaviour:

Version:

couchdb-3.1.1-1.el7.x86_64 running on CentOS 7

sergey-safarov commented 7 months ago

More info about the error message. A normal database security object should look like this:

[root@esrp-0a ~]# curl -s http://login:pass@127.0.0.1:5984/db_name/_security| jq
{
  "members": {
    "roles": [
      "_admin"
    ]
  },
  "admins": {
    "roles": [
      "_admin"
    ]
  }
}

For a database where the error is present, the same command produces:

[root@esrp-0a ~]# curl -s http://login:pass@127.0.0.1:5984/db_name/_security| jq
{}
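
The difference between the two responses can also be tested without `jq`. A minimal sketch, assuming the response body has already been captured in a shell variable; `is_empty_security` is a hypothetical helper name:

```shell
#!/bin/sh
# is_empty_security BODY
# Succeed (exit status 0) when BODY is an empty JSON object, ignoring
# whitespace; fail for any other body.
is_empty_security() {
    [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "{}" ]
}
```

For example, `is_empty_security "$(curl -s http://login:pass@127.0.0.1:5984/db_name/_security)" && echo "db_name: empty security object"` would flag the second case shown above.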

sergey-safarov commented 7 months ago

I have created a script to display the status of the security objects on my server. You need to edit the login and pass values in the script before running it.

#!/bin/sh
# Report, and optionally repair, databases whose _security object is empty.
# Requires curl and jq.

db_url=http://login:pass@127.0.0.1:5984
fix_db=false

# Percent-encode "/" and "+" so the database name is safe in a URL path.
escape_dbname() {
    local DBNAME=$1
    echo "$DBNAME" | sed -e 's:/:%2f:g' -e 's:+:%2B:g'
}

# Security object written to databases that need fixing.
security_json() {
cat << EOF
{"members":{"roles":["_admin"]},"admins":{"roles":["_admin"]}}
EOF
}

# Print every database name, one per line.
get_db_list() {
    curl -s "${db_url}/_all_dbs" | jq -r '.[]'
}

# Print "false" if the security object is empty ({}), "true" otherwise.
check_db_security() {
    local dbname=$1
    local esc_dbname=$(escape_dbname "${dbname}")
    curl -s "${db_url}/${esc_dbname}/_security" | jq 'if . == {} then false else true end'
}

# Report the database, or rewrite its security object when fix_db is not "false".
maybe_fix_db_security() {
    local dbname=$1
    local esc_dbname=$(escape_dbname "${dbname}")
    if [ "${fix_db}" = "false" ]; then
        echo "need to fix database: ${dbname}"
        return
    fi
    echo "fixing database: ${dbname}"
    security_json | curl -X PUT -H 'content-type: application/json' -H 'accept: application/json' -d@- -s "${db_url}/${esc_dbname}/_security"
}

for i in $(get_db_list)
do
    sec_status=$(check_db_security "$i")
    if [ "${sec_status}" = "false" ]; then
        maybe_fix_db_security "$i"
    fi
done

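To fix the security objects, set the "fix_db" variable to "true". The script then writes the _admin-only security object shown above to every database whose security object is empty.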