clicon / clixon-controller

Clixon network controller
Apache License 2.0

Scaling, large amounts of configuration may result in lock issues #38

Closed: krihal closed this issue 9 months ago

krihal commented 9 months ago

Test: Configure 1000 users on the OpenConfig containers.

Result: About 140 users can be configured before the controller gets stuck with a lock-denied error.

Sep 13 13:58:59: Editing configuration: protocol lock-denied Operation failed, lock is already held <session-id>1677721</session-id>

The last debug seen on the backend:

Sep 13 14:05:29: Recv [1112]:
Sep 13 14:05:29: from_client_msg module:ietf-netconf rpc:get-config ce_id:1112 s:5
Sep 13 14:05:29: Send [1112]:
Sep 13 14:05:29: Recv [1112]: nonetest14test14_4test14_4
Sep 13 14:05:29: from_client_msg module:ietf-netconf rpc:edit-config ce_id:1112 s:5
Sep 13 14:05:29: from_client_edit_config done cbret:protocollock-denied1677721errorOperation failed, lock is already held
Sep 13 14:05:29: Send [1112]: protocollock-denied1677721errorOperation failed, lock is already held
Sep 13 14:05:29: Recv [1112]:
Sep 13 14:05:29: from_client_msg module:ietf-netconf rpc:close-session ce_id:1112 s:5
Sep 13 14:05:29: Send [1112]:
Sep 13 14:05:29: backend_client_rm
Sep 13 14:05:29: Recv [1114]:
Sep 13 14:05:29: from_client_msg Warning: incoming session-id:1112 does not match ce_id:1114 on socket: 5
Sep 13 14:05:29: from_client_msg module:ietf-netconf rpc:close-session ce_id:1114 s:5
Sep 13 14:05:29: Send [1114]:
Sep 13 14:05:29: backend_client_rm
Sep 13 14:05:29: stream_ss_rm
Sep 13 14:05:29: ce_event_cb op:1
Sep 13 14:05:29: backend_client_rm
Sep 13 14:05:29: stream_ss_rm retval: 0
Sep 13 14:05:29: backend_client_rm
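RFC 6241 (Appendix A) defines the `lock-denied` error with an `<error-info>` that carries the `<session-id>` of the session holding the lock. When correlating that id with the backend debug output, a small helper can pull it out of an error line like the CLI message quoted earlier. This is a sketch only: the function name `lock_holder` is hypothetical, and it assumes the `<session-id>` element appears on the same line as `lock-denied`, as it does in that message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Clixon): print the lock holder's session-id
# from a lock-denied error line. Prints nothing if the line does not match.
lock_holder() {
    printf '%s\n' "$1" |
        sed -n 's/.*lock-denied.*<session-id>\([0-9][0-9]*\)<\/session-id>.*/\1/p'
}

# Usage: prints 1677721
lock_holder 'Editing configuration: protocol lock-denied Operation failed, lock is already held <session-id>1677721</session-id>'
```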

krihal commented 9 months ago

To reproduce:

Run the test test-python-service.sh and make sure the test does not kill the backend. With the backend still running, execute this script:

#!/bin/sh
# Adds 1000 users (100 groups of 10), committing after each group of 10

CFG=/var/tmp/test-python-service.sh/controller.xml
users=0
for j in $(seq 100); do
    for i in $(seq 10); do
        clixon_cli -f "$CFG" -1 -m configure set services ssh-users "test${j}" username "test${j}_${i}" ssh-key "test${j}_${i}"
        clixon_cli -f "$CFG" -1 -m configure set services ssh-users "test${j}" username "test${j}_${i}" role "test${j}_${i}"
        users=$((users+1))
    done

    # Commit the 10 users added in this iteration
    clixon_cli -f "$CFG" -1 -m configure commit
    echo "Configured $users users"
done

The script runs for a while and gets stuck after about 140 users have been added. After waiting about 5 minutes, it starts printing errors:

debian@khn-dev:~$ ./users.sh
OK
Configured 10 users
OK
Configured 20 users
OK
Configured 30 users
OK
Configured 40 users
OK
Configured 50 users
OK
Configured 60 users
OK
Configured 70 users
OK
Configured 80 users
OK
Configured 90 users
OK
Configured 100 users
OK
Configured 110 users
Transaction 17 failed Timeout waiting for action daemon
Configured 120 users
Sep 13 14:05:26: Editing configuration: protocol lock-denied Operation failed, lock is already held <session-id>1677721</session-id>
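Because the script keeps running for several minutes after the first failure, it can help to stop at the first `lock-denied` error. A minimal sketch, assuming that `clixon_cli` reports the error text on stdout or stderr; it deliberately does not rely on the exit status, which is not known from this report. The wrapper name `cli_or_stop` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around clixon_cli that stops the reproduction as soon
# as a lock-denied error appears in the CLI output.
CFG=/var/tmp/test-python-service.sh/controller.xml

cli_or_stop() {
    # Capture stdout and stderr together; only the message text is inspected.
    out=$(clixon_cli -f "$CFG" -1 -m configure "$@" 2>&1)
    [ -n "$out" ] && printf '%s\n' "$out"
    case $out in
        *lock-denied*)
            echo "lock-denied seen, stopping" >&2
            return 1
            ;;
    esac
    return 0
}
```

In the reproduction script, each `clixon_cli` call could be replaced by `cli_or_stop` with the arguments that follow `configure`, plus `|| exit 1`, so the run ends at the first lock error.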
krihal commented 9 months ago

Framing issue.
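The comment does not say which framing was at fault. Assuming it refers to NETCONF message framing as defined in RFC 6242, the sketch below shows the two framings a sender and receiver must agree on. In chunked framing, each chunk header carries a byte count. If that count does not match the data, the receiver loses track of where one message ends and the next begins. This is an illustration of the RFC, not Clixon's code; the function names are hypothetical and only a single chunk per message is produced.

```shell
#!/bin/sh
# Sketch of the two NETCONF message framings from RFC 6242 (not Clixon code).

# NETCONF 1.0 end-of-message framing: the message is terminated by "]]>]]>".
frame_eom() {
    printf '%s]]>]]>' "$1"
}

# NETCONF 1.1 chunked framing, single chunk: LF "#" <byte-count> LF <data>,
# followed by the end-of-chunks marker LF "##" LF.
frame_chunked() {
    # wc -c counts bytes, not characters; tr removes padding some wc versions add
    len=$(printf '%s' "$1" | wc -c | tr -d ' ')
    printf '\n#%d\n%s\n##\n' "$len" "$1"
}
```

For example, `frame_chunked '<rpc/>'` emits a newline, then `#6`, `<rpc/>`, and `##`, each on its own line.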