NethServer / dev

NethServer issue tracker
https://github.com/NethServer/dev/issues

Core update deadlock #6848

Closed DavidePrincipi closed 4 months ago

DavidePrincipi commented 4 months ago

Latest changes to the update-core action introduced a deadlock bug.

Steps to reproduce

Start the update-core action on a system where both the core image and the core modules have updates pending

Expected behavior

Update completes

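Actual behavior

The update never completes. The process stays blocked waiting for the result of this update-module task: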

{
    "id": "e88003e7-0037-4ed0-8072-04795e0250b9",
    "action": "update-module",
    "data": {
        "module_url": "ghcr.io/nethserver/traefik:2.1.1",
        "instances": [
            "traefik1",
            "traefik3",
            "traefik6",
            "traefik2"
        ],
        "force": false
    },
    "parent": "5b9d68e3-3d5b-4039-92cb-3005fd0808f6",
    "extra": {}
}

Workaround:

To find the ID of the blocked task, inspect the contents of the cluster/tasks list:

redis-cli lrange cluster/tasks 0 1000
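
If the list is long, the task IDs can be filtered out of that output. The following is only a rough sketch: `extract_task_ids` is a hypothetical helper, not part of NethServer, and it assumes that each queued entry is a JSON object shaped like the one shown under "Actual behavior" and that redis-cli prints the entries unescaped when its output is piped.

```shell
# Hypothetical helper: read task JSON on stdin and print each "id" value.
# Assumption: entries look like the task object shown in this issue.
# The "parent" field is not matched because only the "id" key is searched.
extract_task_ids() {
    grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed 's/.*"\([^"]*\)"$/\1/'
}
```

It would be used as `redis-cli lrange cluster/tasks 0 1000 | extract_task_ids`.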

Alternatively, check what the blocked process is doing (it polls for the task result every 5 seconds):

strace -s 1024 -p 2780788

Abort the update by clearing the task queue and writing a failed result for the blocked task:

redis-cli del cluster/tasks
redis-cli mset task/cluster/e88003e7-0037-4ed0-8072-04795e0250b9/output '' task/cluster/e88003e7-0037-4ed0-8072-04795e0250b9/error '' task/cluster/e88003e7-0037-4ed0-8072-04795e0250b9/exit_code 1
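
For a different task ID, the same two commands can be generated rather than edited by hand. This is a minimal sketch, assuming the key layout used above (`task/cluster/<id>/output`, `/error`, `/exit_code`); `print_abort_commands` is a hypothetical name, and the function only prints the commands so they can be reviewed before running.

```shell
# Hypothetical helper: print the abort commands for a given cluster task ID.
# It does not touch Redis; review the output, then run it manually.
print_abort_commands() {
    task_id="$1"
    prefix="task/cluster/${task_id}"
    # Drop the pending task queue, as in the workaround above.
    printf '%s\n' "redis-cli del cluster/tasks"
    # Write an empty output, an empty error and exit code 1 for the task,
    # so the process polling for its result stops waiting.
    printf '%s\n' "redis-cli mset ${prefix}/output '' ${prefix}/error '' ${prefix}/exit_code 1"
}
```

For example, `print_abort_commands e88003e7-0037-4ed0-8072-04795e0250b9` prints the two commands listed above.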

Components

core 2.5.0-dev.5+, traefik 2.1.1-dev.1+

DavidePrincipi commented 4 months ago

Released in https://github.com/NethServer/ns8-core/releases/tag/2.5.1