crowdsecurity / cs-cloudflare-worker-bouncer

A CrowdSec Bouncer that syncs the decisions made by CrowdSec with CloudFlare's firewall using cloudflare workers. Manages multi user, multi account, multi zone setup. Supports IP, Country and AS scoped decisions.
https://doc.crowdsec.net/docs/next/bouncers/cloudflare-workers
MIT License

Unable To Process Deleted Decisions #46

Closed: delgado23 closed this issue 2 months ago

delgado23 commented 2 months ago

I started seeing this error in my logs a couple of days ago.

 level=error msg="account redacted, unable to process deleted decisions: bulk remove keys: 'namespace not found' (10013)"
 level=error msg="The internal cache of the bouncer is now likely out of sync, and likely needs a restart"
 level=error msg="If this error persists, please open an issue on https://github.com/crowdsecurity/cs-cloudflare-worker-bouncer/issues"

When I restart the bouncer, it works again for about a day or so, then it starts throwing this error message again.

blotus commented 2 months ago

Hello,

Do you have another script or program running simultaneously (even another instance of the bouncer) that would perform operations on the KV store? The error makes me think the KV namespace somehow got deleted (the bouncer will only try to delete it on shutdown or startup to clean up potential leftover resources).

If you go to the Cloudflare console when this error occurs, do you see the CROWDSECCFBOUNCERNS namespace?

delgado23 commented 2 months ago

@blotus Yes I have two servers protecting different sites. Next time this happens I can see if the CROWDSECCFBOUNCERNS namespace is gone or not. Currently, it is there because I restarted the services.

blotus commented 2 months ago

Are those two servers interacting with the same Cloudflare account? If so, that's probably the issue. I don't think we ever tested this configuration, but I would be very surprised if it worked.

The name of the KV store is currently hardcoded in the bouncer, meaning that when a new instance starts, it will see the existing namespace and try to delete it.

You can test this by just restarting one of the two bouncers, and the other one should start to misbehave the next time it tries to delete a decision.

delgado23 commented 2 months ago

@blotus It looks like it started happening again on one server and not the other one yet. These are indeed both on the same Cloudflare account.

Server 1

 systemctl status crowdsec-cloudflare-worker-bouncer.service 
● crowdsec-cloudflare-worker-bouncer.service - CrowdSec bouncer for Cloudflare
     Loaded: loaded (/usr/lib/systemd/system/crowdsec-cloudflare-worker-bouncer.service; enabled; preset: disabled)
     Active: active (running) since Wed 2024-09-18 09:28:16 EDT; 6h ago
    Process: 806863 ExecStartPre=/usr/bin/crowdsec-cloudflare-worker-bouncer -c /etc/crowdsec/bouncers/crowdsec-cloudflare-worker-bouncer.yaml -t (code=exited, status=0/SUCCESS)
   Main PID: 806874 (crowdsec-cloudf)
      Tasks: 14 (limit: 74593)
     Memory: 65.7M
        CPU: 1min 1.097s
     CGroup: /system.slice/crowdsec-cloudflare-worker-bouncer.service
             └─806874 /usr/bin/crowdsec-cloudflare-worker-bouncer -c /etc/crowdsec/bouncers/crowdsec-cloudflare-worker-bouncer.yaml

Sep 18 09:28:16 overwatch.garaventaville.com systemd[1]: Starting CrowdSec bouncer for Cloudflare...
Sep 18 09:28:16 overwatch.garaventaville.com systemd[1]: Started CrowdSec bouncer for Cloudflare.
level=error msg="account , unable to process deleted decisions: bulk remove keys: 'namespace not found' (10013)"
level=error msg="The internal cache of the bouncer is now likely out of sync, and likely needs a restart"
level=error msg="If this error persists, please open an issue on https://github.com/crowdsecurity/cs-cloudflare-worker-bouncer/issues"

Server 2

 systemctl status crowdsec-cloudflare-worker-bouncer.service 
● crowdsec-cloudflare-worker-bouncer.service - CrowdSec bouncer for Cloudflare
     Loaded: loaded (/usr/lib/systemd/system/crowdsec-cloudflare-worker-bouncer.service; enabled; preset: disabled)
     Active: active (running) since Wed 2024-09-18 12:20:53 EDT; 3h 24min ago
    Process: 2624 ExecStartPre=/usr/bin/crowdsec-cloudflare-worker-bouncer -c /etc/crowdsec/bouncers/crowdsec-cloudflare-worker-bouncer.yaml -t (code=exited, status=0/SUCCESS)
   Main PID: 2639 (crowdsec-cloudf)
      Tasks: 10 (limit: 23106)
     Memory: 76.2M
        CPU: 44.371s
     CGroup: /system.slice/crowdsec-cloudflare-worker-bouncer.service
             └─2639 /usr/bin/crowdsec-cloudflare-worker-bouncer -c /etc/crowdsec/bouncers/crowdsec-cloudflare-worker-bouncer.yaml
time="2024-09-18T15:25:37-04:00" level=info msg="Received 150 deleted decisions"
time="2024-09-18T15:25:37-04:00" level=info msg="Deleting 150 decisions" account=
time="2024-09-18T15:25:37-04:00" level=info msg="Deleted 150 decisions" account=

I still see the CROWDSECCFBOUNCERNS KV namespace, as shown in the attached screenshot.

LaurenceJJones commented 2 months ago

Having two Cloudflare worker bouncers interact with the same account is not a supported setup. You can already define multiple zones within one configuration file, so a second instance should not be needed. As pointed out by @blotus, because the second instance sees a namespace with the same name, it deletes it and publishes a new one, which changes the generated ID. When the first instance then uses the old ID it holds in memory to update the namespace, it fails because that namespace no longer exists.

The fix is to use one remediation and define multiple zones within the same configuration file.

delgado23 commented 2 months ago

Thank you @LaurenceJJones I will give that a try.