falcosecurity / falcosidekick-ui

A simple WebUI with latest events from Falco
Apache License 2.0

Events disappear after a few hours #146

Open alternativc opened 5 months ago

alternativc commented 5 months ago

Describe the bug

This is a simple setup: falco(systemd) -> falcosidekick(docker) -> falcosidekick-ui(docker) + redis(docker). We run falco on all machines while the sidekick/ui/redis stack lives inside a docker swarm stack on the same host.

This setup works and we can see events in the UI. However, after a certain time interval the events disappear.

These are the logs from the falcosidekick-ui container (debug level logs):

2024/06/04 12:28:51  NEW event 'event:b0ad942b-b7c1-4272-92ea-977419630c1d'
2024/06/04 12:28:51  NEW event 'event:f64960df-271c-4562-9c15-2e2ce3e78a07'
2024/06/04 12:28:51  NEW event 'event:464bd05d-24bd-4e62-84ce-aa68a7debdd3'
2024/06/04 12:30:25 [ERROR]: [0] Unknown index name

2024/06/04 12:30:25 [ERROR]: [0] Unknown index name

2024/06/04 12:30:25 [ERROR]: [0] Unknown index name
...
2024/06/05 07:57:01 [INFO] : user 'admin' authenticated
2024/06/05 07:57:01  GET count by priority (source='', priority='', rule='', since='', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by rule (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by priority (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by priority (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by rule (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by source (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by priority (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by source (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by tags (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by hostname (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET count by rule (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01  GET search (source='', priority='', rule='', since='24h', hostname='', filter='', tags='', page='0', limit='500')
2024/06/05 07:57:01  GET search (source='', priority='', rule='', since='24h', hostname='', filter='', tags='', page='0', limit='500')
2024/06/05 07:57:01  GET count by tags (source='', priority='', rule='', since='24h', hostname='', filter='', tags='')
2024/06/05 07:57:01 [ERROR]: eventIndex: no such index
2024/06/05 07:57:01 [ERROR]: eventIndex: no such index

These are the redis logs (from the time the errors started):

...
9:M 04 Jun 2024 12:24:10.771 * Background saving started by pid 48
48:C 04 Jun 2024 12:24:10.774 * DB saved on disk
48:C 04 Jun 2024 12:24:10.774 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
9:M 04 Jun 2024 12:24:10.871 * Background saving terminated with success
9:M 04 Jun 2024 12:30:01.034 * DB saved on disk
9:M 04 Jun 2024 12:30:01.485 * <redisgears_2> Got a flush started event
9:M 04 Jun 2024 12:30:01.486 * DB saved on disk
9:M 04 Jun 2024 12:30:03.020 * DB saved on disk
9:M 04 Jun 2024 12:30:03.459 * DB saved on disk
9:M 04 Jun 2024 12:30:03.671 * <redisgears_2> Got a flush started event
9:M 04 Jun 2024 12:30:03.672 * DB saved on disk
9:M 04 Jun 2024 12:30:05.209 * DB saved on disk
9:M 04 Jun 2024 12:30:05.849 * DB saved on disk
9:M 04 Jun 2024 14:49:55.750 * <redisgears_2> Got a flush started event
...

I suspect something happens in the redis container that invalidates the index. If I restart the falcosidekick-ui container, the events appear again.
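In the redis log above, the first "Got a flush started event" line (12:30:01) appears shortly before the first "Unknown index name" error in the UI log (12:30:25). One hedged way to test whether a client outside the stack is triggering that flush is to stop publishing port 6379, so that redis is reachable only over the shared `falco` network. This is a sketch of the `redis` service from the stack below with the `ports:` section removed. It assumes nothing outside the stack needs direct access to redis, and it is a diagnostic step, not a confirmed fix.

```yaml
# Sketch only: the redis service from the stack, without a published port.
services:
  redis:
    image: redis/redis-stack:latest
    # No "ports:" section here: other services on the "falco" network
    # can still reach redis at redis:6379, but the host port is not exposed.
    networks:
      - falco
```

If the flush events and the "no such index" errors stop after this change, that would point to an external client rather than to falcosidekick-ui itself.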

I have tried manipulating the since parameter, with the same result.

How to reproduce it

Run the following docker-compose stack, emit some test events and wait. Please note that this is not a production ready stack, deploy section omitted:

version: "3.8"
services:
  falco-sidekick:
    image: falcosecurity/falcosidekick:latest
    ports:
      - "2801:2801"
    networks:
      - falco
    environment:
      - WEBUI_URL=http://falco-sidekick-ui:2802
  redis:
    image: redis/redis-stack:latest
    ports:
      - "6379:6379"
    networks:
      - falco

  falco-sidekick-ui:
    image: falcosecurity/falcosidekick-ui:latest
    environment:
      - FALCOSIDEKICK_UI_REDIS_URL=redis:6379
      - FALCOSIDEKICK_UI_LOGLEVEL=debug
    ports:
      - "2802:2802"
    networks:
      - falco
      - caddy

networks:
  caddy:
    external: true
  falco:

Expected behaviour

Falco events should persist longer than X hours, or until a defined TTL expires.

Screenshots

After X hours:

image

After UI container restart:

image

Environment

Additional context

n/a

Issif commented 5 months ago

Hi, do you have any idea how long it takes before the issue occurs? I'm getting more and more issues with the redis backend. Replacing it with something else is on my to-do list, but there is no ETA for now.

alternativc commented 5 months ago

It's fairly non-deterministic, but somewhere between 4 and 12 hours. I was hoping that DEBUG level logs would give more info about what is actually being searched for, so I could inspect what is happening in both containers. Let me know if I can help in any way.

Issif commented 5 months ago

I'll do some tests on my side too. Redis is the root cause for sure, I just don't know how.

alternativc commented 5 months ago

On my end: I've added a volume mount to the redis container for persistence (in case that was the cause). I'll update the ticket with the findings if they turn out to be relevant.
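For reference, a volume mount like the one described could look roughly like the fragment below. This is a minimal sketch, not the exact change that was made: the volume name `redis-data` is hypothetical, and it assumes the `redis/redis-stack` image writes its data under `/data`.

```yaml
# Sketch only: redis service with a named volume for persistence.
services:
  redis:
    image: redis/redis-stack:latest
    volumes:
      - redis-data:/data   # "redis-data" is a hypothetical volume name
    networks:
      - falco

volumes:
  redis-data:
```

A volume would only help if the data was being lost when the redis container restarted. The redis log in the report shows flush events, not restarts, so this change alone may not be enough.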

judikag03 commented 3 months ago

Could you share the root cause of this problem? I have the same issue.