FalkorDB / FalkorDB

A super fast Graph Database that uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best Knowledge Graph for LLM (GraphRAG).
https://www.falkordb.com/

All nodes and edges getting deleted after a few hours ‼️ #808

Status: Open. Opened by osehmathias 3 days ago.

osehmathias commented 3 days ago

Hi!

So, I'm a big fan of the DB and have been using it extensively, but in the last few days (I think since 4.3.0), Falkor has been randomly losing all of the nodes and edges in the graph.

I have run it every which way, including following the persistence guide from the docs. Still, if my client disconnects and doesn't reconnect for a couple of hours, there are no nodes when it does reconnect.

See the logs from a recent run.

Do you have any idea what I can do? I am at the point of considering other DBs, but I like Falkor and will keep using it if there is a robust solution.

Thanks.

[ec2-user@ip-10-3-22-58 ~]$ sudo docker logs -f 29e33a9b5ed5
11:C 15 Oct 2024 14:34:05.530 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
11:C 15 Oct 2024 14:34:05.530 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=11, just started
11:C 15 Oct 2024 14:34:05.530 * Configuration loaded
11:M 15 Oct 2024 14:34:05.531 * monotonic clock: POSIX clock_gettime
11:M 15 Oct 2024 14:34:05.531 * Running mode=standalone, port=6379.
11:M 15 Oct 2024 14:34:05.536 * <graph> Enabled role change notification
11:M 15 Oct 2024 14:34:05.537 * <graph> Starting up FalkorDB version 4.3.0.
11:M 15 Oct 2024 14:34:05.558 * <graph> Thread pool created, using 2 threads.
11:M 15 Oct 2024 14:34:05.558 * <graph> Maximum number of OpenMP threads set to 2
11:M 15 Oct 2024 14:34:05.558 * <graph> Query backlog size: 1000
11:M 15 Oct 2024 14:34:05.559 * Module 'graph' loaded from /FalkorDB/bin/src/falkordb.so
11:M 15 Oct 2024 14:34:05.559 * Server initialized
11:M 15 Oct 2024 14:34:05.559 * Ready to accept connections tcp
   ▲ Next.js 14.1.0
   - Local:        http://localhost:3000
   - Network:      http://0.0.0.0:3000

 ✓ Ready in 148ms
11:M 15 Oct 2024 14:39:06.082 * 100 changes in 300 seconds. Saving...
11:M 15 Oct 2024 14:39:06.084 * Background saving started by pid 37
37:C 15 Oct 2024 14:39:06.084 * <graph> Created 0 virtual keys for graph a6128b40-c8e3-42bf-9d20-c5144fc1a64c
37:C 15 Oct 2024 14:39:06.095 * DB saved on disk
37:C 15 Oct 2024 14:39:06.095 * <graph> Deleted 0 virtual keys for graph a6128b40-c8e3-42bf-9d20-c5144fc1a64c
37:C 15 Oct 2024 14:39:06.096 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
11:M 15 Oct 2024 14:39:06.185 * Background saving terminated with success
11:M 15 Oct 2024 18:07:43.576 * <graph> Created 0 virtual keys for graph a6128b40-c8e3-42bf-9d20-c5144fc1a64c
11:M 15 Oct 2024 18:07:43.585 * DB saved on disk
11:M 15 Oct 2024 18:07:43.585 * <graph> Deleted 0 virtual keys for graph a6128b40-c8e3-42bf-9d20-c5144fc1a64c
11:M 15 Oct 2024 18:07:43.754 * DB saved on disk
11:M 15 Oct 2024 18:07:44.333 * DB saved on disk
11:M 15 Oct 2024 18:07:44.501 * DB saved on disk
11:M 15 Oct 2024 18:07:44.586 * DB saved on disk
11:M 15 Oct 2024 18:07:45.167 * DB saved on disk
11:M 15 Oct 2024 18:07:45.417 * DB saved on disk
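The `100 changes in 300 seconds. Saving...` lines in these logs come from Redis's RDB snapshot rules (`save <seconds> <changes>`): a background save starts once at least `<changes>` writes have accumulated and more than `<seconds>` have passed since the last successful save. The sketch below models that check, assuming the Redis 7 default rule set `save 3600 1 300 100 60 10000`; the container's actual configuration is not shown in this thread, and the model ignores details such as the retry delay after a failed save.

```python
from typing import Optional, Sequence, Tuple

# Assumption: the Redis 7 default snapshot rules, i.e. "save 3600 1 300 100 60 10000".
# Each pair is (seconds, changes). The real container config may differ.
DEFAULT_SAVE_RULES: Tuple[Tuple[int, int], ...] = ((3600, 1), (300, 100), (60, 10000))


def triggered_save_rule(
    dirty: int,
    seconds_since_last_save: int,
    rules: Sequence[Tuple[int, int]] = DEFAULT_SAVE_RULES,
) -> Optional[str]:
    """Return the log message for the first save rule that fires, or None.

    Simplified model: a rule fires when the number of writes since the last
    successful save ("dirty") is at least its change threshold AND strictly
    more than its time threshold has elapsed. Rules are checked in order.
    """
    for seconds, changes in rules:
        if dirty >= changes and seconds_since_last_save > seconds:
            # The message echoes the rule's thresholds, not the observed counts.
            return f"{changes} changes in {seconds} seconds. Saving..."
    return None


if __name__ == "__main__":
    print(triggered_save_rule(150, 301))  # 100 changes in 300 seconds. Saving...
    print(triggered_save_rule(1, 3601))   # 1 changes in 3600 seconds. Saving...
    print(triggered_save_rule(50, 200))   # None
```

One consequence worth keeping in mind: these rules snapshot whatever the dataset currently contains, so once a graph has been emptied, the next triggered save persists the empty state as well.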
swilly22 commented 3 days ago

Hi @osehmathias, I'm sorry to hear that. When you encounter this situation, how do you interact with the DB? Is it via a client, e.g. FalkorDB-py?

Would you mind querying the DB with redis-cli the next time this happens, to rule out any mishaps that might be coming from the client?

To gather additional information, can you please attach a monitor to the DB and save its output to disk? You can do this by running redis-cli MONITOR > log.txt, assuming you're running FalkorDB locally on your machine; otherwise you'll need to provide its host, port, username and password.
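MONITOR output grows quickly, so it can help to skim it programmatically. The Python sketch below is one hypothetical way to do that: it parses each line into timestamp, database, client address and arguments, counts commands per client IP, and flags a watchlist of commands that can wipe or reconfigure a server. The watchlist, the function names and the exact line shape it expects are assumptions based on the `log.txt` excerpt in this issue, not part of FalkorDB or redis-cli.

```python
import re
import shlex
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Expected shape (from the log.txt excerpt in this issue):
# 1729024524.457458 [0 10.3.109.54:40434] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
MONITOR_LINE = re.compile(r"^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$")

# Hypothetical watchlist of commands that can wipe data or reconfigure the server.
WATCHLIST = {"FLUSHALL", "FLUSHDB", "GRAPH.DELETE", "SLAVEOF", "REPLICAOF", "CONFIG", "MODULE"}


class MonitorEntry(NamedTuple):
    timestamp: float
    db: int
    client: str
    args: List[str]


def parse_monitor_line(line: str) -> Optional[MonitorEntry]:
    """Parse one MONITOR line; return None for anything else (e.g. the leading 'OK')."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None
    try:
        # MONITOR double-quotes every argument; shlex copes with the common escapes.
        args = shlex.split(match["args"])
    except ValueError:
        return None  # truncated line with an unbalanced quote
    return MonitorEntry(float(match["ts"]), int(match["db"]), match["client"], args)


def summarize(lines: Iterable[str]) -> Tuple[Counter, List[MonitorEntry]]:
    """Count commands per client IP and collect entries whose command is on the watchlist."""
    per_ip: Counter = Counter()
    flagged: List[MonitorEntry] = []
    for line in lines:
        entry = parse_monitor_line(line)
        if entry is None or not entry.args:
            continue
        ip = entry.client.rsplit(":", 1)[0]  # drop the ephemeral client port
        per_ip[ip] += 1
        if entry.args[0].upper() in WATCHLIST:
            flagged.append(entry)
    return per_ip, flagged
```

Usage would be along the lines of `with open("log.txt") as f: counts, flagged = summarize(f)`. A `FLUSHALL`, `GRAPH.DELETE` or `SLAVEOF` arriving from an address you don't recognize would point at an outside client rather than at the application.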

Thank you.

osehmathias commented 3 days ago

Hi @swilly22 Thank you for the reply.

I am encountering this problem when interacting with the database via the falkordb-py client as well as the falkordb-ts client. Additionally, when I inspect the database via the browser, I can see that my graphs are there, but the node and edge counts are 0.

Via the falkordb-ts client, I am just running a simple query where I select the graph and then fetch all nodes and edges for the preview. I started doing this recently, around the time I began to notice the data loss.

I will see if removing the ts client has any influence over this anomaly, will attach a log, and will try the redis-cli. I will report back with findings and relevant logs.

Thanks.

swilly22 commented 2 days ago

Thank you @osehmathias much appreciated!

In addition, when you encounter data loss, please provide the output of the command redis-cli keys '*'. Lastly, can you please share how you are running the DB? As a standalone (single master instance), or did you set up either a straightforward primary & replica replication or a full Cluster? Are you using Sentinel to monitor your setup?

osehmathias commented 2 days ago

Hi @swilly22

I created a new Docker container with persistence enabled, added nodes to the graph and let it run overnight.

This morning, the first thing I did was check it via browser.falkordb.com, instead of via the preview in my app which uses the falkordb-ts client.

There are no graphs.

[Screenshot 2024-10-16 at 08:44:02]

Adding more mystery to the problem...

There is something in the logs about replication, even though I am not running any other instances. I am running in standalone mode. No other instances exist in my AWS account, and the only way to connect to this instance from outside the account is via its public IP, which I have not shared with anyone. I have run straightforward primary & replica setups before, but not in this configuration. I started this instance as a fresh standalone yesterday, after opening this issue.

Regarding how I am running it: I am running it on EC2 t4g.medium instances. I have also tried ECS and Fargate for production hosting, along with, obviously, my local machine.

For this particular instance, I started an EC2 instance with no launch template, SSH'ed in and started the FalkorDB docker container myself.

This may well be an error on my part that I have not caught yet. However, I am happy to leave this for now, as I am going to use another graph DB in production and use FalkorDB for prototyping until I understand why this is happening.


Running redis-cli keys *

root@05ea8feaf131:/FalkorDB# redis-cli keys *
(error) ERR wrong number of arguments for 'keys' command
root@05ea8feaf131:/FalkorDB# ls
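The `ERR wrong number of arguments for 'keys' command` error above most likely comes from the shell rather than the server: an unquoted `*` is expanded into the names of the files in the current directory (`/FalkorDB` here) before redis-cli sees it, so Redis receives `KEYS` with several arguments instead of the single pattern it accepts. Quoting the pattern (`redis-cli keys '*'`) or using `redis-cli --scan` avoids this. The sketch below roughly mimics that expansion step for the unquoted case; the function name and the directory contents are made up for illustration, and a quoted token would simply be passed through unchanged.

```python
import fnmatch
import os
import tempfile
from typing import List


def shell_like_argv(tokens: List[str], cwd: str) -> List[str]:
    """Roughly mimic how a POSIX shell expands *unquoted* glob tokens.

    A token containing '*' is replaced by the sorted, non-hidden file names in
    `cwd` that match it; with bash's default settings a pattern that matches
    nothing is passed through unchanged. Patterns containing '/' are not handled.
    """
    argv: List[str] = []
    visible = [name for name in os.listdir(cwd) if not name.startswith(".")]
    for token in tokens:
        if "*" in token:
            matches = sorted(fnmatch.filter(visible, token))
            argv.extend(matches or [token])
        else:
            argv.append(token)
    return argv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("bin", "build", "src"):  # made-up directory contents
            open(os.path.join(workdir, name), "w").close()
        # Unquoted: KEYS receives three arguments instead of one pattern.
        print(shell_like_argv(["redis-cli", "keys", "*"], workdir))
        # prints ['redis-cli', 'keys', 'bin', 'build', 'src']
```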

Logs

11:C 15 Oct 2024 20:34:36.755 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
11:C 15 Oct 2024 20:34:36.756 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=11, just started
11:C 15 Oct 2024 20:34:36.756 * Configuration loaded
11:M 15 Oct 2024 20:34:36.756 * monotonic clock: POSIX clock_gettime
11:M 15 Oct 2024 20:34:36.757 * Running mode=standalone, port=6379.
11:M 15 Oct 2024 20:34:36.762 * <graph> Enabled role change notification
11:M 15 Oct 2024 20:34:36.762 * <graph> Starting up FalkorDB version 4.3.0.
11:M 15 Oct 2024 20:34:36.787 * <graph> Thread pool created, using 2 threads.
11:M 15 Oct 2024 20:34:36.787 * <graph> Maximum number of OpenMP threads set to 2
11:M 15 Oct 2024 20:34:36.787 * <graph> Query backlog size: 1000
11:M 15 Oct 2024 20:34:36.787 * Module 'graph' loaded from /FalkorDB/bin/src/falkordb.so
11:M 15 Oct 2024 20:34:36.787 * Server initialized
11:M 15 Oct 2024 20:34:36.788 * Loading RDB produced by version 7.2.4
11:M 15 Oct 2024 20:34:36.788 * RDB age 8811 seconds
11:M 15 Oct 2024 20:34:36.788 * RDB memory usage when created 1.20 Mb
11:M 15 Oct 2024 20:34:36.788 * Done loading RDB, keys loaded: 4, keys expired: 0.
11:M 15 Oct 2024 20:34:36.788 * DB loaded from disk: 0.001 seconds
11:M 15 Oct 2024 20:34:36.788 * Ready to accept connections tcp
   ▲ Next.js 14.1.0
   - Local:        http://localhost:3000
   - Network:      http://0.0.0.0:3000

 ✓ Ready in 132ms
11:M 15 Oct 2024 20:50:14.324 * 100 changes in 300 seconds. Saving...
11:M 15 Oct 2024 20:50:14.325 * Background saving started by pid 52
52:C 15 Oct 2024 20:50:14.325 * <graph> Created 0 virtual keys for graph a58635b7-21ff-43f0-a92b-959012fc7a18
52:C 15 Oct 2024 20:50:14.331 * DB saved on disk
52:C 15 Oct 2024 20:50:14.331 * <graph> Deleted 0 virtual keys for graph a58635b7-21ff-43f0-a92b-959012fc7a18
52:C 15 Oct 2024 20:50:14.332 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
11:M 15 Oct 2024 20:50:14.427 * Background saving terminated with success
11:M 15 Oct 2024 20:55:15.001 * 100 changes in 300 seconds. Saving...
11:M 15 Oct 2024 20:55:15.002 * Background saving started by pid 53
53:C 15 Oct 2024 20:55:15.002 * <graph> Created 0 virtual keys for graph a58635b7-21ff-43f0-a92b-959012fc7a18
53:C 15 Oct 2024 20:55:15.012 * DB saved on disk
53:C 15 Oct 2024 20:55:15.012 * <graph> Deleted 0 virtual keys for graph a58635b7-21ff-43f0-a92b-959012fc7a18
53:C 15 Oct 2024 20:55:15.013 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
11:M 15 Oct 2024 20:55:15.103 * Background saving terminated with success
11:M 16 Oct 2024 03:53:07.103 * 1 changes in 3600 seconds. Saving...
11:M 16 Oct 2024 03:53:07.104 * Background saving started by pid 54
54:C 16 Oct 2024 03:53:07.107 * DB saved on disk
54:C 16 Oct 2024 03:53:07.108 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
11:M 16 Oct 2024 03:53:07.205 * Background saving terminated with success
11:S 16 Oct 2024 03:55:54.139 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
11:S 16 Oct 2024 03:55:54.139 * Connecting to MASTER 91.108.105.178:60139
11:S 16 Oct 2024 03:55:54.139 * MASTER <-> REPLICA sync started
11:S 16 Oct 2024 03:55:54.139 * REPLICAOF 91.108.105.178:60139 enabled (user request from 'id=23952 addr=47.239.11.42:33026 laddr=172.17.0.2:6379 fd=29 name= age=223 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=30 qbuf-free=20444 argv-mem=26 multi-mem=0 rbs=1024 rbp=97 obl=0 oll=0 omem=0 tot-mem=22450 events=r cmd=slaveof user=default redir=-1 resp=2 lib-name= lib-ver=')
11:M 16 Oct 2024 03:56:54.824 * Discarding previously cached master state.
11:M 16 Oct 2024 03:56:54.824 * Setting secondary replication ID to 215b41810de70c143f07417f72e881cfce0b5975, valid up to offset: 1. New replication ID is dee308ccc2a89bceb41d08b03503829cd89a7120
11:M 16 Oct 2024 03:56:54.824 * MASTER MODE enabled (user request from 'id=23952 addr=47.239.11.42:33026 laddr=172.17.0.2:6379 fd=29 name= age=283 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=16 qbuf-free=20458 argv-mem=12 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22436 events=r cmd=slaveof user=default redir=-1 resp=2 lib-name= lib-ver=')
11:M 16 Oct 2024 04:53:08.068 * 1 changes in 3600 seconds. Saving...
11:M 16 Oct 2024 04:53:08.069 * Background saving started by pid 55
55:C 16 Oct 2024 04:53:08.073 * DB saved on disk
55:C 16 Oct 2024 04:53:08.074 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
11:M 16 Oct 2024 04:53:08.170 * Background saving terminated with success
root@05ea8feaf131:/FalkorDB# cat log.txt
OK
1729024524.457458 [0 10.3.109.54:40434] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.458181 [0 10.3.109.54:40434] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.458893 [0 10.3.109.54:40434] "INFO" "server"
1729024524.459752 [0 10.3.109.54:40434] "INFO" "server"
1729024524.462219 [0 10.3.109.54:40438] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.462926 [0 10.3.109.54:40438] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.463715 [0 10.3.109.54:40438] "INFO" "server"
1729024524.464564 [0 10.3.109.54:40438] "INFO" "server"
1729024524.466933 [0 10.3.109.54:40448] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.467629 [0 10.3.109.54:40448] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.468354 [0 10.3.109.54:40448] "INFO" "server"
1729024524.469180 [0 10.3.109.54:40448] "INFO" "server"
1729024524.471579 [0 10.3.109.54:40452] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.472254 [0 10.3.109.54:40452] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.473035 [0 10.3.109.54:40452] "INFO" "server"
1729024524.473853 [0 10.3.109.54:40452] "INFO" "server"
1729024524.476298 [0 10.3.109.54:40468] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.476992 [0 10.3.109.54:40468] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.477718 [0 10.3.109.54:40468] "INFO" "server"
1729024524.478581 [0 10.3.109.54:40468] "INFO" "server"
1729024524.480890 [0 10.3.109.54:40476] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.481585 [0 10.3.109.54:40476] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.482391 [0 10.3.109.54:40476] "INFO" "server"
1729024524.483265 [0 10.3.109.54:40476] "INFO" "server"
1729024524.486336 [0 10.3.109.54:40490] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.487012 [0 10.3.109.54:40490] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.487737 [0 10.3.109.54:40490] "INFO" "server"
1729024524.488532 [0 10.3.109.54:40490] "INFO" "server"
1729024524.490685 [0 10.3.109.54:40496] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.491376 [0 10.3.109.54:40496] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.492071 [0 10.3.109.54:40496] "INFO" "server"
1729024524.492895 [0 10.3.109.54:40496] "INFO" "server"
1729024524.495119 [0 10.3.109.54:40504] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.495878 [0 10.3.109.54:40504] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.496640 [0 10.3.109.54:40504] "INFO" "server"
1729024524.497449 [0 10.3.109.54:40504] "INFO" "server"
1729024524.499671 [0 10.3.109.54:40520] "CLIENT" "SETINFO" "LIB-NAME" "FalkorDB"
1729024524.500419 [0 10.3.109.54:40520] "CLIENT" "SETINFO" "LIB-VER" "1.0.0"
1729024524.501085 [0 10.3.109.54:40520] "INFO" "server"
1729024524.501873 [0 10.3.109.54:40520] "INFO" "s
gkorland commented 1 day ago

@osehmathias, to get a quick resolution, can you please join our FalkorDB Discord? https://discord.com/invite/6M4QwDXn2w

swilly22 commented 1 day ago

@osehmathias The logs indicate that your master turned into a replica, even though you're running in standalone mode, which, if I understand correctly, is just a single master.

Because your FalkorDB instance is exposed to the public via its public IP address, could you please let me know whether you've set up a username and a password for connecting to the FalkorDB server?

We've seen cases where malicious users scan AWS public IP ranges in search of exposed servers; once found, those servers are exploited.
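In the log posted above, the telltale signs are the role letter after the process ID switching from `M` to `S` (replica) at 03:55:54, and the `REPLICAOF ... enabled (user request from ...)` line, whose `addr=` field records the client that issued the command. The Python sketch below pulls such events out of `docker logs` output. It is an illustration only: the regular expressions are assumptions fitted to the Redis 7.2.4 log lines quoted in this thread, not a documented, stable format.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Redis log prefix: "<pid>:<role> <day> <mon> <year> <time> <level> <message>",
# where role is M (master), S (replica), C (child process) or X (sentinel).
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[MSCX]) (?P<when>\d{1,2} \w{3} \d{4} [\d:.]+) [.\-*#] (?P<msg>.*)$"
)
REQUESTER = re.compile(r"\baddr=(?P<addr>\S+)")  # "\b" skips the "laddr=" field
REPLICAOF = re.compile(r"^REPLICAOF (?P<master>\S+) enabled")


class RoleChange(NamedTuple):
    when: str
    new_role: str                # "replica" or "master"
    master: Optional[str]        # upstream master when turning into a replica
    requested_by: Optional[str]  # client address recorded in the log, if any


def find_role_changes(lines: Iterable[str]) -> List[RoleChange]:
    """Extract user-requested REPLICAOF / MASTER MODE events from Redis log output."""
    changes: List[RoleChange] = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # e.g. Next.js output interleaved in `docker logs`
        msg = m["msg"]
        req = REQUESTER.search(msg)
        requester = req["addr"] if req else None
        target = REPLICAOF.match(msg)
        if target:
            changes.append(RoleChange(m["when"], "replica", target["master"], requester))
        elif msg.startswith("MASTER MODE enabled"):
            changes.append(RoleChange(m["when"], "master", None, requester))
    return changes
```

Run over the log above, it should report a switch to replica of 91.108.105.178:60139 and, about a minute later, a switch back to master, both requested from 47.239.11.42:33026, which is not the 10.3.x.x address the reporter's own client uses in the MONITOR output.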

osehmathias commented 1 day ago

Yes, just a single master. No auth.

I am going to run it next with no public IP and with auth set up, though I would be surprised (and deeply troubled!!) if a malicious actor were behind this, as the disappearing issue has happened consistently, about 10 times in the past week, with new instances on different IPs.

Have also crossposted in Discord. Let me know if you prefer to pick the issue up there as I will continue to post any findings in search of a resolution.

swilly22 commented 1 day ago

I don't mind continuing here. This scenario is quite common: malicious actors are constantly on the lookout for vulnerable servers. I believe this is what we're dealing with here.

Try to avoid exposing your server to the public, and always set up a username and a password. Do let us know if that resolves the issue.
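As a quick self-check before exposing an instance, a `redis.conf`-style file can be scanned for the directives involved: `requirepass` (or a `user default` ACL line with a password, or `off`), `protected-mode`, and `bind`. The sketch below is a hypothetical heuristic, not a FalkorDB or Redis tool. It only sees the config file: settings passed on the command line (e.g. `redis-server --requirepass ...`), loaded from a separate ACL file, or changed at runtime with `CONFIG SET` are invisible to it, and network-level controls such as the EC2 security group matter at least as much.

```python
from typing import List

# Assumed markers of "listen everywhere" in a bind directive (Redis 7 syntax).
WILDCARD_BINDS = {"0.0.0.0", "*", "::", "-::*"}


def audit_redis_conf(text: str) -> List[str]:
    """Heuristic checks on redis.conf-style text; returns human-readable findings."""
    default_user_protected = False
    protected_mode = "yes"  # Redis default when the directive is absent
    binds: List[str] = []

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        directive, *args = line.split()
        directive = directive.lower()
        if directive == "requirepass" and args:
            default_user_protected = True  # requirepass sets the default user's password
        elif directive == "user" and args and args[0] == "default":
            rules = args[1:]
            # "off" disables the default user; ">pw" or "#hash" gives it a password.
            if "off" in rules or any(r.startswith((">", "#")) for r in rules):
                default_user_protected = True
        elif directive == "protected-mode" and args:
            protected_mode = args[0].lower()
        elif directive == "bind":
            binds.extend(args)

    findings: List[str] = []
    if not default_user_protected:
        findings.append("default user may have no password")
    if protected_mode == "no":
        findings.append("protected-mode is disabled")
    if not binds or any(b in WILDCARD_BINDS for b in binds):
        findings.append("server may listen on all interfaces: restrict 'bind' or the firewall")
    return findings
```

An empty result only means none of these particular red flags were found in the file; it is not a guarantee that the instance is unreachable from the internet.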

swilly22 commented 14 hours ago

Hi @osehmathias, any updates? Did securing your server resolve the issue? Thanks!

osehmathias commented 14 hours ago

Hi Roi, thanks for following up. I have some testing tasks scheduled for this weekend to try to get to the bottom of it. I will update here with results. Thanks.

YaphetKG commented 13 hours ago

Maybe this is related, but I was loading a sizable graph and noticed that, with the edge Falkor Docker container, loading took awfully long: after about 18 hours I decided to kill it, and the Redis server was unresponsive at that point. I rolled back to v4.0.10 of the Falkor Docker image, and the same operation took about 30 minutes.

If it's useful: I am using an older version of the bulk loader (I haven't had a chance to try the falkor-bulk loader fork), but it seems like it's just hanging there.