Closed fernandomacho closed 1 year ago
Thanks for reporting this. Can you please go to replica url: http://replica:6379/profilez?profile=on
this will start profiling the server, you should see in the INFO log the path to .prof
file. Then, after 30s change the url to http://replica:6379/profilez?profile=off
to flush the profiler file. We will need this file to see what the bottlenecks are.
if you install pprof
on your replica machine before starting the profiling, the browser will render the profiler graph that is also useful.
Hi I try to access this URL, on my case 172.16.0.25:6392/profilez?profile=on over browser and get beautiful image ;)
But when go to 172.16.0.25:6392/profilez?profile=off redirect to http://172.16.0.25:6392/filez?file=/tmp/profile/dragonfly_24102023_183917.prof.svg and get this error
On /tmp directory not exist the file...
Do you kown on that ubuntu package is pprof ?
where dragonfly logs are located? did you install dragonfly from a debian package?
The logs is on /var/log/dragonfly this directory has nothing but dragonfly's log files. If you need me to install pprof, I can't find the ubuntu package where that "utility" is located
can you check if /tmp/
has profile
directory and whether it has anything?
Yes I check /tmp directory and not contain profile directory I test profile on other replica and the result is the same no profile directory on /tmp
I show on logs:
`I20231024 20:58:14.724368 2454603 profilez_handler.cc:53] Starting profiling into /tmp/profile/dragonfly_24102023_185814.prof 1
I20231024 20:58:48.925976 2456502 profilez_handler.cc:70] Symbols /usr/bin/dragonfly.debug, suffix: _24102023_185814.prof
I20231024 20:58:48.926085 2456502 profilez_handler.cc:84] Running command: nice -n 15 pprof -noinlines -lines -unit ms --svg /usr/bin/dragonfly /tmp/profile/dragonfly_24102023_185814.prof > /tmp/profile/dragonfly_24102023_185814.prof.svg 2> /tmp/profile/dragonfly_24102023_185814.prof.err
I20231024 20:58:48.929224 2456502 profilez_handler.cc:95] Redirecting to filez?file=/tmp/profile/dragonfly_24102023_185814.prof.svg
I20231024 21:00:18.047973 4010978 save_stages_controller.cc:245] Saving "/var/lib/dragonfly/dump-2023-10-24T21:00:10-summary.dfs" finished after 7.71 s`
But /tmp/profile not exits
I try to create manually /tmp/profile and change permissions to 777 an re run the profile. The result is the same nothing write on /tmp/profile
I install pprof go command ;)
When run profile show on log
I20231024 21:19:56.859346 2576396 profilez_handler.cc:53] Starting profiling into /tmp/profile/dragonfly_24102023_191956.prof 1
but the file is not created
I think the directory that you are checking as /tmp
is not the directory that is /tmp
for Dragonfly. In your example, you navigated to the svg file which gave an error that the file was empty (and thus an invalid SVG). This happens when pprof is not installed, because the command pipeline will create an empty file on error. Therefore /tmp/profile... exists. You still did not mention how you are running Dragonfly. Is it perhaps within a container?
nop, I run dragonfly on bare metal
our systemd has private tmp folder. please do the following:
dfpid=$(pidof dragonfly)
sudo nsenter -t $dfpid -m ls /tmp
ok, I show the files, how copy this files?
ls /proc/$(pidof dragonfly)/root/tmp/profile
and then you can copy from the private tmp namespace to the regular /tmp dir
cp /proc/$(pidof dragonfly)/root/tmp/profile/something.prof /tmp/realtmp.prof
Thank you @UkuLoskit - I did not know this.
Unfortunately the profiler data is not very useful, or at least looks very, very weird as if it shows wrong function names.
what's your ubuntu version and the architecture uname -a
and cat /etc/os-release
?
Also, can you please provide "info all" output on the master and on the replica?
master Linux ovh-innova 5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu" VERSION="20.04.6 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.6 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal
replica 1 Linux innova2 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.3 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.3 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy
replica 2 Linux innova2 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.3 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.3 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy dfly_info.log
I try 1.- Stop dragonfly on replica 1 2.- rm /var/lib/dragonfly/* 3.- start dragonfly on replica 1
Get this error: E20231025 07:30:31.125586 866245 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.135639 866244 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.135838 866246 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.142112 866248 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.147995 866241 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.149091 866247 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.161890 866243 rdb_load.cc:619] Duplicate hash fields detected E20231025 07:30:31.177135 866242 rdb_load.cc:619] Duplicate hash fields detected W20231025 07:30:31.178701 866243 replica.cc:214] Error syncing with 172.16.0.24:6392 dragonfly.rdbload:5 Internal error when loading RDB file 5
can you please DM me on discord? will be easier to talk
I would like to give you a dev version of dragonfly that will help with understanding both issues you are experiencing. Please tell me if you are interested.
Hello, I think the problem comes from "duplicity of sets". I have stopped the three nodes and tried to start in order. When I tried to start the master I got the duplicate sets error and the only way to start was to delete the saved files. Are you interested in me sending you the database files?
yes, very much so
Can I send you the files privately?
sure. Can you DM me on discord @fernandomacho ?
I just created an account on discord, I've never used this, from what I see I have to ask you for contact right? Your username on discord is romange?
I have a 3 nodes running dragonfly 1 master and 2 replicas. The access to dragonfly instances is over haproxy with 2 backend: 1 for write (master) and 1 for read (master/replica)
The kernel version on servers are 5.15.0-73-generic on all servers.
The hardware is: Master: 96 cores and 376 Gb ram Replica 1: 48 cores and 256 Gb ram Replica 1: 48 cores and 256 Gb ram
The load of clients is:
On Master:
Clients
connected_clients:402 client_read_buffer_bytes:14292480 blocked_clients:0 dispatch_queue_entries:0
On replica 1:
Clients
connected_clients:188 client_read_buffer_bytes:52480 blocked_clients:0 dispatch_queue_entries:108365
On replica 2: connected_clients:202 client_read_buffer_bytes:54272 blocked_clients:0 dispatch_queue_entries:61381
The replication status is: On Master: role:master connected_slaves:2 slave0:ip=172.16.0.23,port=6392,state=stable_sync,lag=74 slave1:ip=172.16.0.25,port=6392,state=stable_sync,lag=61 master_replid:2c604ff0c72153701e8be30d707aa72d8f9d4ca9
On replica 1: role:replica master_host:172.16.0.24 master_port:6392 master_link_status:up master_last_io_seconds_ago:0 master_sync_in_progress:0
On replica 2: role:replica master_host:172.16.0.24 master_port:6392 master_link_status:up master_last_io_seconds_ago:0 master_sync_in_progress:0
The CPU on master consum is over 20% The CPU on replica 1 consum is over 370% The CPU on replica 1 consum is over 300%
Any log on dragonfly log files.
The nodes run sentinel and show multiple: 1303:X 24 Oct 2023 19:02:26.831 # +sdown slave 172.16.0.23:6392 172.16.0.23 6392 @ ovh-innova 172.16.0.24 6392
I think these errors are due to the high CPU load of the replica nodes
How is it possible that for many more clients the CPU load of the master is much lower than that of the replicas??
I attach info command on all nodes.
Thanks dfly_info.log