gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

pgsql on glusterfs #1638

Closed FrelDX closed 2 years ago

FrelDX commented 3 years ago

I am also hitting this problem: running PgSQL on gluster, PgSQL fails to start when a gluster node is unavailable.

pgdata

```
[root@test1 /]# ll /var/lib/portsip/pgsql/data/pg_stat_tmp/
ls: cannot access /var/lib/portsip/pgsql/data/pg_stat_tmp/global.stat: Transport endpoint is not connected
ls: cannot access /var/lib/portsip/pgsql/data/pg_stat_tmp/db_0.stat: Transport endpoint is not connected
ls: cannot access /var/lib/portsip/pgsql/data/pg_stat_tmp/db_16384.stat: Transport endpoint is not connected
total 0
-????????? ? ? ? ? ? db_0.stat
-????????? ? ? ? ? ? db_16384.stat
-????????? ? ? ? ? ? global.stat
[root@test1 /]#
```

What does "Transport endpoint is not connected" mean?

mohit84 commented 3 years ago

"Transport endpoint is not connected" means either a brick process is not running or the client is not able to communicate with a brick process. To confirm, you can run gluster v status to see whether the brick processes are running.
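For example, with the pbx volume that appears later in this thread, every brick should show Online "Y":

```
# list brick processes, their ports, and whether each brick is online
gluster volume status pbx
```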

FrelDX commented 3 years ago

"Transport endpoint is not connected" means either a brick process is not running or the client is not able to communicate with a brick process. To confirm, you can run gluster v status to see whether the brick processes are running.

I have a glusterfs cluster of three nodes and created one volume with three replicas. The PgSQL data directory is on that replicated volume. When one of my glusterfs nodes is shut down, PgSQL is scheduled onto another machine and fails to start, with this error.

FrelDX commented 3 years ago

"Transport endpoint is not connected" means either a brick process is not running or the client is not able to communicate with a brick process. To confirm, you can run gluster v status to see whether the brick processes are running.

Other files can be created and deleted normally; the problem only appears when viewing the PgSQL data files.

mohit84 commented 3 years ago

Can you please share the below data to debug it more:

  1. gluster v info
  2. gluster v status
  3. Dump of /var/log/glusterfs from all the nodes

FrelDX commented 3 years ago

Can you please share the below data to debug it more:

  1. gluster v info
  2. gluster v status
  3. Dump /var/log/glusterfs from all the nodes

```
[root@test1 log]# gluster v info

Volume Name: pbx
Type: Replicate
Volume ID: ddbc04b8-a829-45fe-96a1-0c2d663a7624
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.242:/data/pbx
Brick2: 192.168.1.216:/data/pbx
Brick3: 192.168.1.230:/data/pbx
Options Reconfigured:
performance.open-behind: off
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
locks.mandatory-locking: off

[root@test1 log]# gluster v status
Status of volume: pbx
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.242:/data/pbx               49152     0          Y       1226
Brick 192.168.1.230:/data/pbx               49153     0          Y       707
Self-heal Daemon on localhost               N/A       N/A        Y       1236
Self-heal Daemon on 192.168.1.230           N/A       N/A        Y       738

Task Status of Volume pbx
------------------------------------------------------------------------------
There are no active volume tasks

[root@test1 log]#
```

FrelDX commented 3 years ago

[Uploading glusterfs.tar.gz…]()

mohit84 commented 3 years ago

From the CLI output it seems one brick process is not running. I am not able to download the tar.
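If the node itself is up but its brick process has died, the brick can usually be brought back with a forced volume start (a sketch, using the volume name from this thread):

```
# respawn any brick processes that are not running; bricks already running are untouched
gluster volume start pbx force
```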

FrelDX commented 3 years ago

This is a high-availability test; I am simulating a node going down.

FrelDX commented 3 years ago

With 3 nodes, one going down should not affect access to the files.

FrelDX commented 3 years ago

glusterfs.tar.gz

FrelDX commented 3 years ago

From the CLI output it seems one brick process is not running. I am not able to download the tar.

Could it be that some files were locked by the node that went down, the lock was never released, and that is why the other two nodes fail to access the files?

pranithk commented 3 years ago

@FrelDX Locks are not held by servers/bricks, so that is not a possibility.

FrelDX commented 3 years ago

Locks are not held by servers/bricks, so that is not a possibility.

Can PgSQL run on glusterfs?

pranithk commented 3 years ago

The logs you attached don't have any files named var-lib-.....log. Could you attach the logfile from the machine where you are observing the failure?

I checked the repo and it looks like for db-workload the following profile should be enabled:

```
10:45:54 :) ⚡ cat extras/group-db-workload
performance.open-behind=on
performance.write-behind=off
performance.stat-prefetch=off
performance.quick-read=off
performance.strict-o-direct=on
performance.read-ahead=off
performance.io-cache=off
performance.readdir-ahead=off
performance.client-io-threads=on
server.event-threads=4
client.event-threads=4
performance.read-after-open=yes
```
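If the group file ships with the installation (under /var/lib/glusterd/groups/), the whole profile can be applied in one step with the group option — a sketch, using the pbx volume from this thread:

```
# apply the db-workload option group to the volume in one command
gluster volume set pbx group db-workload
```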

pranithk commented 3 years ago

There was a crash reported with open-behind which was fixed recently, so you may want to turn it off in case you are on an affected version.
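Turning it off is a single volume option (a sketch, again with the pbx volume):

```
# disable open-behind on versions affected by the reported crash
gluster volume set pbx performance.open-behind off
```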

FrelDX commented 3 years ago

The logs you attached don't have any files named var-lib-.....log. Could you attach the logfile from the machine where you are observing the failure?

I checked the repo and it looks like for db-workload the profile in extras/group-db-workload (quoted above) should be enabled.

Can PgSQL run successfully with the above configuration enabled?

FrelDX commented 3 years ago

My version is gluster 7.8.

pranithk commented 3 years ago

The open-behind issue is fixed in 7.8. The profile I mentioned above is tested for db workloads; I remember pgSQL being one of those tested.

I still need the logs to find what the problem you are facing is.

FrelDX commented 3 years ago

glusterfs.tar.gz — the /var/log/gluster directory from my host

FrelDX commented 3 years ago

The attachment I sent contains the files from the /var/log/gluster directory on the server.

FrelDX commented 3 years ago

The logs you attached don't have any files named var-lib-.....log. Could you attach the logfile from the machine where you are observing the failure?

I checked the repo and it looks like for db-workload the profile in extras/group-db-workload (quoted above) should be enabled.

What is the repo's URL?

pranithk commented 3 years ago

The logs you attached don't have any files named var-lib-.....log. Could you attach the logfile from the machine where you are observing the failure? I checked the repo and it looks like for db-workload the profile in extras/group-db-workload (quoted above) should be enabled.

What is the repo's URL?

It is this github repo. Exact link to the file I pasted: https://github.com/gluster/glusterfs/blob/devel/extras/group-db-workload

pranithk commented 3 years ago

It is indeed the same issue I was referring to yesterday:

Could you please attach the output of the following on all the bricks, to debug it further:

```
getfattr -d -m. -e hex /path/to//pgsql/data/current_logfiles
getfattr -d -m. -e hex /path/to//pgsql/data
```

```
[2020-10-15 01:01:32.965538] E [MSGID: 108008] [afr-self-heal-common.c:392:afr_gfid_split_brain_source] 0-pbx-replicate-0: Gfid mismatch detected for <gfid:c4ef47e1-f88d-41ab-b812-4706e3d8d104/postmaster.pid>, c822904d-4214-4212-9369-37690339dcbc on pbx-client-1 and 4201eb87-bdd2-4afe-9aba-428a59a688b0 on pbx-client-0.
[2020-10-15 01:01:32.966612] W [MSGID: 108027] [afr-common.c:2255:afr_attempt_readsubvol_set] 0-pbx-replicate-0: no read subvols for /pgsql/data/postmaster.pid
[2020-10-15 01:01:32.966710] W [fuse-bridge.c:1047:fuse_entry_cbk] 0-glusterfs-fuse: 5857: LOOKUP() /pgsql/data/postmaster.pid => -1 (Transport endpoint is not connected)
[2020-10-15 01:01:32.969572] E [MSGID: 108008] [afr-self-heal-common.c:392:afr_gfid_split_brain_source] 0-pbx-replicate-0: Gfid mismatch detected for <gfid:c4ef47e1-f88d-41ab-b812-4706e3d8d104/current_logfiles>, 583cd94a-4f40-433d-b542-0d79d9846d0a on pbx-client-1 and c8e66dc6-7a09-43a9-9e59-c4a5423297ac on pbx-client-0.
[2020-10-15 01:01:32.970586] W [MSGID: 108027] [afr-common.c:2255:afr_attempt_readsubvol_set] 0-pbx-replicate-0: no read subvols for /pgsql/data/current_logfiles
```
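For reference, gfid mismatches like these should also be visible through the self-heal view (a sketch):

```
# list entries that the self-heal daemon considers to be in split-brain
gluster volume heal pbx info split-brain
```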

FrelDX commented 3 years ago

The logs you attached don't have any files named var-lib-.....log. Could you attach the logfile from the machine where you are observing the failure? I checked the repo and it looks like for db-workload the profile in extras/group-db-workload (quoted above) should be enabled.

What is the repo's URL?

It is this github repo. Exact link to the file I pasted: https://github.com/gluster/glusterfs/blob/devel/extras/group-db-workload

test1, test2, and test3 form one cluster. When test1 suddenly loses power, the data on the other nodes becomes inconsistent; when test1 is powered back on, it recovers by itself. Most of the time, though, we want the cluster to remain usable while test1 is down. Is there any configuration for that?

The following picture shows that when the files are inconsistent, listing them on the gluster volume produces many errors (the ????????? entries).

(screenshot)

FrelDX commented 3 years ago

It is indeed the same issue I was referring to yesterday:

Could you please attach the output of the following on all the bricks, to debug it further:

```
getfattr -d -m. -e hex /path/to//pgsql/data/current_logfiles
getfattr -d -m. -e hex /path/to//pgsql/data
```

```
[2020-10-15 01:01:32.965538] E [MSGID: 108008] [afr-self-heal-common.c:392:afr_gfid_split_brain_source] 0-pbx-replicate-0: Gfid mismatch detected for <gfid:c4ef47e1-f88d-41ab-b812-4706e3d8d104/postmaster.pid>, c822904d-4214-4212-9369-37690339dcbc on pbx-client-1 and 4201eb87-bdd2-4afe-9aba-428a59a688b0 on pbx-client-0.
[2020-10-15 01:01:32.966612] W [MSGID: 108027] [afr-common.c:2255:afr_attempt_readsubvol_set] 0-pbx-replicate-0: no read subvols for /pgsql/data/postmaster.pid
[2020-10-15 01:01:32.966710] W [fuse-bridge.c:1047:fuse_entry_cbk] 0-glusterfs-fuse: 5857: LOOKUP() /pgsql/data/postmaster.pid => -1 (Transport endpoint is not connected)
[2020-10-15 01:01:32.969572] E [MSGID: 108008] [afr-self-heal-common.c:392:afr_gfid_split_brain_source] 0-pbx-replicate-0: Gfid mismatch detected for <gfid:c4ef47e1-f88d-41ab-b812-4706e3d8d104/current_logfiles>, 583cd94a-4f40-433d-b542-0d79d9846d0a on pbx-client-1 and c8e66dc6-7a09-43a9-9e59-c4a5423297ac on pbx-client-0.
[2020-10-15 01:01:32.970586] W [MSGID: 108027] [afr-common.c:2255:afr_attempt_readsubvol_set] 0-pbx-replicate-0: no read subvols for /pgsql/data/current_logfiles
```

In my opinion, test1 synchronizes the data to test2 and test3, and once test3 has synced, test1 treats the write as successful. A sudden power failure then leaves test2 unsynchronized with test1, so the data on test2 and test3 is inconsistent and PgSQL cannot start.

pranithk commented 3 years ago

@FrelDX That is not how it works in glusterfs. It is the mount on the client machines that replicates the data, not the server.

pranithk commented 3 years ago

Could you provide the info I asked for in the previous comment, with the getfattr commands?

FrelDX commented 3 years ago

Could you provide the info I asked for in the previous comment, with the getfattr commands?

ok

FrelDX commented 3 years ago

Could you provide the info I asked for in the previous comment, with the getfattr commands?

test1 is the machine that was disconnected; with your command I can see some output there. On test2 and test3 your command shows nothing. (screenshots)

pranithk commented 3 years ago

Sorry for the confusion. You need to execute this on the bricks, not on the mount point. Could you please paste the output as text, so that it is easy to copy/paste and search through the logs?
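Concretely, that means running the command against the brick backend path from the volume info above (/data/pbx), not against the FUSE mount — a sketch:

```
# on each server, inspect the file on the brick path, not on the gluster mount
getfattr -d -m. -e hex /data/pbx/pgsql/data/current_logfiles
```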

FrelDX commented 3 years ago

ok

FrelDX commented 3 years ago

I picked one problematic file at random and inspected it with your command, because my previous environment has already been destroyed.

pranithk commented 3 years ago

Okay. Without the output for the commands I won't be able to debug the issue further.

FrelDX commented 3 years ago

Okay. Without the output for the commands I won't be able to debug the issue further.

```
[root@test2 dialog]# getfattr -d -m. -e hex ./LOG
# file: LOG
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x474638f6c9154bf5975d14c5b088e924
trusted.gfid2path.d392b5c3cec86865=0x32383434633464652d343435312d343761622d616166382d3538303935383235656166632f4c4f47
trusted.glusterfs.mdata=0x010000000000000000000000005f87e380000000000e0542be000000005f87e380000000000e0542be000000005f87e37f000000002e82c967
[root@test2 dialog]#
```

```
[root@test3 dialog]# getfattr -d -m. -e hex ./LOG
# file: LOG
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.pbx-client-1=0x000000040000000100000000
trusted.gfid=0x3c4c1324758a420c8eef11bd60ee8fcb
trusted.gfid2path.d392b5c3cec86865=0x32383434633464652d343435312d343761622d616166382d3538303935383235656166632f4c4f47
trusted.glusterfs.mdata=0x010000000000000000000000005f87e52c000000001bc176b1000000005f87e52c000000001bc176b1000000005f87e52c0000000016766015
[root@test3 dialog]#
```
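For context, the trusted.afr.<client> value is conventionally three big-endian 32-bit pending-operation counters (data, metadata, entry), so test3's 0x000000040000000100000000 would read as 4 pending data operations and 1 pending metadata operation blamed on pbx-client-1. A quick way to split the hex (a sketch):

```
# split the 12-byte AFR xattr into its three 32-bit counters: data / metadata / entry
echo 000000040000000100000000 | fold -w8
```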

The picture shows test1 in a simulated shutdown (disconnected from the network), because if I keep it on the network it will automatically recover from the fault. (screenshot)

FrelDX commented 3 years ago

(screenshot)

FrelDX commented 3 years ago

test2's trusted.glusterfs.mdata is not the same as test1's and test3's.


pranithk commented 3 years ago

Are you running the workload on the test1 machine?

FrelDX commented 3 years ago

test1

I am using the three-node cluster to test high availability: when test1 goes down, can test2 and test3 keep running? What I found is that after test1 fails and shuts down, the services on test2 and test3 cannot run; once test1 is started again, they can. So when I simulate the fault by shutting test1 down and taking it off the network, the fault never clears; it only clears once test1 is back on the network. That is not what I want. The result I want is that when test1 is shut down by a failure, test2 and test3 keep running whether or not test1 ever comes back.


FrelDX commented 3 years ago

Are you running the workload on the test1 machine?

Originally all three machines ran the service normally. When test1 is shut down, test2 and test3 cannot run the service. What I hope for is that test2 and test3 can still run the service after test1 is shut down.

pranithk commented 3 years ago

I understood the requirement. My question is: is the sql workload run on test1, i.e. on the test1 server machine rather than on a separate client machine?

FrelDX commented 3 years ago

yes

FrelDX commented 3 years ago

I understood the requirement. My question is, is the sql workload run on test1? i.e. on the test1 server machine instead of a separate client machine?

Both SQL and glusterfs run on test1. When test1 fails, SQL migrates to test2

FrelDX commented 3 years ago

I understood the requirement. My question is, is the sql workload run on test1? i.e. on the test1 server machine instead of a separate client machine?

Does glusterfs have a setting so that success is returned to the client only after the data has been committed and verified on all nodes?
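For reference, the closest existing knobs appear to be the client-quorum options — a sketch, not necessarily a fix for this issue; with a fixed quorum-count of 3, writes fail outright unless all three bricks are reachable:

```
# require all 3 bricks to be up before any write is allowed to succeed
gluster volume set pbx cluster.quorum-type fixed
gluster volume set pbx cluster.quorum-count 3
```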

pranithk commented 3 years ago

@FrelDX Because you are mounting the filesystem on test1, you are using test1 as both client and server; when test1 dies, both the client machine and the server machine die together. It is difficult to recover from these failures for your workload until one of the issues we have is fixed.
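For reference, a separate client machine would mount the volume roughly like this — a sketch using the brick IPs from this thread; the backup-volfile-servers mount option lets the mount come up even when the first server is down:

```
# mount from a dedicated client, with fallback volfile servers
mount -t glusterfs -o backup-volfile-servers=192.168.1.216:192.168.1.230 \
      192.168.1.242:/pbx /mnt/pbx
```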

@karthik-us I am not able to find the gfid split-brain issue you raised. Could you point to that issue?

karthik-us commented 3 years ago

@pranithk Here it goes: https://github.com/gluster/glusterfs/issues/502. It was filed by Ravi; maybe that's why you were not able to find the link.

FrelDX commented 3 years ago

Is there a solution? My architecture needs to run PgSQL on the glusterfs server


pranithk commented 3 years ago

@FrelDX No, not without the fix to #502

mrSingh007 commented 3 years ago

I am getting the same issue with postgres: WARNING: could not open statistics file "pg_stat_tmp/global.stat": Stale file handle. If I change the permissions of global.stat to 777 (chmod 777 global.stat) the warning disappears, but after some time it comes back and the permissions revert to:

```
-rw------- 1 postgres postgres 1352 Nov 11 13:04 global.stat    (inside the postgres container)
-rw------- 2 systemd-coredump systemd-coredump 1352 Nov 11 13:05 global.stat    (inside the volume on the node)
```
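A common workaround for exactly this file (an assumption on my part, not something confirmed in this thread): pg_stat_tmp holds throwaway statistics data, so on PostgreSQL versions before 15 it can be pointed at local storage instead of the gluster volume:

```
# keep the statistics temp dir on local disk; the directory must exist and be writable by postgres
psql -U postgres -c "ALTER SYSTEM SET stats_temp_directory = '/var/run/postgresql/pg_stat_tmp';"
psql -U postgres -c "SELECT pg_reload_conf();"
```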