SowjanyaKotha opened this issue 12 months ago
Bump on this one to see if there is a solution
@amarts @avati - can you please point us in the right direction so that we can proceed? Segfaults are not typical, and we are wondering why this is being ignored.
I will look into this and update.
Thanks @aravindavk - @SowjanyaKotha will reply on this. Really appreciate the quick response here 👍
@aravindavk The fault on the existing node's volume happens at different times: mostly during add-brick, but it can happen during remove-brick as well. When the node is replaced, the new node is clean and the gluster packages are freshly installed. The node is already offline before the remove-brick is done, so we didn't use reset-brick.
@aravindavk any updates on this? We're hitting this issue consistently after a few attempts and hence pushing for a solution
@amarts @avati seems like support for the project is lacking now. Can someone help please.
From the backtrace, I can see that the crash happens inside SSL_read.
What were the steps used to set up the new node and the existing nodes (clients and servers)?
Was a new SSL key generated on the new node (the one used in the add-brick command), or was the SSL key file reused from the existing node that was replaced?
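For reference, a minimal sketch of generating a fresh TLS identity for the replacement node. The CN, key size, and validity period are assumptions for illustration; GlusterFS expects glusterfs.key, glusterfs.pem, and glusterfs.ca in the build's SSL directory (/usr/lib/ssl on the setup reported here, /etc/ssl on many others):

```shell
# Sketch: create a new key and self-signed cert for the replaced node.
# Work in a throwaway directory; copy the results into the node's real
# SSL directory afterwards.
cd "$(mktemp -d)"

# Private key and certificate; CN "server2.gluster" is an example name.
openssl genrsa -out glusterfs.key 2048
openssl req -new -x509 -key glusterfs.key \
    -subj "/CN=server2.gluster" -days 365 -out glusterfs.pem

# glusterfs.ca must contain the certificates of every node and client,
# so the new cert also has to be added to the shared bundle, which is
# then distributed to all nodes and clients.
cat glusterfs.pem >> glusterfs.ca

# Quick sanity check on the generated certificate
openssl x509 -in glusterfs.pem -noout -subject
```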
If the /usr/lib/ssl/glusterfs.ca file was not cleaned up, then either delete this file, or find the old node's certificate in it and replace it with the new node's details.
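One way to find the stale entry is to split the bundle into individual certificates and print each subject. A self-contained sketch follows; the two throwaway certs stand in for a real /usr/lib/ssl/glusterfs.ca bundle:

```shell
# Sketch: locate the old node's certificate inside a glusterfs.ca
# bundle. A throwaway two-cert bundle stands in for the real file.
cd "$(mktemp -d)"
for cn in server1.gluster server2.gluster; do
    openssl req -x509 -newkey rsa:2048 -nodes -keyout "$cn.key" \
        -subj "/CN=$cn" -days 1 -out "$cn.pem"
done
cat server1.gluster.pem server2.gluster.pem > glusterfs.ca

# Split the bundle into one file per certificate and print each
# subject and expiry; the stale node's cert can then be removed and
# the fixed bundle copied back to every node and client.
csplit -s -z -f node-cert- glusterfs.ca '/BEGIN CERTIFICATE/' '{*}'
for c in node-cert-*; do
    openssl x509 -in "$c" -noout -subject -enddate
done
```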
I tested this in our lab, couldn't reproduce the crash. The steps I did were:

1. Created a two-node cluster (server1.gluster and server2.gluster)
2. Removed server2.gluster from server1.gluster
3. Re-added the node from server1.gluster (server2.gluster)
4. Regenerated the glusterfs.ca file and copied it to all nodes and clients

The details about the tests are available here:
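The sequence above roughly corresponds to the following admin commands. This is a sketch only: the volume name "testvol" and brick path /bricks/b1 are placeholders, the SSL options shown are the standard GlusterFS ones, and the commands are not runnable outside an actual Gluster cluster:

```shell
# 1. Two-node replica volume with TLS enabled on the I/O path
gluster volume create testvol replica 2 \
    server1.gluster:/bricks/b1 server2.gluster:/bricks/b1
gluster volume set testvol client.ssl on
gluster volume set testvol server.ssl on
gluster volume start testvol

# 2. Drop the failed node's brick and detach the peer
gluster volume remove-brick testvol replica 1 \
    server2.gluster:/bricks/b1 force
gluster peer detach server2.gluster

# 3. After reinstalling the node, regenerating its certificate, and
#    copying the updated glusterfs.ca to all nodes and clients,
#    re-add the node and its brick
gluster peer probe server2.gluster
gluster volume add-brick testvol replica 2 \
    server2.gluster:/bricks/b1 force
```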
@aravindavk A new certificate is created for the node. But the issue happens randomly. If the certificate is not correct, it should always fail. Would it matter if the cert location is not the default one?
Description of problem: A setup of 2-node mirrored volumes with clients installed on both nodes. When one of the nodes becomes faulty, it is removed and replaced with a new node with the same name/IP. While adding the brick, the active client crashes. The issue occurs randomly when SSL is enabled on the I/O path; it is not seen in non-SSL setups.
The exact command to reproduce the issue: `gluster volume add-brick efa_logs replica 2 10.18.120.135:/apps/opt/efa/logs force`
The full output of the command that failed:
- The operating system / glusterfs version: Reproducible with gluster versions 9.6 and 11.0 on Ubuntu, installed from Debian packages.