gluster / glusterfs

Gluster Filesystem: Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

[bug:1761350] Directories are not healed when dirs are created on the backend bricks and a lookup is performed from the mount path. #856

Closed: gluster-ant closed this issue 4 years ago

gluster-ant commented 4 years ago

URL: https://bugzilla.redhat.com/1761350 Creator: mwaykole at redhat Time: 20191014T08:46:40

[afr] Heal does not complete on a (1 x 3) replicated volume after enabling the client-side healing options; some files are always left unhealed.

"metadata-self-heal": "on", "entry-self-heal": "on", "data-self-heal": "on"

Steps:

    1) Create a replicated volume (1 x 3).
    2) Test the case with the default afr options.
    3) Test the case with the volume option 'self-heal-daemon'.
    4) Create dirs directly on the backend bricks, let's say dir1, dir2 and dir3
       (a shell sketch of steps 4-6 follows this list).
    5) From the mount point:
        echo "hi" > dir1  --> must fail
        touch dir2        --> must pass
        mkdir dir3        --> must fail
    6) From the mount point, ls -l and find must list dir1, dir2 and dir3.
    7) Check on all backend bricks that dir1, dir2 and dir3 have been created.
    8) heal info should show zero entries, and the gfid and other attributes
       must exist.
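A minimal shell sketch of steps 4-6, assuming the brick path shown in the heal info output below and a client mount at /mnt/testvol (the mount path is an assumption, not from the report):

    # On each brick node: create the directories directly on the backend brick,
    # bypassing glusterfs (brick path taken from the report; suffix differs per node).
    mkdir /bricks/brick1/testvol_replicated_brick0/dir1
    mkdir /bricks/brick1/testvol_replicated_brick0/dir2
    mkdir /bricks/brick1/testvol_replicated_brick0/dir3

    # From a client, perform lookups/operations through the mount point.
    echo "hi" > /mnt/testvol/dir1   # must fail: dir1 is a directory
    touch /mnt/testvol/dir2         # must pass: only updates timestamps
    mkdir /mnt/testvol/dir3         # must fail: dir3 already exists

    # Both listings must show dir1, dir2 and dir3.
    ls -l /mnt/testvol
    find /mnt/testvol

Per steps 7 and 8, after these lookups the directories should exist on all bricks with a gfid assigned and heal info should drop to zero entries.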

Actual result:

gluster volume heal testvol_replicated info

Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 1

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 1

Brick server3:/bricks/brick1/testvol_replicated_brick2
Status: Connected
Number of entries: 1
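For reference, a sketch of the standard CLI steps for re-checking and manually kicking off a heal of these pending entries (per the report, they are expected to heal without this manual step):

    # List the entries still pending heal on each brick
    gluster volume heal testvol_replicated info

    # Ask the self-heal daemons to crawl the heal indices and heal pending entries
    gluster volume heal testvol_replicated

    # Or force a full crawl of the volume
    gluster volume heal testvol_replicated full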

Expected result:

Brick server:/bricks/brick1/testvol_replicated_brick0
Status: Connected
Number of entries: 0

Brick server2:/bricks/brick1/testvol_replicated_brick1
Status: Connected
Number of entries: 0

Brick server3:/bricks/brick1/testvol_replicated_brick2
Status: Connected
Number of entries: 0
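Step 8 also requires that the gfid and other attributes exist on the backend after heal. One way to verify this is to dump the extended attributes of the directory on each brick; a sketch, assuming the brick path above and dir1 as the example entry:

    # Run on each brick node: after a successful heal, trusted.gfid should be present
    # and identical on all three bricks, and the trusted.afr.* changelog xattrs should
    # show no pending operations.
    getfattr -d -m . -e hex /bricks/brick1/testvol_replicated_brick0/dir1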

Additional info:

[2019-10-09 09:15:36.822052] W [socket.c:774:socket_rwv] 0-testvol_replicated-client-0: readv on 10.70.35.132:49152 failed (No data available)
The message "I [MSGID: 100040] [glusterfsd-mgmt.c:106:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing" repeated 2 times between [2019-10-09 09:15:35.989751] and [2019-10-09 09:15:36.303521]
[2019-10-09 09:15:36.822109] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.859761] W [socket.c:774:socket_rwv] 0-testvol_replicated-client-1: readv on 10.70.35.216:49152 failed (No data available)
[2019-10-09 09:15:38.859805] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-1: disconnected from testvol_replicated-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.859834] W [MSGID: 108001] [afr-common.c:5653:afr_notify] 0-testvol_replicated-replicate-0: Client-quorum is not met
[2019-10-09 09:15:38.860994] W [socket.c:774:socket_rwv] 0-testvol_replicated-client-2: readv on 10.70.35.80:49152 failed (No data available)
[2019-10-09 09:15:38.861025] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-2: disconnected from testvol_replicated-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:38.861046] E [MSGID: 108006] [afr-common.c:5357:afr_handle_child_down_event] 0-testvol_replicated-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-10-09 09:15:39.827168] E [MSGID: 114058] [client-handshake.c:1268:client_query_portmap_cbk] 0-testvol_replicated-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-10-09 09:15:39.827274] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 0-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-09 09:15:39.881864] W [glusterfsd.c:1645:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fd028181dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x56243aebc805] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x56243aebc66b] ) 0-: received signum (15), shutting down

######################### latest logs

[2019-10-11 11:25:29.749047] I [rpc-clnt.c:1967:rpc_clnt_reconfig] 5-testvol_replicated-client-1: changing port to 49155 (from 0)
[2019-10-11 11:25:29.754160] I [rpc-clnt.c:1967:rpc_clnt_reconfig] 5-testvol_replicated-client-2: changing port to 49155 (from 0)
[2019-10-11 11:25:29.754806] I [MSGID: 114057] [client-handshake.c:1188:select_server_supported_programs] 5-testvol_replicated-client-1: Using Program GlusterFS 4.x v1, Num (1298437), Version (400)
[2019-10-11 11:25:29.756036] I [MSGID: 114046] [client-handshake.c:904:client_setvolume_cbk] 5-testvol_replicated-client-1: Connected to testvol_replicated-client-1, attached to remote volume '/bricks/brick1/testvol_replicated_brick1'.
[2019-10-11 11:25:29.756076] I [MSGID: 108002] [afr-common.c:5648:afr_notify] 5-testvol_replicated-replicate-0: Client-quorum is met
[2019-10-11 11:25:29.758143] I [MSGID: 114057] [client-handshake.c:1188:select_server_supported_programs] 5-testvol_replicated-client-2: Using Program GlusterFS 4.x v1, Num (1298437), Version (400)
[2019-10-11 11:25:29.759918] I [MSGID: 114046] [client-handshake.c:904:client_setvolume_cbk] 5-testvol_replicated-client-2: Connected to testvol_replicated-client-2, attached to remote volume '/bricks/brick1/testvol_replicated_brick2'.
[2019-10-11 11:25:30.778455] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 5-testvol_replicated-replicate-0: performing metadata selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e
[2019-10-11 11:25:30.793172] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 5-testvol_replicated-replicate-0: Completed metadata selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e. sources=[0] sinks=1 2
[2019-10-11 11:25:30.797468] I [MSGID: 108026] [afr-self-heal-entry.c:916:afr_selfheal_entry_do] 5-testvol_replicated-replicate-0: performing entry selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e
[2019-10-11 11:25:30.812701] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 5-testvol_replicated-replicate-0: Completed entry selfheal on fb2f6540-41eb-4ed6-9fe1-e821f02bda9e. sources=[0] sinks=1 2
Ending Test: functional.afr.test_gfid_assignment_on_lookup.AssignGfidOnLookup_cplex_replicated_glusterfs.test_gfid_assignment_on_lookup : 16_55_11_10_2019
[2019-10-11 11:25:31.572199] W [socket.c:774:socket_rwv] 5-testvol_replicated-client-0: readv on 10.70.35.132:49155 failed (No data available)
[2019-10-11 11:25:31.572250] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-0: disconnected from testvol_replicated-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-11 11:25:31.820309] W [MSGID: 114031] [client-rpc-fops_v2.c:911:client4_0_getxattr_cbk] 5-testvol_replicated-client-0: remote operation failed. [{path=/}, {gfid=00000000-0000-0000-0000-000000000001}, {key=glusterfs.xattrop_index_gfid}, {errno=107}, {error=Transport endpoint is not connected}]
[2019-10-11 11:25:31.820350] W [MSGID: 114029] [client-rpc-fops_v2.c:4467:client4_0_getxattr] 5-testvol_replicated-client-0: failed to send the fop
[2019-10-11 11:25:31.820366] W [MSGID: 108034] [afr-self-heald.c:463:afr_shd_index_sweep] 5-testvol_replicated-replicate-0: unable to get index-dir on testvol_replicated-client-0
[2019-10-11 11:25:32.601159] I [MSGID: 101218] [graph.c:1522:glusterfs_process_svc_detach] 0-mgmt: detaching child shd/testvol_replicated
[2019-10-11 11:25:32.601338] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-0: current graph is no longer active, destroying rpc_client
[2019-10-11 11:25:32.601377] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-1: current graph is no longer active, destroying rpc_client
[2019-10-11 11:25:32.601663] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-1: disconnected from testvol_replicated-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-11 11:25:32.601691] W [MSGID: 108001] [afr-common.c:5654:afr_notify] 5-testvol_replicated-replicate-0: Client-quorum is not met
[2019-10-11 11:25:32.601600] I [MSGID: 114021] [client.c:2498:notify] 5-testvol_replicated-client-2: current graph is no longer active, destroying rpc_client
[2019-10-11 11:25:32.602242] I [MSGID: 114018] [client.c:2398:client_rpc_notify] 5-testvol_replicated-client-2: disconnected from testvol_replicated-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2019-10-11 11:25:32.602273] E [MSGID: 108006] [afr-common.c:5358:afr_handle_child_down_event] 5-testvol_replicated-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-10-11 11:25:32.602649] I [io-stats.c:4047:fini] 0-testvol_replicated: io-stats translator

stale[bot] commented 4 years ago

Thank you for your contributions. This issue has not had any activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] commented 4 years ago

Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.