amarts / glusterfs

GlusterFS: a distributed, software defined, scale-out, portable file-system. This repo contains history of Gluster project
GNU General Public License v2.0

[bug:1618932] dht-selfheal.c: Directory selfheal failed #13

Open amarts opened 4 years ago

amarts commented 4 years ago

bugzilla-URL: https://bugzilla.redhat.com/1618932

Created attachment 1476762 gfapi log

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Multiple applications using gfapi concurrently create files in the same directory. The tags (e51fd83622674cc9) and (e21ea6832d2b13d0) mark log lines from different application processes.
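For anyone trying to reproduce this, here is a minimal sketch of the access pattern described above: several writers race to create the same directory, write a temporary file in it, and rename it to its final name. It is an approximation under stated assumptions, not the reporter's code. It uses plain POSIX calls on a local or mounted path instead of gfapi, and threads instead of separate processes. The names `write_part` and `run_workers` are hypothetical.

```python
import os
import threading
import uuid


def write_part(base_dir: str, rel_dir: str, index: int) -> str:
    """Create the shared directory if needed, write a temp file, rename it."""
    target_dir = os.path.join(base_dir, rel_dir)
    # Every writer tries to create the same directory; exist_ok tolerates
    # losing the race to another writer.
    os.makedirs(target_dir, exist_ok=True)
    final_path = os.path.join(target_dir, f"{index:04d}")
    # Temporary name mirrors the "<index>_<uuid>" pattern seen in the log.
    tmp_path = f"{final_path}_{uuid.uuid4().hex}"
    with open(tmp_path, "w") as handle:
        handle.write(f"part {index}\n")
    os.rename(tmp_path, final_path)
    return final_path


def run_workers(base_dir: str, rel_dir: str, count: int) -> list:
    """Run `count` writers concurrently and return any OSError they raised."""
    errors = []

    def worker(index: int) -> None:
        try:
            write_part(base_dir, rel_dir, index)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors
```

Pointing `base_dir` at a mount of the affected volume and checking whether `run_workers` returns any error with errno 5 would be one way to look for the failure; whether this path triggers the same dht-selfheal code as gfapi is an assumption.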

application log

timezone is GMT+8

2018-08-18 19:35:03,703 DEBUG -31021968- writing to file cluster=4 FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88 (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:03,734 DEBUG -32369552- writing to file cluster=4 FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88 (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:03,786 DEBUG -31022448- Create new directory [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m] on cluster [4] ((unknown file): 0)(e51fd83622674cc9)
2018-08-18 19:35:03,795 CRITICAL -31021968- Failed to open cluster [4] object [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/ 0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w]: [[Errno 5] Input/output error] (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:03,903 DEBUG -32366672- Directory [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m] exists on cluster [4] ((unknown file): 0)(e21ea6832d2b13d0)
2018-08-18 19:35:03,945 DEBUG -32369552- Open cluster [4] file [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88] with mode [w] (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:04,127 DEBUG -31021968- Open cluster [4] file [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w] (app:1461)(e51fd83622674cc9)
2018-08-18 19:35:04,391 INFO -32369552- Rename file: cluster=4 src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88 dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001 (app:1461)(e21ea6832d2b13d0)
2018-08-18 19:35:04,485 INFO -31021968- Rename file: cluster=4 src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88 dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004 (app:1461)(e51fd83622674cc9)
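Because the two processes interleave in the application log, it can help to split the entries by the trailing 16-hex-digit tag. The following is a rough sketch that assumes one log entry per line and the layout seen above (timestamp, level, a numeric field between dashes, message, trailing tag). The meaning of the numeric field is not stated in the report, so it is captured as `ident` without interpretation; `group_by_process` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Layout assumed from the sample log: "<date> <time>,<ms> LEVEL -<number>- message (<16 hex>)".
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) -(?P<ident>\d+)- "
    r"(?P<message>.*)\((?P<process>[0-9a-f]{16})\)\s*$"
)


def group_by_process(lines):
    """Return {process_tag: [(time, level, message), ...]} in input order."""
    groups = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Skip anything that does not fit the assumed layout.
        groups[match["process"]].append(
            (match["time"], match["level"], match["message"].strip())
        )
    return dict(groups)
```

Filtering the entries for (e51fd83622674cc9) by level then isolates the CRITICAL open failure at 19:35:03,795 from the other process's activity.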

Actual results:

An IO error happened when creating the file; the retry succeeded.
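The retry behaviour mentioned here can be sketched roughly as follows. This is an illustration only, not the application's actual code: `open_with_retry` and its parameters are hypothetical, and `opener` stands in for whatever call opens the file (a gfapi binding, or the built-in `open`). Only errno 5 (EIO) is retried; any other error is raised immediately.

```python
import errno
import time


def open_with_retry(opener, path, mode="w", attempts=3, delay=0.1):
    """Call opener(path, mode), retrying only on EIO up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, mode)
        except OSError as exc:
            # Re-raise anything that is not EIO, and EIO on the last attempt.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            time.sleep(delay)
```

A retry like this only hides the symptom; the transient EIO itself is what this report is about.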

A dht-selfheal failure is observed in the gfapi log, and the brick reports an unmatched inode unlock request.

Expected results:

Additional info:

"gluster volume status" output is all OK, but running "gluster volume heal vol0 info" blocks and produces no output.

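Since the heal info command blocks, a script that collects diagnostics may want a time limit around it. Below is a minimal sketch using Python's standard `subprocess` module; `run_with_timeout` is a hypothetical helper and the 30-second limit in the usage note is an arbitrary choice.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run `command`; return its stdout, or None if it exceeds the timeout."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return completed.stdout
```

For example, `run_with_timeout(["gluster", "volume", "heal", "vol0", "info"], 30)` would return `None` if the command is still blocked after 30 seconds.
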
gluster volume info

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 18e1c05d-570a-4c97-aa91-ef984881c4f2
Status: Started
Snapshot Count: 0
Number of Bricks: 36 x 3 = 108
Transport-type: tcp

Options Reconfigured:
locks.trace: false
client.event-threads: 6
cluster.self-heal-daemon: enable
performance.write-behind: True
transport.keepalive: True
cluster.rebal-throttle: lazy
server.event-threads: 4
performance.io-cache: False
nfs.disable: True
cluster.quorum-type: auto
network.ping-timeout: 120
features.cache-invalidation: False
performance.read-ahead: False
performance.client-io-threads: True
cluster.server-quorum-type: none
performance.md-cache-timeout: 0
performance.readdir-ahead: True

amarts commented 4 years ago

Bugzilla comment 26 (id 13318836) on bug 1618932, by nbalacha, 2019-11-04 07:50:00:

(In reply to frostyplanet from comment #19)
> Created attachment 1487508 [details]
> gfapi log for io error in 3.12.14
>
> New sample of gfapi log in version 3.12.14. IO error happended while
> creating file

I do not see IO errors in the gfapi log. Please provide debug/trace logs when the issue is seen in 3.12