@xhernandez Thanks for your detailed response.
In your particular case I would recommend disabling the gfid2path feature. You also seem to be using quota. Quota works on a per-directory basis, but given that you have multiple hardlinks, I'm not sure it makes sense (to which directory should the quota be accounted?). If not strictly necessary, I would also disable quota.
Could you please tell me how to disable the gfid2path feature? I cannot find it in the documentation.
I disabled quota.
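For reference, quota can be disabled with the standard CLI, where <volname> is a placeholder for the actual volume name:

    gluster volume quota <volname> disable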
I deleted all 809 files that were causing the [Argument list too long] errors that self-heal could not handle.
files that exist on more than one brick
You are using a replica. It's expected to have the same file in more than one brick.
I mean files that exist on more than one brick of a given server. Cases like this result in:
ls -l /srv/gluvol1/vault/bobw/files/home/.mozilla/firefox/6qw6a8eb.cockpit \
2>&1 | awk '/\?/{print " " $0}'; date +\ \ %F\ %T
-????????? ? ? ? ? ? broadcast-listeners.json
-????????? ? ? ? ? ? prefs.js
2020-10-05 16:17:28
dangling gfid files, where the named file has been deleted directly from a brick, but not the corresponding gfid file
You should never do this. It can cause more troubles.
Obviously, I do not intentionally delete files directly from bricks, but I have found this is often the only way to resolve certain issues (like split-brain). With manual intervention like this, though, it is always possible that I could make a mistake.
I will attempt to get the xattrs (getfattr) and hardlink count for all files on all bricks, but I will need to be careful how I do that. There's no point running getfattr on each file; it only needs to be done once per inode. Given that a "find" over an 11TB subset of all the data takes over 5 hours, this could take days.
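A sketch of one way to keep this to one getfattr call per inode, assuming GNU find and using /srv/brick07 as a placeholder for a brick root: have find print the inode number with each path, let awk keep only the first path seen per inode, and dump the xattrs of those:

    # Print "<inode> <path>" for each regular file, dedupe on the inode,
    # then dump all xattrs (hex-encoded) once per inode.
    # Note: paths containing newlines would need find -print0 handling instead.
    find /srv/brick07 -type f -printf '%i %p\n' |
    awk '!seen[$1]++' |
    while read -r inode path; do
        getfattr -d -m . -e hex -- "$path"
    done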
I don't believe the issue is specific to hardlinks. Gluster populates one xattr per hardlink, but unless we know which xattrs are actually populated on the backend it is difficult to find the reason. As Xavi asked earlier, it would help to share the xattrs, but getfattr is failing with "Argument list too long", so that is difficult; I am not sure whether btrfs provides some tool to fetch this info.
In Gluster the default value of storage.max-hardlinks is 100; unless you have changed that value you can't create more than 100 hardlinks, so I am not sure the issue is specific to hardlinks. As you said, quota was enabled; quota also populates some xattrs, but not many. I am also not sure whether an application has created many custom xattrs on the backend.
For the time being you can disable storage.gfid2path as shown below. After disabling it an application can still create hardlinks, but Gluster won't populate a new xattr (gfid2path) for every hardlink. Note that this only restricts the gfid2path xattr; we can't prevent an application from populating custom xattrs on a file.
gluster v set <volname> storage.gfid2path off
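After setting it, the new value can be confirmed with (again with <volname> as a placeholder for the volume name):

    gluster volume get <volname> storage.gfid2path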
Specifically for the brick crash, we have to fix the code path: we need to call MALLOC instead of alloca when the xattr size is greater than some limit (like 64k/128k).
Disaster struck, see https://github.com/gluster/glusterfs/issues/1729 and https://github.com/gluster/glusterfs/issues/1728. I have now recovered lost data and am able to resume analysis on my gluster data.
I discovered a large number of files with silly permissions, and reported that on https://github.com/gluster/glusterfs/issues/1731
I am nervous about the integrity of my files, any suggestions welcome.
I am continuing with:
I will attempt to get the xattrs (getfattr) and hardlink count for all files on all bricks, but I will need to be careful how I do that. There's no point running getfattr on each file; it only needs to be done once per inode. Given that a "find" over an 11TB subset of all the data takes over 5 hours, this could take days.
Perhaps someone could explain the value of the hard-link count for each file residing under .glusterfs/XX/YY/. Also, why is there a mixture of hard-links and symbolic-links?
1584 63673 1 drwx------ root root 20-08-21 15:06:34.6387425020 brick07/.glusterfs/ff/ff
59 4192010 1 lrwxrwxrwx root root 20-10-10 20:48:09.9794764880 brick07/.glusterfs/ff/fe/fffefb07-795e-4a27-90da-6db78c897c92
58 3507907 1 lrwxrwxrwx root root 20-10-08 19:39:32.8287566040 brick07/.glusterfs/ff/fe/fffedcc4-7293-4fe9-ab34-714a6ba015e5
0 3555211 17 ---------T bobw warren 20-10-08 20:48:31.1710559470 brick07/.glusterfs/ff/fe/fffea855-c37e-4691-825f-46caf90e9e28
47108 1488652 2 -rwxr--r-- bobw warren 07-11-11 15:07:43.8593750000 brick07/.glusterfs/ff/fe/fffea714-5236-45ee-820d-722fa3332694
987 1732599 2 -r--r--r-- bobw warren 20-03-15 19:44:06.5220859100 brick07/.glusterfs/ff/fe/fffea64c-b721-46a7-8d44-5980b3b14f8e
64 556533 1 lrwxrwxrwx root root 20-08-21 21:14:42.9929821840 brick07/.glusterfs/ff/fe/fffea161-6b61-41b3-b5ab-1cc17e00d321
0 1767388 2 -rwxrwxr-x bobw warren 19-03-10 16:28:47.0000000000 brick07/.glusterfs/ff/fe/fffe8f40-2b58-4ca4-8284-fc869de5cb0c
32768 1960787 2 -rw-r--r-- apache apache 20-05-09 21:28:54.0000000000 brick07/.glusterfs/ff/fe/fffe800f-c3ff-40d5-8e37-a7cea0726d87
71 3766279 1 lrwxrwxrwx root root 20-10-09 02:17:39.5293480090 brick07/.glusterfs/ff/fe/fffe7192-53d1-4718-bad3-9f9ba764354f
52 2726222 1 lrwxrwxrwx root root 20-09-12 21:56:00.1328402300 brick07/.glusterfs/ff/fe/fffe526c-3980-4d1b-85b2-08952c60c80b
0 3391565 9 ---------T bobw warren 20-10-08 15:25:04.3622390120 brick07/.glusterfs/ff/fe/fffe3a45-439c-40b4-a29b-d317c4c16fd3
396400 4500352 2 -rw-r--r-- apache apache 20-10-12 12:44:13.3103199880 brick07/.glusterfs/ff/fe/fffe2d3b-66fe-459b-99ff-fabf8ef7301f
57 392951 1 lrwxrwxrwx root root 20-08-21 18:52:10.6153474860 brick07/.glusterfs/ff/fe/fffe1f2a-6094-43cf-a6d2-5ea4e5a21095
71 4022210 1 lrwxrwxrwx root root 20-10-09 09:49:12.9389294170 brick07/.glusterfs/ff/fe/fffe1e86-3b01-45ee-8efa-cd198175b6c7
825 4783638 2 -rwxrwxrwx bobw warren 20-10-28 21:30:43.9246942800 brick07/.glusterfs/ff/fe/fffe1327-f179-4cfb-b85b-567a17d7143a
7553 682312 2 -r--r--r-- bobw warren 19-08-28 10:40:50.2532174380 brick07/.glusterfs/ff/fe/fffe0b07-bf0c-45dd-8e33-c268fa426d8c
6119845 1622319 2 -rwxr-xr-x bobw warren 20-04-19 18:46:56.1955260000 brick07/.glusterfs/ff/fe/fffe084e-4c39-453e-9a2c-50de00a677d2
This means that finding "dangling" gfids (i.e. a gfid file with no corresponding actual file) is more difficult than @xhernandez suggests:
To find them, this command should work:
find <brick root>/.glusterfs/ -type f -links 1
Any file returned inside <brick root>/.glusterfs/<xx>/<yy>/ with a single link could be removed (be careful not to do this when the volume has load; otherwise find could incorrectly detect files that are still being created but have not fully completed).
Perhaps someone could explain the value of the hard-link count for each file residing under .glusterfs/XX/YY/. Also, why is there a mixture of hard-links and symbolic-links?
This means that finding "dangling" gfids (i.e. a gfid file with no corresponding actual file) is more difficult than @xhernandez suggests:
To find them, this command should work:
find <brick root>/.glusterfs/ -type f -links 1
Any file returned inside <brick root>/.glusterfs/<xx>/<yy>/ with a single link could be removed (be careful not to do this when the volume has load; otherwise find could incorrectly detect files that are still being created but have not fully completed).
As I already said in my comment, this method won't work well if you also have symbolic links. The command only finds regular files with a single hardlink. Since Gluster keeps a hardlink between the real file and its gfid entry in .glusterfs/xx/yy, any regular file inside .glusterfs with a single hardlink means there is no real file associated with it.
The symbolic links inside .glusterfs may represent real symbolic-link files or directories. Differentiating them is more complex.
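Putting both observations together, a rough sketch of such a scan, assuming /srv/brick07 as a placeholder for a brick root and assuming the layout described above (directory gfids are single-link symlinks pointing at ../../xx/yy/<parent-gfid>/<name>); as noted, it should only be run when the volume is idle, and symlinks with more than one link (gfid entries of real symbolic-link files) are deliberately left alone here:

    # Candidate dangling regular-file gfids: a single hardlink means
    # no real file shares the inode.
    find /srv/brick07/.glusterfs -type f -links 1

    # Candidate dangling directory gfids: single-link symlinks whose
    # relative target no longer resolves ("[ -e ]" follows the link).
    find /srv/brick07/.glusterfs -type l -links 1 |
    while read -r l; do
        [ -e "$l" ] || printf 'dangling? %s\n' "$l"
    done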
Description of problem: One brick on one server is offline and all attempts to bring it back online have failed. The corresponding brick on the other (of a replica 2) server is ok. Other bricks are ok.
The following do not clear the problem:
The problem appears to be similar to https://github.com/gluster/glusterfs/issues/1531, but the cause is different, and the number of volumes and bricks is different. (I note the comment regarding "replica 2" and split-brain, but the cost (time/effort) to recover from split-brain is manageable, and split-brain is usually due to external causes, such as a power cut.)
My urgent need is to find a way out of the current situation and bring back online brick06 on the second server. Not so urgent is the need for gluster to handle this condition in a graceful way and report to the user/admin what is the real cause of the problem and how to fix it (if it cannot be fixed automatically).
The exact command to reproduce the issue: Not sure what actually caused this situation to arise, but activity at the time was:
- Multiple clients, all active, but with minimal activity.
- Intense activity from one client (actually one of the two gluster servers): a scripted "chown" on over a million files, which had been running for over 5 hours and was 83% complete.
- An edit or "sed -i" on a 500MB script file (but this should not have tipped over the 22GB Mem + 8GB Swap).
The full output of the command that failed:
Expected results: Some way to bring that brick back online.
- The output of the gluster volume info command:
- The operating system / glusterfs version:
Fedora F32
Linux veriicon 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux verijolt 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
glusterfs 7.8
Additional info:
snippet from /var/log/messages
snippet from /var/log/glusterfs/bricks/srv-brick06.log