Open dalwise opened 2 years ago
Hi @dalwise. I completely missed your previous update. I'm sorry.
Can you run this script on hostname20?
#!/bin/bash
set -eEu

BRICK="${1:?You must pass the path to the root of the brick}"
BRICK="$(realpath "${BRICK}")"

if [[ ! -d "${BRICK}/.glusterfs" ]]; then
    echo "'${BRICK}' doesn't seem to contain a brick" >&2
    exit 1
fi

declare -A GFIDS

# Resolve the symlink chain of a GFID inside .glusterfs into the directory
# path it represents, relative to the brick root.
function resolve() {
    local gfid="${1}"
    local link ref
    link="${GFIDS[${gfid}]-}"
    if [[ -z "${link}" ]]; then
        # Mark unknown GFIDs so paths that reference them show '<missing>'.
        echo "${gfid} doesn't exist" >&2
        GFIDS[${gfid}]="<missing>/"
    elif [[ "${link}" == "../../.." ]]; then
        # The root GFID points to the brick root itself.
        GFIDS[${gfid}]="/"
    elif [[ "${link:0:6}" == "../../" ]]; then
        # Regular case: the link looks like ../../xx/yy/<parent gfid>/<dir name>.
        ref="${link:12:36}"
        resolve "${ref}"
        GFIDS[${gfid}]="${GFIDS[${ref}]}${link:49}/"
    fi
}

# Collect every GFID symlink (directories are represented as symlinks in .glusterfs).
while read gfid link; do
    GFIDS[${gfid}]="${link}"
done < <(find "${BRICK}/.glusterfs" -type l -links 1 -printf "%f %l\n")

# Turn every symlink into a full path relative to the brick root.
for gfid in "${!GFIDS[@]}"; do
    resolve "${gfid}"
done

len="${#BRICK}"

# Walk the real directories (skipping .glusterfs), read their trusted.gfid and
# compare the on-disk path with the path reconstructed from .glusterfs.
while read gfid path; do
    gfid="${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20}"
    path="${path:${len}}"
    if [[ -z "${GFIDS[${gfid}]-}" ]]; then
        echo "Directory without GFID (${gfid}): '${path}'" >&2
    else
        if [[ "${GFIDS[${gfid}]}" != "${path}" ]]; then
            echo "Mismatching directory (${gfid}): '${path}' <-> '${GFIDS[${gfid}]}'" >&2
        fi
        unset GFIDS[${gfid}]
    fi
done < <(find "${BRICK}" -path "${BRICK}/.glusterfs" -prune -o -type d -exec getfattr -e hex -n trusted.gfid --absolute-names {} \; |
         sed -n '/^#\s*file\s*:/{N;s/^#\s*file\s*:\s*\(.*\)\ntrusted\.gfid\s*=\s*0x\(.*\)/\2 \1\//p}')

# Whatever is left has a .glusterfs entry but no matching directory on disk.
for gfid in "${!GFIDS[@]}"; do
    echo "Orphan GFID (${gfid}): '${GFIDS[${gfid}]}'" >&2
done
To run it, just pass the root directory of the brick. It will check if all directories are correctly defined. The assertion could be caused by a directory without its corresponding gfid.
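For reference, a minimal way to invoke it could look like this (the script file name is just an example, and the brick path is the one used on these nodes; the script writes all its findings to stderr, so that is what gets redirected to the log):

chmod +x check_gluster_dirs.sh
./check_gluster_dirs.sh /shared/.brick 2> check_gluster_dirs.log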
Thanks for the feedback @xhernandez!
The script did find many issues needing correction. I stored the output in check_gluster_dirs.log, which has 345235 lines:
[hostname20 ~]$ wc -l check_gluster_dirs.log
345235 check_gluster_dirs.log
They fall into the following categories:
[hostname20 ~]$ grep -c "doesn't exist" check_gluster_dirs.log
1550
[hostname20 ~]$ grep -c "Directory without GFID" check_gluster_dirs.log
339250
[hostname20 ~]$ grep -c "Mismatching directory" check_gluster_dirs.log
4435
[hostname20 ~]$
Is there any automated way to fix these?
Best regards, Daniel
I wasn't expecting so many errors. Can you run the script on another brick that should be ok to be sure that there isn't any bug in the script ?
You can also select some of the errors and manually verify that they are correct. If data is fine, then this is what you should do:
First of all, you should disable self-heal to prevent unexpected interference while you are touching backend contents, especially with so many errors:
# gluster volume set <volname> self-heal-daemon off
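To double-check that it took effect before touching anything, something like this should work (a hedged aside; the option appears under its full name when queried):

# gluster volume get <volname> cluster.self-heal-daemon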
Then check the errors: if the reported paths contain <missing> (i.e. they only involve GFIDs already reported as "doesn't exist"), then they are just a side effect; otherwise we need to check them.

"Directory without GFID" errors can be fixed in two ways; one of them is to recreate the missing symlink inside .glusterfs:

ln -s ../../${parent_gfid:0:2}/${parent_gfid:2:2}/${parent_gfid}/${dir_name} .glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}

"parent_gfid" is the GFID of the parent directory, "dir_name" is the base name of the directory, and "gfid" is the GFID of the directory itself.

Both options can be automated with a script if necessary.
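As an illustration of that automation, here is a rough sketch (not a tested tool) that recreates the symlink for a single directory, following the same layout as the ln -s command above; the file name fix_dir_gfid.sh and its two arguments are hypothetical, and it assumes getfattr and GNU coreutils are available:

#!/bin/bash
# Rough sketch: recreate the .glusterfs symlink for one directory of a brick.
# Usage (hypothetical): ./fix_dir_gfid.sh <brick-root> <dir-relative-to-brick>
set -eu

BRICK="$(realpath "${1:?brick root}")"
DIR="${2:?directory relative to the brick root}"

# Read trusted.gfid as hex and format it as a canonical GFID string.
get_gfid() {
    local hex
    hex="$(getfattr -e hex -n trusted.gfid --absolute-names "${1}" | sed -n 's/^trusted\.gfid=0x//p')"
    echo "${hex:0:8}-${hex:8:4}-${hex:12:4}-${hex:16:4}-${hex:20}"
}

gfid="$(get_gfid "${BRICK}/${DIR}")"
parent_gfid="$(get_gfid "${BRICK}/$(dirname "${DIR}")")"
dir_name="$(basename "${DIR}")"

cd "${BRICK}"
mkdir -p ".glusterfs/${gfid:0:2}/${gfid:2:2}"
ln -s "../../${parent_gfid:0:2}/${parent_gfid:2:2}/${parent_gfid}/${dir_name}" \
    ".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"

Looping something like this over the paths from the "Directory without GFID" lines would cover the bulk of the fixes, though it is worth dry-running it on a couple of directories first.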
If you fix all the issues, run the script again to verify that everything is correct before restarting self-heal.
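Once a second run of the script comes back clean, self-heal can be re-enabled with the counterpart of the earlier command:

# gluster volume set <volname> self-heal-daemon on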
There's another possibility: given that the number of missing entries is huge compared to the existing ones, and that half of the existing ones are already damaged, maybe it will be easier to just remove everything from the arbiter brick and do a full heal. If I'm not wrong, this is basically what you already did at the beginning, so the arbiter brick should be healthy. Since that's not the case, before doing anything verify that the other two bricks don't have any issue with directories (running the script) that could cause issues with self-heal.
If you decide to go this way, let me know to tell you exactly what to remove and how to start self-heal.
Hi @xhernandez ,
There's another possibility [...] before doing anything verify that the other two bricks don't have any issue with directories (running the script) that could cause issues with self-heal.
I have been running the script on hostname21 & hostname22 to evaluate the extent of the problem. Running on hostname21 took over a day and yielded:
[hostname21 ~]$ wc -l check_gluster_dirs.log
29479698 check_gluster_dirs.log
[hostname21 ~]$ grep -c "Directory without GFID" check_gluster_dirs.log
29479692
[hostname21 ~]$ grep -c "Orphan GFID" check_gluster_dirs.log
6
The run on hostname22 is still ongoing and has 26354180 lines so far. I'll report back when it completes, but it does look like there are considerable inconsistencies on all bricks.
Best regards
Can you provide some examples of these inconsistencies? Also provide stat and getfattr -m. -e hex -d -h for them, if possible.
Sure!
hostname22 completed running the script you provided and shows the same kind of issues as hostname21.
Here are some samples of directories without GFID in hostname21 & hostname22:
Directory without GFID (daa23852-c221-4c5f-803b-3e3d5678046e): '/logs/TWHT192909365/J2882T706383/'
Directory without GFID (1b8ce690-2eed-4a60-a49b-eb9a5f98cf35): '/logs/TWHT192907399/J195T22180/'
Directory without GFID (672e7b8d-d227-4198-9965-378faafd291c): '/logs/TWHT192907399/J195T30029/'
Directory without GFID (c585c9b7-b551-4135-b537-e32bc9ec9a94): '/logs/TWHT192907399/J1292T258174/'
Directory without GFID (2c12b41f-9b88-4d4c-b2c6-d19b3fd2b703): '/logs/TWHT192805307/J172T30005/'
Directory without GFID (7e59d86d-e004-44c7-b923-3fe888517bb8): '/logs/TWHT192801241/J178T22865/'
Directory without GFID (4acfb407-d5fc-4ae4-bb73-f37eee3d26d4): '/logs/TWHT192801241/J178T22953/'
Directory without GFID (98cfad02-8d42-4cfd-b504-a6c59e128eb4): '/logs/TWHT192803296/J262T33613/'
Directory without GFID (74c751f8-e8f4-41e7-aea8-039d276a7f32): '/logs/TWHT192803296/J262T35029/'
Directory without GFID (38e1167a-154b-4a72-aa7f-8675a8f95d52): '/logs/TWHT192812148/J3689T881252/'
For the first item in the list above these are the stat and getfattr results on each node:
[hostname20 .brick]# stat logs/TWHT192909365/J2882T706383/
stat: cannot stat ‘logs/TWHT192909365/J2882T706383/’: No such file or directory
[hostname21 .brick]$ stat logs/TWHT192909365/J2882T706383/
File: ‘logs/TWHT192909365/J2882T706383/’
Size: 130 Blocks: 0 IO Block: 4096 directory
Device: fd00h/64768d Inode: 677676032 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-05-10 18:58:19.266364954 +0000
Modify: 2019-11-17 02:23:14.332844902 +0000
Change: 2022-02-02 19:57:17.643823391 +0000
Birth: -
[hostname21 .brick]$ getfattr -m. -e hex -d -h logs/TWHT192909365/J2882T706383/
# file: logs/TWHT192909365/J2882T706383/
trusted.afr.vol_name-client-2=0x000000000000000000000000
trusted.gfid=0xdaa23852c2214c5f803b3e3d5678046e
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
trusted.glusterfs.mdata=0x010000000000000000000000005dd0af120000000013d6cf66000000005dd0af120000000013d6cf66000000005eebc45c00000000286deaaf
[hostname22 .brick]$ stat logs/TWHT192909365/J2882T706383/
File: ‘logs/TWHT192909365/J2882T706383/’
Size: 130 Blocks: 8 IO Block: 4096 directory
Device: fd00h/64768d Inode: 677681199 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-05-11 01:47:35.786147664 +0000
Modify: 2019-11-17 02:23:14.330110441 +0000
Change: 2022-02-07 20:16:36.321719075 +0000
Birth: -
[hostname22 .brick]$ getfattr -m. -e hex -d -h logs/TWHT192909365/J2882T706383/
# file: logs/TWHT192909365/J2882T706383/
trusted.afr.vol_name-client-0=0x000000000000000100000001
trusted.afr.vol_name-client-2=0x000000000000000000000000
trusted.gfid=0xdaa23852c2214c5f803b3e3d5678046e
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
trusted.glusterfs.mdata=0x010000000000000000000000005dd0af120000000013d6cf66000000005dd0af120000000013d6cf66000000005eebc45c00000000286deaaf
[hostname22 .brick]$
@dalwise can you check if .glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e exists on any brick and run stat on it if so?
The entry on hostname22 indicates pending changes on another brick. Is there nothing in gluster volume heal <volname> info?
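For context, the trusted.afr.<volname>-client-N value normally packs three big-endian 32-bit counters (pending data, metadata and entry operations), so the hostname22 value for client-0 can be read with a quick shell sketch like this (an illustration, not an official tool):

v=000000000000000100000001    # trusted.afr.vol_name-client-0 without the leading 0x
echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
# prints: data=0 metadata=1 entry=1

That is, one pending metadata and one pending entry change recorded against the brick tracked by client-0, which is what "pending changes on another brick" refers to.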
Can you also check the contents of .glusterfs/indices/* (there should be 3 subdirectories) on all bricks? If there's something, is daa23852-c221-4c5f-803b-3e3d5678046e there?
@dalwise can you check if .glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e exists on any brick and run stat on it if so?
It doesn't exist on any of the 3 bricks:
[hostname20 .brick]$ ls /shared/.brick/.glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e
ls: cannot access /shared/.brick/.glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e: No such file or directory
[hostname21 .brick]$ ls /shared/.brick/.glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e
ls: cannot access /shared/.brick/.glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e: No such file or directory
[hostname22 .brick]$ ls /shared/.brick/.glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e
ls: cannot access /shared/.brick/.glusterfs/da/a2/daa23852-c221-4c5f-803b-3e3d5678046e: No such file or directory
The entry on hostname22 indicates pending changes on another brick. Is there nothing in gluster volume heal <volname> info?
There isn't:
[hostname20 .brick]$ gluster volume heal vol_name info
Brick hostname21:/shared/.brick
Status: Connected
Number of entries: 0
Brick hostname22:/shared/.brick
Status: Connected
Number of entries: 0
Brick hostname20:/shared/.brick
Status: Connected
Number of entries: 0
Can you also check the contents of .glusterfs/indices/* (there should be 3 subdirectories) on all bricks? If there's something, is daa23852-c221-4c5f-803b-3e3d5678046e there?
It's not there:
[hostname20 .brick]$ ls .glusterfs/indices/*
.glusterfs/indices/dirty:
dirty-07fcd31e-4e46-48af-ac73-7609ea647fde
.glusterfs/indices/entry-changes:
.glusterfs/indices/xattrop:
xattrop-07fcd31e-4e46-48af-ac73-7609ea647fde
[hostname21 .brick]$ ls .glusterfs/indices/*
.glusterfs/indices/dirty:
dirty-f739c46e-c0b8-4dd4-9a28-315c63aa7b81
.glusterfs/indices/entry-changes:
.glusterfs/indices/xattrop:
xattrop-f739c46e-c0b8-4dd4-9a28-315c63aa7b81
[hostname21 .brick]$
[hostname22 .brick]$ ls .glusterfs/indices/*
.glusterfs/indices/dirty:
dirty-4f6e0bfc-c48a-47cf-a9eb-eae609802bd7
.glusterfs/indices/entry-changes:
.glusterfs/indices/xattrop:
xattrop-4f6e0bfc-c48a-47cf-a9eb-eae609802bd7
[hostname22 .brick]$
Thank you very much for all your help.
I'm preparing a tool to do a full check of the bricks; I'll need some more time...
Thank you again for your help on this!
Hi @dalwise. I'm very sorry for the delay. I was trying to create a generic tool that could read the data as fast as possible, but its complexity and the other work I have makes it hard. I'll provide a simpler tool soon.
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.
Hi @xhernandez, we still have not been able to put the systems that had this issue back into production. Do you have any updates on the tool you had mentioned to do a full check of the bricks?
Thank you very much!
Description of problem: Heal count keeps increasing. Current status after 13 days of increases is:
The exact command to reproduce the issue: The issue happened after trying to recover from an inaccessible directory (/shared/vol_name/logs) within the volume mounted at /shared/vol_name. Trying to access this directory would return "Transport endpoint not connected" on all clients; other directories in the mounted volume were not affected. gluster volume heal showed GFIDs in need of healing and the problematic directory in split brain. We were not able to get the files healed by using the gluster heal commands. We were then able to resolve problems with most GFIDs by removing the corresponding files in the bricks. However, one GFID remained in need of healing on host_name20 and we could not determine what file it corresponded to. Since host_name20 just had the arbiter brick, we tried removing it:

That allowed us to access the directory that we could not see earlier. We then attempted to rejoin the arbiter with a clean brick:
The directory is still accessible, but the number of files in need of healing has been increasing for the last 13 days.
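For reference on the "could not determine what file it corresponded to" step, a GFID reported by heal info is usually mapped back to a brick path through its .glusterfs entry; a minimal sketch, run from the brick root with a placeholder GFID:

gfid=<gfid-from-heal-info>                  # placeholder
entry=".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
if [[ -L "${entry}" ]]; then
    # Directories: the entry is a symlink into the parent directory's GFID path.
    readlink "${entry}"
else
    # Regular files: the entry is a hardlink, so look for the same inode outside .glusterfs.
    find . -path ./.glusterfs -prune -o -samefile "${entry}" -print
fi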
We can likely recover by simply backing up the files, destroying the current volume and then moving the files onto a newly created volume. However, we are at a loss as to:
The full output of the command that failed: (heal count command above)
Expected results: Heal count to return to 0.
Mandatory info:
- The output of the gluster volume info command:
- The output of the gluster volume status command:
- The output of the gluster volume heal command: (shared heal count above, as heal contains tens of millions of entries)
- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/
https://www.dropbox.com/s/p93wyyztj5bmzk8/logs.tgz
This compressed log file contains the logs for all three server nodes. The server nodes have the volume mounted and so are acting as clients also.
The volume name is the name of an internal project and has been changed to "vol_name" in command outputs and logs. The hostnames are also internal and have been changed to host_name20, host_name21 & host_name22.
- Is there any crash? Provide the backtrace and coredump: No crash
Additional info:
- The operating system / glusterfs version: