cea-hpc / robinhood

Robinhood Policy Engine : a versatile tool to monitor filesystem contents and schedule actions on filesystem entries.
http://robinhood.sf.net
Other
181 stars 62 forks source link

robinhood v3: rbh-undelete: state inconsistent when interrupted with ^C #69

Open geraldhofer opened 7 years ago

geraldhofer commented 7 years ago

Restoring several test files and interrupting that with ^C:

[root@tclin01 siteAB]# rbh-undelete -R /gdatat1/siteAB/f\*
[...]
2017/02/25 08:29:47 [4312/1] lhsm | Rebinding [0x2000013a0:0xfc:0x0] to [0x200001b7c:0x1f817:0x0] in archive
2017/02/25 08:29:47 [4312/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x2000013a0:0xfc:0x0 0x200001b7c:0x1f817:0x0 /mnt/hsm_gdatat1_3
Restoring '/gdatat1/siteAB/fallocate'...         restore OK (file)
        Entry successfully updated in the dabatase
2017/02/25 08:29:47 [4312/1] lhsm | Rebinding [0x2000013a0:0xfd:0x0] to [0x200001b7c:0x1f818:0x0] in archive
2017/02/25 08:29:47 [4312/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x2000013a0:0xfd:0x0 0x200001b7c:0x1f818:0x0 /mnt/hsm_gdatat1_3
Restoring '/gdatat1/siteAB/false'...     restore OK (file)
        Entry successfully updated in the dabatase
2017/02/25 08:29:48 [4312/1] lhsm | Rebinding [0x2000013a0:0xfe:0x0] to [0x200001b7c:0x1f819:0x0] in archive
2017/02/25 08:29:48 [4312/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x2000013a0:0xfe:0x0 0x200001b7c:0x1f819:0x0 /mnt/hsm_gdatat1_3
^C

The expectation would be that I can rerun the command to restore the remaining files:

[root@tclin01 siteAB]# rbh-undelete -R /gdatat1/siteAB/f\*
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/25 08:29:55 [4389/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
2017/02/25 08:29:55 [4389/1] llapi | error on ioctl 0x4008669a for '/gdatat1/siteAB/fc' (4): stripe already set
2017/02/25 08:29:55 [4389/1] llapi | cannot create '/gdatat1/siteAB/fc' for import
2017/02/25 08:29:55 [4389/1] lhsm | Failed to import file '/gdatat1/siteAB/fc': File exists
Restoring '/gdatat1/siteAB/fc'...        ERROR
2017/02/25 08:29:55 [4389/1] lhsm | Rebinding [0x2000013a0:0xff:0x0] to [0x200001b7c:0x1f81b:0x0] in archive
2017/02/25 08:29:55 [4389/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x2000013a0:0xff:0x0 0x200001b7c:0x1f81b:0x0 /mnt/hsm_gdatat1_3
Restoring '/gdatat1/siteAB/fc-cache'...  restore OK (file)

But that file is still in the database:

[root@tclin01 siteAB]# rbh-undelete -L /gdatat1/siteAB/fc
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/25 08:38:57 [5933/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2017/02/25 08:28:36,    [0x2000013a0:0xfe:0x0],     file,       root,       root,         26,  2017/02/17 08:28:11,         synchro,                       /gdatat1/siteAB/fc

The file does exist, but the rebinding was interrupted.

[root@tclin01 siteAB]# lfs hsm_state fc
fc: (0x0000000d) released exists archived, archive_id:3
[root@tclin01 siteAB]# lfs path2fid fc
[0x200001b7c:0x1f819:0x0]
[root@tclin01 siteAB]# md5sum fc
md5sum: fc: No data available

And the recovery becomes a bit messy and still leaves a database entry behind:

[root@tclin01 siteAB]# rm fc
rm: remove regular file ‘fc’? y
[root@tclin01 siteAB]# rbh-undelete -R /gdatat1/siteAB/fc
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/25 08:47:35 [7289/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
2017/02/25 08:47:35 [7289/1] lhsm | Rebinding [0x2000013a0:0xfe:0x0] to [0x200001b7c:0x1f842:0x0] in archive
2017/02/25 08:47:35 [7289/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x2000013a0:0xfe:0x0 0x200001b7c:0x1f842:0x0 /mnt/hsm_gdatat1_3
Restoring '/gdatat1/siteAB/fc'...        restore OK (file)
        Entry successfully updated in the dabatase
2017/02/25 08:47:35 [7289/1] llapi | error on ioctl 0x4008669a for '/gdatat1/siteAB/fc' (4): stripe already set
2017/02/25 08:47:35 [7289/1] llapi | cannot create '/gdatat1/siteAB/fc' for import
2017/02/25 08:47:35 [7289/1] lhsm | Failed to import file '/gdatat1/siteAB/fc': File exists
Restoring '/gdatat1/siteAB/fc'...        ERROR

undelete summary:
                1 files
                0 old version
                0 empty files
                0 non-files
                0 no backup
                1 errors
                0 DB errors
[root@tclin01 siteAB]# lfs path2fid fc
[0x200001b7c:0x1f842:0x0]
[root@tclin01 siteAB]# md5sum fc
8e669909911c82eb607bbd8d45178092  fc
[root@tclin01 siteAB]# rbh-undelete -L /gdatat1/siteAB/fc
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/25 08:50:52 [7891/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2017/02/25 08:47:29, [0x200001b7c:0x1f819:0x0],     file,       root,       root,         26,  2017/02/17 08:28:11,        released,                       /gdatat1/siteAB/fc
ypsah commented 7 years ago

The first inconsistency happens because robinhood first calls llapi_hsm_import() and then rebinds the old fid to the new one. If the robinhood process gets killed before the rebinding happens (or if the rebind fails), the file is still created, in state released, and cannot be restored because its fid does not point to the data archived (hence the md5sum: fc: No data available).

The "recovery becomes a bit messy" because when you use:

[root@tclin01 siteAB]# rm fc
rm: remove regular file ‘fc’? y

It adds another entry in the SOFT_RM table (the one robinhood uses to keep track of the files that can be undeleted). So when you run rbh-undelete once again, the first entry is successfully undeleted, and the next one fails because it has the same name. Note that the fid's entry that rbh-undelete reports is not the same between the first call, and the second one.

Now things can get even worse (meaning you can't undelete anything) if the the robinhood process is killed before it updates its SOFT_RM table and after the rebind command completes. If this happens, the first entry in the SOFT_RM table cannot be restored because the rebind will fail, and llapi_hsm_import() will fail for the following entries with file exists.

geraldhofer commented 7 years ago

I do understand the implications of what happens - the bug from my perspective is that some operations in the rbh-undelete need to be atomic and a ^C need to be prevented until all operations that are finished or rolled back.

Another related scenario is when the rebind command returns an error - for example if the backend is down:

[root@tclin01 bin]# rm red 
rm: remove regular file ‘red’? y
[root@tclin01 bin]# rbh-undelete -L ./red
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/28 09:00:35 [23859/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2017/02/28 09:00:30, [0x200001b7e:0x182f5:0x0],     file,       root,       root,         89,  2014/06/10 13:54:12,         synchro,   /gdatat1/siteAB/root5/root/usr/bin/red
[root@tclin01 bin]# rbh-undelete -R ./red
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/28 09:00:45 [23872/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
2017/02/28 09:00:45 [23872/1] lhsm | Rebinding [0x200001b7e:0x182f5:0x0] to [0x200002340:0x98bd:0x0] in archive
2017/02/28 09:00:45 [23872/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x200001b7e:0x182f5:0x0 0x200002340:0x98bd:0x0 /mnt/hsm_gdatat1_3
2017/02/28 09:00:46 [23872/1] lhsm | Failed to rebind entry in backend: command failed
Restoring '/gdatat1/siteAB/root5/root/usr/bin/red'...    ERROR

undelete summary:
                0 files
                0 old version
                0 empty files
                0 non-files
                0 no backup
                1 errors
                0 DB errors

I then end up with the new stub that is not rebound on the backend:

[root@tclin01 bin]# ls -la red
-rwxr-xr-x. 1 root root 89 Jun 10  2014 red
[root@tclin01 bin]# lfs hsm_state red
red: (0x0000000d) released exists archived, archive_id:3
[root@tclin01 bin]# lfs path2fid red
[0x200002340:0x98bd:0x0]
[root@tclin01 bin]# rbh-find -ls ./red
Using config file '/etc/robinhood.d/gdatat1.conf'.
[0x200002340:0x98bd:0x0] file rwxr-xr-x   1  root       root                    89  2014/06/10 13:54:12 ./red
[root@tclin01 bin]# rbh-undelete -L ./red
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/28 09:08:57 [25329/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2017/02/28 09:00:30, [0x200001b7e:0x182f5:0x0],     file,       root,       root,         89,  2014/06/10 13:54:12,         synchro,   /gdatat1/siteAB/root5/root/usr/bin/red
[root@tclin01 bin]# md5sum red
md5sum: red: No data available

But removing that stub creates another entry in the SOFT_RM table and again we might not be able to restore the file:

[root@tclin01 bin]# rm red
rm: remove regular file ‘red’? y
[root@tclin01 bin]# rbh-undelete -L ./red
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/28 09:13:21 [26000/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2017/02/28 09:00:30, [0x200001b7e:0x182f5:0x0],     file,       root,       root,         89,  2014/06/10 13:54:12,         synchro,   /gdatat1/siteAB/root5/root/usr/bin/red
 2017/02/28 09:13:19,  [0x200002340:0x98bd:0x0],     file,       root,       root,         89,  2014/06/10 13:54:12,        released,   /gdatat1/siteAB/root5/root/usr/bin/red
[root@tclin01 bin]# rbh-undelete -R ./red
Using config file '/etc/robinhood.d/gdatat1.conf'.
2017/02/28 09:13:48 [26030/1] CheckFS | '/gdatat1' matches mount point '/gdatat1', type=lustre, fs=192.168.1.11@o2ib:/gdatat1
2017/02/28 09:13:48 [26030/1] lhsm | Rebinding [0x200001b7e:0x182f5:0x0] to [0x200002340:0x98be:0x0] in archive
2017/02/28 09:13:48 [26030/1] lhsm | Executing rebind command: ssh tdmfha /root/lhsmtool_dmf_rebase 0x200001b7e:0x182f5:0x0 0x200002340:0x98be:0x0 /mnt/hsm_gdatat1_3
2017/02/28 09:13:48 [26030/1] lhsm | Failed to rebind entry in backend: command failed
Restoring '/gdatat1/siteAB/root5/root/usr/bin/red'...    ERROR
2017/02/28 09:13:48 [26030/1] llapi | error on ioctl 0x4008669a for '/gdatat1/siteAB/root5/root/usr/bin/red' (4): stripe already set
2017/02/28 09:13:48 [26030/1] llapi | cannot create '/gdatat1/siteAB/root5/root/usr/bin/red' for import
2017/02/28 09:13:48 [26030/1] lhsm | Failed to import file '/gdatat1/siteAB/root5/root/usr/bin/red': File exists
Restoring '/gdatat1/siteAB/root5/root/usr/bin/red'...    ERROR

undelete summary:
                0 files
                0 old version
                0 empty files
                0 non-files
                0 no backup
                2 errors
                0 DB errors

So these are all scenarios where something breaks in the middle of the operations. If the rebind command fails, rolling back the generations of the stub is probably the easiest solution.

mkgilbert commented 7 years ago

@qb-cea Regarding your comment on the command

[root@tclin01 siteAB]# rm fc
rm: remove regular file ‘fc’? y

and what happens in robinhood in the scenario when it has already been archived, I'm a little confused. Are you saying that it is expected behavior for Robinhood to create a new file in the backend hsm without updating the link that points to it?

I'm noticing that if I run rbh-undelete on an archived file that has been deleted from lustre, a new hsm copy is made, but this breaks the symbolic link that was originally created in the hsm/shadow directory. So from that point on, the backend can't restore the file if it is deleted yet again. Is this expected? This feels like it's a critical issue to me...unless I'm simply misunderstanding the purpose of the command.

ypsah commented 7 years ago

[Is it] expected behavior for Robinhood to create a new file in the backend hsm without updating the link that points to it?

No, this is Lustre specific. Actually, it is lhsmtool_posix specific (lhsmtool_posix is the copytool packaged with Lustre's sources). Robinhood is only expected to create a new file in the filesystem it monitors. In Lustre, each file is assigned a unique fid (something that looks like this 0x200001b7e:0x182f5:0x0). When an archived file is deleted from Lustre and Robinhood undeletes it, a new file has to be created (in Lustre). Consequently, the new file's fid will be different from the original file's. Robinhood then has to use a rebind command to link the archived data (in the backend) to the newly created file. If I remember correctly, lhsmtool_posix uses Lustre's fid to compute the path and name of the archived data, so the rebind command essentially consist of a mv (or a cp, I don't know exactly) from the old path to the new one.

I'm noticing that if I run rbh-undelete on an archived file that has been deleted from lustre, a new hsm copy is made, but this breaks the symbolic link that was originally created in the hsm/shadow directory.

I am not familiar enough with how lhsmtool_posix handles file archiving nor with how exactly it uses hsm/shadow to draw any conclusion. Yet, I can tell you the way data is organized in the backend is highly correlated to which copytool you use. I hope my previous answer also addresses this issue, but if you think you found an issue with lhsmtool_posix, feel free to open a ticket about it.