cea-hpc / robinhood

Robinhood Policy Engine : a versatile tool to monitor filesystem contents and schedule actions on filesystem entries.
http://robinhood.sf.net

Moving files on lustre can result in multiple listings in robinhood #129

Open CanWood opened 1 year ago

Hey folks,

We're just testing out Robinhood and hitting a minor issue. If I move a small file on a Lustre file system, all is well. But each of the few big files I've tried moving has left a duplicate listing in the rbh database.

Here's a lightly modified illustration (paths, users, and groups renamed):

[root@hpcmon01 merged]# rbh-find /mnt/lustre/working/users/CanWood/ -name reads_part1.tar -ls
Using config file '/etc/robinhood.d/lustrefs.conf'.
[0x20002b0a7:0x12715:0x0] file rw-r--r--   1  CanWood     OurTeam   1362109409280  2022/01/22 12:30:23 /mnt/lustre/working/users/CanWood/reads_part1.tar
[0x20002b0a7:0x12715:0x0] file rw-r--r--   1  CanWood     OurTeam   1362109409280  2022/01/22 12:30:23 /mnt/lustre/working/users/CanWood/subfolder/reads_part1.tar
[root@hpcmon01 merged]# rbh-find /mnt/lustre/working/users/CanWood/ -name reads_part1.tar -exec "ls {} -al"
Using config file '/etc/robinhood.d/lustrefs.conf'.
-rw-r--r-- 1 CanWood OurTeam 1362109409280 Jan 22  2022 /mnt/lustre/working/users/CanWood/reads_part1.tar
ls: cannot access /mnt/lustre/working/users/CanWood/subfolder/reads_part1.tar: No such file or directory
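
The symptom above (one FID listed under two paths although the link count is 1) could be checked for in bulk. Here's a minimal sketch, assuming the `rbh-find -ls` column layout matches the output shown above. The function name `find_duplicate_fids` and the regular expression are illustrative only and are not part of Robinhood.

```python
import re
from collections import defaultdict

# Assumed layout of one `rbh-find -ls` line, taken from the output above:
# [FID] type mode nlink user group size YYYY/MM/DD HH:MM:SS path
LS_LINE = re.compile(
    r"^\[(?P<fid>0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)\]\s+"
    r"(?P<type>\S+)\s+(?P<mode>\S+)\s+(?P<nlink>\d+)\s+"
    r"(?P<user>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+"
    r"\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<path>.+)$"
)


def find_duplicate_fids(ls_output):
    """Return FIDs that are listed under more paths than their link count.

    Hard links legitimately share a FID, so a FID is only reported when
    the number of listed paths exceeds the nlink value on its lines.
    """
    paths = defaultdict(list)
    nlinks = {}
    for line in ls_output.splitlines():
        match = LS_LINE.match(line.strip())
        if match is None:
            continue  # e.g. the "Using config file ..." banner
        fid = match.group("fid")
        paths[fid].append(match.group("path"))
        nlinks[fid] = int(match.group("nlink"))
    return {
        fid: {"nlink": nlinks[fid], "paths": listed}
        for fid, listed in paths.items()
        if len(listed) > nlinks[fid]
    }
```

The text passed in would be the captured output of an `rbh-find ... -ls` run. Each reported FID is a candidate for a stale entry; which of its paths still exists has to be confirmed on the file system itself.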

Our environment:
Lustre servers are CentOS 7.9 running Lustre 2.15.1
Robinhood server is CentOS 7.9 running lustre-client 2.15.2

[root@hpcmon01 merged]# rbh-find --version

Product:         robinhood 'find' command
Version:         3.1.7-1
Build:           2023-02-15 03:54:40

Compilation switches:
    Address entries by FID
    MDT Changelogs supported

Lustre Version: 2.15
Database binding: MySQL

Report bugs to: <robinhood-support@lists.sourceforge.net>

In case it's an issue with our changelog settings on the MDS, here they are:

[root@hpcmds02 ~]# lctl get_param mdd.*.changelog*
mdd.lustrefs-MDT0000.changelog_deniednext=60
mdd.lustrefs-MDT0000.changelog_gc=1
mdd.lustrefs-MDT0000.changelog_max_idle_indexes=2097446912
mdd.lustrefs-MDT0000.changelog_max_idle_time=2592000
mdd.lustrefs-MDT0000.changelog_min_free_cat_entries=2
mdd.lustrefs-MDT0000.changelog_min_gc_interval=3600
mdd.lustrefs-MDT0000.changelog_size=4528576
mdd.lustrefs-MDT0000.changelog_current_mask=
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC 
mdd.lustrefs-MDT0000.changelog_mask=
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC 
mdd.lustrefs-MDT0000.changelog_users=
current_index: 96382622
ID                            index (idle) mask
cl1                        96382618 (0)
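
To judge whether the changelog reader is falling behind, the gap between `current_index` and each registered user's index can be computed. This is a minimal sketch, assuming the `changelog_users` layout shown above; `changelog_lag` is a hypothetical helper name, not a Lustre or Robinhood tool.

```python
import re


def changelog_lag(users_output):
    """Return {user_id: current_index - user_index} from changelog_users text."""
    current = None
    user_index = {}
    for raw in users_output.splitlines():
        line = raw.strip()
        match = re.match(r"current_index:\s*(\d+)$", line)
        if match:
            current = int(match.group(1))
            continue
        # User lines start with an ID such as "cl1", followed by its index.
        match = re.match(r"(cl\d+\S*)\s+(\d+)\b", line)
        if match:
            user_index[match.group(1)] = int(match.group(2))
    if current is None:
        raise ValueError("no current_index line found")
    return {user: current - index for user, index in user_index.items()}
```

For the values above, `cl1` is 4 records behind `current_index` (96382622 - 96382618), which suggests the reader is keeping up with the changelog rather than sitting on a large backlog.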

Rather than flood the ticket, I'll leave it at that and ask whether folks can suggest any other troubleshooting steps.

Thanks!