cea-hpc / robinhood

Robinhood Policy Engine : a versatile tool to monitor filesystem contents and schedule actions on filesystem entries.
http://robinhood.sf.net

Missing attribute 'fullpath' for evaluating boolean expression #125

Open thiell opened 2 years ago

thiell commented 2 years ago

We've been investigating what could cause the following errors for existing files when using post_sched_match = auto_update; (the default):

2022/08/05 10:31:05 [30914/3] Policy | Missing attribute 'fullpath' for evaluating boolean expression on [0x2000520e0:0x2a62:0x0]
2022/08/05 10:31:05 [30914/3] Policy | [0x2000520e0:0x2a62:0x0]: attribute is missing for checking ignore_fileclass rule
2022/08/05 10:31:05 [30914/3] checkdv | Warning: cannot determine if entry  is whitelisted: skipping it.

I believe others have also reported this on the mailing list.

When running the policy, we have a few occurrences per minute, so it's not insignificant.
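To put a number on "a few occurrences per minute" and to collect the affected FIDs for later DB lookups, the log can be summarized with a short script. This is a minimal sketch that assumes the log line format shown above; the `summarize` helper and the `sample` list are illustrative names, not part of robinhood.

```python
import re
from collections import Counter

# Matches lines such as:
# 2022/08/05 10:31:05 [30914/3] Policy | Missing attribute 'fullpath' for
#   evaluating boolean expression on [0x2000520e0:0x2a62:0x0]
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[\d+/\d+\] Policy \| "
    r"Missing attribute 'fullpath' .* on "
    r"\[(?P<fid>0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\]"
)


def summarize(lines):
    """Return (occurrences per minute, set of affected FIDs)."""
    per_minute = Counter()
    fids = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            per_minute[m.group("minute")] += 1
            fids.add(m.group("fid"))
    return per_minute, fids


# Sample input copied from the log excerpt in this issue.
sample = [
    "2022/08/05 10:31:05 [30914/3] Policy | Missing attribute 'fullpath' for "
    "evaluating boolean expression on [0x2000520e0:0x2a62:0x0]",
    "2022/08/05 10:31:05 [30914/3] Policy | [0x2000520e0:0x2a62:0x0]: "
    "attribute is missing for checking ignore_fileclass rule",
]

if __name__ == "__main__":
    counts, fids = summarize(sample)
    for minute, count in sorted(counts.items()):
        print(minute, count)
    print(sorted(fids))
```

On a real log, `summarize(open(path))` would work the same way, with `path` pointing to wherever the policy run log is written on the site.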

We checked the errors with multiple FIDs, and every time, the path is OK in the DB, like for this one:

MariaDB [robinhood_fir]> SELECT this_path(parent_id,name) FROM ENTRIES LEFT JOIN NAMES ON ENTRIES.id=NAMES.id WHERE ENTRIES.id='0x2000520e0:0x2a62:0x0';
+---------------------------------------------------------------------------------------------------+
| this_path(parent_id,name)                                                                         |
+---------------------------------------------------------------------------------------------------+
| 0x200000007:0x1:0x0/users/oparedes/Orca_Calc/Geom_opt/Arun_Kummar/A2/CL20_GeomOpt.proc6.orho0.tmp |
+---------------------------------------------------------------------------------------------------+
1 row in set (0.001 sec)

However, full logs show that the path is not resolved when the policy is run (example with another FID 0x2000523b3:0x1a:0x0):

2022/08/05 09:37:08 [28176/21] ListMgr | SQL query: SELECT id FROM ENTRIES WHERE id='0x2000523b3:0x1a:0x0'
2022/08/05 09:37:08 [28176/11] ListMgr | SQL query: COMMIT
2022/08/05 09:37:08 [28176/21] checkdv | requests: OK + in flight = 14
2022/08/05 09:37:08 [28176/26] checkdv | Checking if entry  matches policy rules (mode=auto_update_attrs)
2022/08/05 09:37:08 [28176/26] checkdv | Updating info about [0x2000523b3:0x1a:0x0]
2022/08/05 09:37:08 [28176/26] checkdv | Updating POSIX info of [0x2000523b3:0x1a:0x0]
2022/08/05 09:37:08 [28176/21] ListMgr | SQL query: SELECT size,last_mod,type,checkdv_lstchk,checkdv_out,parent_id,name,path_update,this_path(parent_id,name) FROM ENTRIES LEFT JOIN NAMES ON ENTRIES.id=NAMES.id WHERE ENTRIES.id='0x2000523b3:0x1b:0x0'
2022/08/05 09:37:08 [28176/26] Policy | Missing attribute 'fullpath' for evaluating boolean expression on [0x2000523b3:0x1a:0x0]
2022/08/05 09:37:08 [28176/26] Policy | [0x2000523b3:0x1a:0x0]: attribute is missing for checking ignore_fileclass rule
2022/08/05 09:37:08 [28176/26] checkdv | Warning: cannot determine if entry  is whitelisted: skipping it.

It looks like if we set post_sched_match = force_update;, the error goes away. It's our current workaround, but I'm worried it will slow down the policy (testing now...).

I suspect an issue in src/policies/policy_run.c:check_entry(), where the path is updated:

    /* get fullpath or name, if they are needed to apply the policy */
    if (need_update(check_method, updt_mask.std &
                        (ATTR_MASK_fullpath | ATTR_MASK_name))) {
        DisplayLog(LVL_FULL, tag(policy), "Updating path info of "DFID,
                   PFID(&p_item->entry_id));
        switch (path_check_update(&p_item->entry_id, stat_path, new_attr_set,
                                  updt_mask)) {
        case PCR_UPDATED:
            updated = true;
            break;

        case PCR_NO_CHANGE:
            break;

        case PCR_ORPHAN:
            /* no path to access it, handle it as if it had been moved */
            return AS_MOVED;
        }
    }

Any idea? :) Thx

tl-cea commented 2 years ago

To help reproduce the issue, could you show what your policy looks like? In particular, in which clause is the path matched? Also, what command do you execute for the policy run? Thx

thiell commented 2 years ago

Apologies for the delay. The policy basically looks either like this:

define_policy checkdv {
    status_manager = checker;
    scope { type == file }
    default_lru_sort_attr = none;
    # 'output' stands for previous value in DB
    # 7862400 = 90 days + 1 day grace
    default_action = cmd("/usr/sbin/rbh_checkdv /fir '{creation_time}' '{output}' 7862400 '{fid}'");
}

checkdv_rules {
    # ignore system files
    ignore_fileclass = system;

    rule default {
        condition { (checkdv.last_check == 0 or checkdv.output == "") and creation > 2d and
                    (ost_index == 0 or
                     ost_index == 1 or
                     ost_index == 2 or
                     ost_index == 3 or
                     ost_index == 4 or
                     ost_index == 5 or
                     ost_index == 6 or
                     ost_index == 7 or
                     ost_index == 8 or
                     ost_index == 9 or
                     ost_index == 10 or
                     ost_index == 11) }
    }
}

checkdv_parameters {
    db_result_size_max = 262144;
    queue_size = 65536;
    nb_threads = 8;
    reschedule_delay_ms = 0;
    recheck_ignored_entries = no;
    report_interval = 1min;
    pre_sched_match = none;
    post_sched_match = force_update;
}

...or like that:

define_policy checkdv {
    status_manager = checker;
    scope { type == file }
    default_lru_sort_attr = creation; #oldest first
    # 'output' stands for previous value in DB
    # 7862400 = 90 days + 1 day grace
    default_action = cmd("/usr/sbin/rbh_checkdv /fir '{creation_time}' '{output}' 7862400 '{fid}'");
}

checkdv_rules {
    # ignore system files
    ignore_fileclass = system;

    rule default {
        condition { checkdv.output != "" and creation > 90d and checkdv.last_check > 2d and
                    (ost_index == 0 or
                     ost_index == 1 or
                     ost_index == 2 or
                     ost_index == 3 or
                     ost_index == 4 or
                     ost_index == 5 or
                     ost_index == 6 or
                     ost_index == 7 or
                     ost_index == 8 or
                     ost_index == 9 or
                     ost_index == 10 or
                     ost_index == 11) }
    }
}

checkdv_parameters {
    db_result_size_max = 1000000000;  #no pagination? if not big enough we might never reach some entries
    queue_size = 131072;
    nb_threads = 8;
    reschedule_delay_ms = 0;
    recheck_ignored_entries = no;
    report_interval = 1min;
    pre_sched_match = none;
    post_sched_match = force_update;
}

So I believe fullpath is only used as part of this fileclass:

FileClass system {
    definition { tree == "/fir/.*" }
}
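To illustrate why evaluating ignore_fileclass needs fullpath, here is a rough model of the check. It assumes that `tree == "/fir/.*"` means "the entry sits at or under a path matching this shell-style pattern", which is an approximation of robinhood's matching and not its actual implementation. The helpers `in_tree` and `is_ignored` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

SYSTEM_TREE = "/fir/.*"  # pattern from the 'system' FileClass above


def in_tree(fullpath: str, pattern: str) -> bool:
    """Approximate 'tree ==': match the leading path components
    against the pattern components, one component at a time."""
    path_parts = [c for c in fullpath.split("/") if c]
    pat_parts = [c for c in pattern.split("/") if c]
    if len(path_parts) < len(pat_parts):
        return False
    # zip() stops at the end of the pattern, so deeper entries still match.
    return all(fnmatchcase(c, p) for c, p in zip(path_parts, pat_parts))


def is_ignored(fullpath: Optional[str]) -> Optional[bool]:
    """Return True/False when the path is known, None when it is missing.

    None models the 'cannot determine if entry is whitelisted' case:
    without fullpath the fileclass cannot be evaluated at all.
    """
    if fullpath is None:
        return None
    return in_tree(fullpath, SYSTEM_TREE)


if __name__ == "__main__":
    for path in ("/fir/.hidden/sub/file", "/fir/users/oparedes/example.tmp", None):
        print(path, is_ignored(path))
```

In this model a regular user file is simply not ignored once its path is known, so the skipped entries in the log come from the missing attribute rather than from the fileclass definition.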

To be honest, we have made quite a few changes since I opened this ticket, so these differ slightly from the original policy. I will try to test more when possible. In any case, it seems to work fine with post_sched_match = force_update;.