cea-hpc / robinhood

Robinhood Policy Engine : a versatile tool to monitor filesystem contents and schedule actions on filesystem entries.
http://robinhood.sf.net
Other
177 stars 60 forks source link

`report = no` in fileclass definition and SQL filtering #124

Open kcgthb opened 2 years ago

kcgthb commented 2 years ago

HI!

We noticed that having report = no in a fileclass definition changes the SQL query generated for policy rules that use this fileclass as target_fileclass.

Here'a an example, with the following policy definition:

define_policy test {
    default_action = none;
    scope { type == file }
    status_manager = none;
    default_lru_sort_attr = none;
}

fileclass test_fc {
    definition { type == "symlink" }
}

test_rules {
    rule test {
        target_fileclass = test_fc;
        condition { owner == 'kilian' }
    }
    rule default {
        condition { owner == 'kilian' }
    }
}

test_trigger {
    trigger_on = periodic;
    check_interval = 1min;
}

The generated SQL query is:

SELECT ENTRIES.id AS id FROM ENTRIES 
WHERE 
  ENTRIES.type='file' AND 
  (ENTRIES.invalid=0 OR ENTRIES.invalid IS NULL) AND 
  ( ( (ENTRIES.uid='kilian' OR ENTRIES.uid IS NULL) AND ENTRIES.fileclass LIKE BINARY '%+test_fc+%') OR 
    ( (ENTRIES.uid='kilian' OR ENTRIES.uid IS NULL) ) ) 
  LIMIT 100000

We can see that the SQL query will only return entries matching the fileclass definition, with:

WHERE ... ENTRIES.fileclass LIKE BINARY '%+test_fc+%'

But if we add report = no to the fileclass definition (which according to the documentation, should only hide that fileclass from rbh-report --classinfo report), like this:

fileclass test_fc {
    definition { type == "symlink" }
    report = no;
}

we can see that the fileclass filter doesn't exist anymore in the generated SQL query:

SELECT ENTRIES.id AS id FROM ENTRIES 
WHERE
  ENTRIES.type='file' AND 
  (ENTRIES.invalid=0 OR ENTRIES.invalid IS NULL) AND 
  ( ( (ENTRIES.uid='kilian' OR ENTRIES.uid IS NULL) ) OR 
    ( (ENTRIES.uid='kilian' OR ENTRIES.uid IS NULL) ) ) 
LIMIT 100000

As a consequence, the resulting file list contains many unwanted entries that are processed unnecessarily.

Is that expected?

tl-cea commented 2 years ago

Thank you kilian for the precise report. "report = yes/no" is actually not well named. If enabled, this parameter results in storing the fileclass name in the DB so it is available for further matching and reporting. When "report = no", the fileclass definition is actually "inlined" (replaced by its definition) where the fileclass name is used. It should however be replaced in the SQL request with ENTRIES.type == "symlink".

Notice that in your case this would be a non-sense as the scope of the policy is "type == file".

Best regards, Thomas

kcgthb commented 2 years ago

Aah I see, thanks for the reply @tl-cea !

But as you mentioned, the generated SQL query (with report = no) should indeed contain WHERE ENTRIES.type == "symlink". And it doesn't seem to be the case.

To make things easier and remove the file/symlink thing from the equation, I did another test with this:

fileclass test_fc {
    definition { group == "foo" }
    report 
}

test_rules {
    rule test {
        target_fileclass = test_fc;
        condition { owner == 'user1' }
    }
    rule default {
        condition { owner == 'user2' }
    }
}

And got the following query:

SELECT ENTRIES.id AS id FROM ENTRIES 
WHERE
  ENTRIES.type='file' AND 
  (ENTRIES.invalid=0 OR ENTRIES.invalid IS NULL) AND 
  ( ( (ENTRIES.uid='user1' OR ENTRIES.uid IS NULL) ) OR 
    ( (ENTRIES.uid='user2' OR ENTRIES.uid IS NULL) ) ) 
LIMIT 100000

Shouldn't the fileclass condition (group == "foo") have been inlined and be present in the SQL WHERE condition as well, in the ENTRIES.uid='user1' part?

Right now, it looks like the fileclass definition is entirely skipped when report = no :thinking: