awslabs / damo

DAMON user-space tool
https://damonitor.github.io/
GNU General Public License v2.0
148 stars 28 forks source link

tests/start_stop: Set smaller aggr interval for tune-record-ongoing-validate #93

Closed velurimithun closed 3 months ago

velurimithun commented 3 months ago

Description of changes: Default limit for nr_accesses is [0,24] so max aggr interval can be 100ms for 5ms sample interval.

We are reducing the aggregation interval from 200ms to 80ms when tuning the damo. This helps us to avoid start_stop test failure at L81 in validate.py.

Test results:

Env Amazon Linux 2023 - ami-05295b6e6c790593e (6.1.79-99.167.amzn2023.x86_64), ap-south-1.

Before the patch

$ sudo ./tests/run.sh 
PASS unit test_damo_scheme_dbgfs_conversion.py
PASS unit test_damo_schemes_input.py
PASS unit test_damo_show.py
<snipped>
<snipped>
PASS damon_lru_sort
PASS start_stop sysfs start
PASS start_stop sysfs record-ongoing-validate
PASS start_stop sysfs status 10
invalid: expect 0<=nr_accesses<=24 but 38
FAIL start_stop sysfs (invalid record file after tune)
$ 

After the patch

$ sudo ./tests/run.sh 
PASS unit test_damo_scheme_dbgfs_conversion.py
PASS unit test_damo_schemes_input.py
PASS unit test_damo_show.py
<snipped>
<snipped>
PASS schemes
PASS damon_reclaim
PASS damon_lru_sort
PASS start_stop sysfs start
PASS start_stop sysfs record-ongoing-validate
PASS start_stop sysfs status 10
PASS start_stop sysfs tune-record-ongoing-validate
PASS start_stop sysfs tune-status 10
PASS start_stop sysfs stop
PASS start_stop
PASS ALL
$

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sj-aws commented 3 months ago

Hi Mithun,

Thank you so much for this great pull request.

invalid: expect 0<=nr_accesses<=24 but 38

So, the test code is updating the aggregation interval to 200ms. Since sampling interval is still the default value (5ms), the possible maximum nr_accesses became 200ms / 5ms = 40. But the followup damo validate command of the test code is not applying the fact, so failing.

This PR fixes the issue by changing the new aggregation interval from 200ms to 80ms. Nice fix. But I'm bit concerned about the fact that this changes the testing coverage. I think we could have another approach.

damo validate assumes DAMON runs with the default sampling/aggregation intervals (5ms/100ms), and hence possible maximum nr_accesses is 100ms / 5ms = 20. Since users could run DAMON with non-default sampling/aggregation intervals, damo validate allows users set the expectd maximum nr_accesses using --nr_accesses option.

What about fixing this issue by keeping the new aggregation interval as 200ms but updating the expected nr_accesses by changing the followup damo validate call to use --nr_accesses option? E.g.,

--- a/tests/start_stop/test.sh
+++ b/tests/start_stop/test.sh
@@ -97,7 +97,8 @@ do
        fi
        sudo timeout 3 "$damo" record ongoing \
                --damon_interface_DEPRECATED "$damon_interface" &> /dev/null
-       if ! "$damo" validate --aggr 180000 220000 2> /dev/null
+       if ! "$damo" validate --aggr 180000 220000 --nr_accesses 0 44 \
+               2> /dev/null
        then
                echo "FAIL $testname2 (invalid record file after tune)"
                if ! sudo "$damo" stop
velurimithun commented 3 months ago

Thank you for review @sj-aws

I didn't notice that we have an option to provide limits for nr_accesses in validate subcmd. Sry about that. Yes, this is much simpler approach to tackle the problem without compromising on testing coverage.

Now, I've updated the commit.

sj-aws commented 3 months ago

Looks great to me, merged :)