Closed honggyukim closed 3 months ago
I see that the reason was that I increased the number of regions from 10 to 100 or 1000.
However, rolling back to 10 regions also takes more than 5 seconds. I still feel that it'd be helpful if it can be faster.
In addition, I see many "tried regions" even though they are going to be filtered out by the cgroup filter.
Can we also hide such filtered out regions from "tried regions" in damo show output?
Hi Honggyu, thank you for this report.
damo show uses DAMON sysfs interface's DAMOS tried regions feature[1]. In detail, damo show ensures there is at least one monitoring DAMON scheme (a DAMON scheme having stat as the action, and [min, max] as all access pattern ranges) per context (if a context doesn't have one, it installs a monitoring scheme), and asks DAMON to expose the detailed information of the monitored regions via the DAMOS tried regions feature. The DAMOS tried regions feature exposes the information by creating one directory and four files holding the information, per tried region. Hence, the kernel part of the operation is assumed to impose high overhead and take a long time if the number of tried regions is large, since it has to create a large number of directories and files.
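The linear relationship between the number of tried regions and the kernel-side work can be sketched with a tiny cost model. This is an illustration only, not measured DAMON behavior: the per-node time constant is a made-up assumption, and the "one directory plus four files per region" shape is taken from the description above.

```python
# Back-of-envelope model of the sysfs node creation cost described above.
# One directory plus four files are created per tried region, so the
# kernel-side work grows linearly with the number of regions.

def sysfs_nodes_for_tried_regions(nr_regions: int) -> int:
    """Total sysfs nodes created: 1 directory + 4 files per region."""
    return nr_regions * (1 + 4)

def estimated_update_seconds(nr_regions: int,
                             secs_per_node: float = 1e-4) -> float:
    """Rough linear cost; secs_per_node is a hypothetical constant."""
    return sysfs_nodes_for_tried_regions(nr_regions) * secs_per_node

if __name__ == '__main__':
    for n in (10, 100, 1000):
        print(n, 'regions ->', sysfs_nodes_for_tried_regions(n), 'nodes')
```

Under this model, going from 10 to 1000 regions multiplies the number of created nodes by 100, which matches the reported slowdown pattern.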
Users can control the overhead using the access pattern based damo show result filtering options, including --access_rate, --age, and --sz_region. damo passes the information to DAMON so that DAMON doesn't spend unnecessary time creating files the user has no interest in. Therefore, good use of the options will allow you to minimize the overhead and get the damo show output faster. For example, you could ask damo show to show only hot regions, or regions of a specific range of hotness.
Also, note that the --total_sz_only option of damo show avoids DAMON creating all the directories and files, from kernel v6.6-rc1. If you are interested in only the total size of regions of a specific access pattern (e.g., the total size of regions that have not been accessed for more than 5 minutes), you could use that to get the information faster.
We have a few kernel level optimization ideas for better use of the feature, though. Two of those are for allowing users to navigate DAMON monitoring results like they do with an online map service such as Google Maps.
The first idea is to let users know how many tried regions exist at the moment, so that users can avoid using the feature when the number is too large, or modify the show target access pattern so that only a small number of regions will be captured.
The second idea is to let users set the resolution of the information. That is, users will be able to set the total number of regions whose information will be exposed via the feature. Then, if the user-defined number is smaller than the number of real tried regions, DAMON will collapse some of the regions for the report. As a result, the quality of the information will be degraded, but the tried regions directories/files creation will also be decreased.
Using the two features, damo users will be able to control the resolution and the specific area of the monitoring results to show, like we view a low-resolution overall picture on a map and then zoom in/out to the region of interest with Google Maps-like products. This may take some time, though.
The tried regions feature is not the only way to get the monitoring results. DAMON also provides tracepoints, which don't require creation of the files. Maybe we can think about adding a new option that allows users to ask damo show to use the DAMON tracepoints instead of the tried regions feature. It could be slower than damo show under a small number of regions, since it would need to enable/capture/disable tracepoints, especially since the current DAMO implementation uses perf inside. I expect it would take about 3-5 seconds in general, but wouldn't increase too much, unlike damo show under a large number of regions.
Can we also hide such filtered out regions from "tried regions" in damo show output?
cgroup and backing-content type based DAMOS filters work in page granularity, while DAMON regions are defined as address ranges. Hence, such hiding would impose significant overhead. The address range and monitoring target based DAMOS filters patchset[1] may give you the details.
So, making such feature would be possible, but I cannot get an idea for efficient implementation of it at the moment. So I'd like to recommend you looking for other options.
[1] https://lore.kernel.org/damon/20230802214312.110532-1-sj@kernel.org/
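To illustrate why such hiding would be costly, here is a toy Python model, not DAMON code: a DAMON region is just an address range, while a cgroup filter matches individual pages, so deciding whether a region is entirely filtered requires a per-page walk. The `page_in_cgroup` predicate is hypothetical.

```python
# Toy model: deciding whether a whole address-range region is filtered
# out by a page-granularity filter needs a check for every page in it.

PAGE_SIZE = 4096

def region_fully_filtered(start, end, page_in_cgroup):
    """O(nr_pages) check: every page in [start, end) must match."""
    return all(page_in_cgroup(addr)
               for addr in range(start, end, PAGE_SIZE))

# A 1 GiB region already means ~262,144 page checks:
nr_pages = (1 << 30) // PAGE_SIZE
```

This per-page cost per region is the "significant overhead" mentioned above.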
Hi SeongJae,
I'm sorry for the late response again. I had read your detailed explanation, but took a bit of time to digest all the comments.
- Expected high overhead mechanism
I now understand that high overhead is expected when bringing up the information through tried regions via sysfs.
- User level overhead control
Yeah, that would be another good option.
- DAMON level optimization ideas
The first idea is to let users know how many tried regions exist at the moment, so that users can avoid using the feature when the number is too large, or modify the show target access pattern so that only a small number of regions will be captured.
That would be a good idea. I sometimes wanted to know only the number of tried_regions for the given DAMOS action.
The second idea is to let users set the resolution of the information. That is, users will be able to set the total number of regions whose information will be exposed via the feature. Then, if the user-defined number is smaller than the number of real tried regions, DAMON will collapse some of the regions for the report. As a result, the quality of the information will be degraded, but the tried regions directories/files creation will also be decreased.
That would also be good, but I don't have a clear idea how to properly collapse the information.
This may take some time, though.
Sure. I didn't expect it to be supported in the near future.
- DAMO level faster solution
Having tracepoints will also be a good option.
cgroup and backing-content type based DAMOS filters work in page granularity, while DAMON regions are defined as address ranges. Hence, such hiding would impose significant overhead.
Thanks. I get that there is no way to filter out before scanning each page inside the regions.
The address range and monitoring target based DAMOS filters patchset[1] may give you the detail.
I remember this was implemented based on my request at https://github.com/awslabs/damo/issues/65#issuecomment-1656379106.
Thanks very much for your help and explanation as always.
Hi Honggyu,
Thank you very much for your valuable feedback as always.
That would be a good idea. I sometimes wanted to know only the number of tried_regions for the given DAMOS action. [...] That would also be good, but I don't have a clear idea how to properly collapse the information.
I'll prioritize the number of tried regions and resolution-based collapsing implementations.
The resolution-based collapsing would be somewhat similar to damo report heats. We split the region by the user-specified resolution, and merge the regions in each cell.
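The collapsing above could be sketched roughly like this. The `Region` shape and the size-weighted `nr_accesses` merge are assumptions for illustration, not damo's actual implementation:

```python
# Sketch of resolution-based collapsing: split the monitored address span
# into at most 'resolution' cells and merge the regions overlapping each
# cell, so the output size is bounded regardless of the real region count.
from dataclasses import dataclass

@dataclass
class Region:
    start: int
    end: int
    nr_accesses: int

def collapse(regions, resolution):
    """Merge sorted, non-overlapping 'regions' into <= resolution cells."""
    lo = min(r.start for r in regions)
    hi = max(r.end for r in regions)
    cell_sz = max(1, -(-(hi - lo) // resolution))  # ceiling division
    cells = []
    for i in range(resolution):
        c_start = lo + i * cell_sz
        c_end = min(c_start + cell_sz, hi)
        if c_start >= hi:
            break
        overlap_sz, weighted = 0, 0
        for r in regions:
            o = min(r.end, c_end) - max(r.start, c_start)
            if o > 0:
                overlap_sz += o
                weighted += o * r.nr_accesses  # size-weighted average
        if overlap_sz:
            cells.append(Region(c_start, c_end, weighted // overlap_sz))
    return cells
```

For example, collapsing `[Region(0, 100, 10), Region(100, 200, 0)]` with resolution 1 yields a single cell covering `[0, 200)` with the size-weighted average access count 5.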
Having tracepoints will also be a good option.
The DAMOS tried regions tracepoint is also now implemented, and the patches are merged in the mm tree. Support from damo record is also implemented (https://github.com/awslabs/damo/commit/d5814668). There is no automated test for that yet, though.
I remember this was implemented based on my request at https://github.com/awslabs/damo/issues/65#issuecomment-1656379106.
You're correct. It's in the mm tree. Hopefully, that will be merged in Linux v6.7.
Hi SeongJae,
I replied after 3 weeks of your answer, but you replied right after my comment. :)
I'll prioritize the number of tried regions and resolution-based collapsing implementations.
I would like to say that you don't have to take this request too seriously. It's just my wish list but not a seriously important request to be honest.
I rather need to have more serious and important feature in DAMON, but I need to talk to my colleagues first.
Besides that, this damo project is getting more and more important to our project so I feel grateful for your persistent work and support for this useful project.
No problem at all. Please feel free to ask for new features and prioritize your requests as needed. We want this tool to be somewhat useful for real users like you :)
Couldn't be happier than hearing that you think it is somewhat useful.
We want this tool to be somewhat useful for real users like you :)
Thanks. Happy to hear that! :)
I noticed that I mixed up the usage of damo show and damo status. I feel it'd be useful to see the current DAMON settings without updating tried_regions.
We may be able to provide a simple and quick usage of damo status without writing commit or update_schemes_tried_regions to status; then it'd be really quick. It could be provided as an additional option.
Nice idea, agreed to all the points. I will work on it.
Hi Honggyu,
I think your interest is only in DAMOS statistics, correct? I implemented the --damos_stats option of damo status[1]. It updates only the scheme statistics and shows them. Would that cover this need?
[1] https://github.com/awslabs/damo/commit/4916f6433313bb7fb45f658611dcfbb36fb8ee29
Hi SeongJae,
Thanks for the update.
I think your interest is only in DAMOS statistics, correct? I implemented the --damos_stats option of damo status
I actually wanted to have the status output something like this, without statistics and tried regions.
$ ./damo status
kdamond 0
state: on, pid: 969
context 0
ops: paddr
target 0
pid: 0
region [4,294,967,296, 18,253,611,007) (13.000 GiB)
intervals: sample 100 ms, aggr 2 s, update 20 s
nr_regions: [100, 10,000]
scheme 0
action: pageout per aggr interval
target access pattern
sz: [4.000 KiB, max]
nr_accesses: [0 samples, 0 samples]
age: [5 aggr_intervals, 9,223,372,036,854 aggr_intervals]
quotas
0 ns / 0 ns per max
priority: sz 0 %, nr_accesses 0 %, age 0 %
watermarks
metric none, interval 0 ns
0 %, 0 %, 0 %
But if it doesn't take much time to get statistics, then I'm fine with showing them together. I also think that the main time-consuming part is getting the tried regions info.
Besides that, I also like the output of --damos_stats, and that would be useful when monitoring the status. So please keep the option. Thanks.
$ ./damo status --damos_stats
nr_tried: 5,258
sz_tried: 650.000 GiB
nr_applied: 2
sz_applied: 68.000 KiB
qt_exceeds: 0
By the way, I actually got an error when running damo status as follows.
$ ./damo status
Traceback (most recent call last):
File "/home/root/damo/./damo", line 116, in <module>
main()
File "/home/root/damo/./damo", line 113, in main
subcmd.execute(args)
File "/home/root/damo/_damo_subcmds.py", line 31, in execute
self.module.main(args)
File "/home/root/damo/damo_status.py", line 137, in main
update_tried_regions=(args.damos_stat == None))
AttributeError: 'Namespace' object has no attribute 'damos_stat'. Did you mean: 'damos_stats'?
I got the previous sane result after reverting the following bad commit.
a905107f6eac6af33badf5f53161b68570621d6e is the first bad commit
commit a905107f6eac6af33badf5f53161b68570621d6e
Author: SeongJae Park <sj38.park@gmail.com>
Date: Sat Nov 4 21:49:25 2023 +0000
damo_status: Remove --damos_stat and --damos_stat_field options
Remove the options in favor of --damos_stats and --damos_stat_fields.
Hopefully there is no user of the options, so no grace period is needed.
Will restore if someone complains.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
damo_status.py | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
I actually wanted to have the status output something like this without statistics and tried regions.
Oh, ok. Making yet another option for the purpose is also no problem. I'll implement it soon.
I actually got an error when running damo status as follows.
Ah, nice catch. Coincidentally, I also just found and fixed it[1].
[1] https://github.com/awslabs/damo/commit/f4b382e8d6815760c90caa541addb38feba3eab5
Hello Honggyu,
In short, your slow damo show might not be due to the DAMOS applied regions creation overhead, but due to a long aggregation interval in your setup.
The details are like this. damo show asks the DAMON sysfs interface to update the DAMOS applied regions directory. Because the DAMON sysfs interface is not always saving the DAMOS applied regions information, it has to wait until a DAMON snapshot is ready and DAMOS has therefore applied the actions to the regions. By default, the DAMON snapshot is ready every aggregation interval, due to the sampling-based monitoring mechanism. Hence, if you set a long aggregation interval, the DAMON sysfs interface has to wait long before starting the applied regions directory creation. Your time output, which shows nearly zero system time in the first comment of this thread, also fits with this theory.
To overcome similar issues, we implemented the DAMOS apply interval[1] and its prerequisite patchsets. The DAMON sysfs interface still waits for one aggregation interval in that case, though. Updating it to finish as soon as the applied regions directory creation is done is on our TODO list. I think we don't need to prioritize it at the moment, though, since your issue would be fixed with the above changes. If not, please let us know.
[1] https://lore.kernel.org/damon/20230916020945.47296-1-sj@kernel.org/
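The aggregation-interval wait described above can be modeled very simply; the numbers below are illustrative, not measured:

```python
# Minimal model of the wait described above: DAMON refreshes its snapshot
# once per aggregation interval, so an update request issued at a random
# moment waits until the next aggregation boundary before the applied
# regions directory creation can even begin.

def wait_for_snapshot(request_time: float, aggr_interval: float) -> float:
    """Seconds until the next aggregation boundary after request_time."""
    elapsed = request_time % aggr_interval
    return aggr_interval - elapsed if elapsed else 0.0

# With the default 0.1 s aggregation interval the wait is at most 0.1 s,
# while with a 2 s interval it can approach 2 s per update request.
```

This also explains the near-zero system time: the process is sleeping, not working, for most of the elapsed time.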
Hi SeongJae,
Thanks for your help!
your slow damo show might not be due to the DAMOS applied regions creation overhead, but due to a long aggregation interval in your setup.
You're right. I'm able to see the difference for different interval setups. But it looks like it's not related to DAMOS, and I can see the difference for the default stat action.
Here are the examples. I just ran damo start with the default setup, which is a 5ms sampling interval and a 100ms aggregation interval, as follows.
$ ./damo start
$ time ./damo status
kdamond 0
state: on, pid: 1048
context 0
ops: paddr
target 0
pid: 0
region [4,294,967,296, 9,663,676,415) (5.000 GiB)
intervals: sample 5 ms, aggr 100 ms, update 1 s
nr_regions: [10, 1,000]
real 0m0.423s
user 0m0.095s
sys 0m0.018s
In this case, damo status ran really fast.
However, if I increase the intervals, then damo status takes much more time.
# Stop the previous damo start
$ ./damo stop
# Start damo with 20 times longer intervals
$ ./damo start --monitoring_intervals 100ms 2s 20s
$ time ./damo status
kdamond 0
state: on, pid: 1086
context 0
ops: paddr
target 0
pid: 0
region [4,294,967,296, 9,663,676,415) (5.000 GiB)
intervals: sample 100 ms, aggr 2 s, update 20 s
nr_regions: [10, 1,000]
real 0m4.373s
user 0m0.088s
sys 0m0.022s
This time, damo status takes more than 4 seconds.
If this is also related to DAMOS, because stat is also one of the DAMOS actions, then I think the waiting time can be shorter when running damo status without updating both the tried regions and the DAMOS stat info, as we already discussed as follows.
I actually wanted to have the status output something like this without statistics and tried regions.
Oh, ok. Making yet another option for the purpose is also no problem. I'll implement it soon.
Simply giving --damos_stats also makes the execution faster. The following shows the difference clearly.
$ ./damo stop
$ ./damo start --monitoring_intervals 100ms 2s 20s
$ time ./damo status --damos_stats
real 0m0.518s
user 0m0.088s
sys 0m0.015s
$ time ./damo status
kdamond 0
state: on, pid: 1196
context 0
ops: paddr
target 0
pid: 0
region [4,294,967,296, 9,663,676,415) (5.000 GiB)
intervals: sample 100 ms, aggr 2 s, update 20 s
nr_regions: [10, 1,000]
real 0m5.920s
user 0m0.083s
sys 0m0.016s
Thanks very much for your help.
I can clearly see the bottleneck is in writing to sysfs inside _damon_sysfs.update_schemes_tried_regions.
This is the result using the uftrace tool, which I mentioned previously, and the trace was recorded simply as follows.
$ uftrace record ./damo status
The following is the output that I got by running uftrace tui. The uftrace replay output can be shown as follows.
$ uftrace replay -t 10ms
# DURATION TID FUNCTION
[ 1247] | __main__.<module>() {
18.424 ms [ 1247] | importlib._bootstrap._find_and_load();
[ 1247] | importlib._bootstrap._find_and_load() {
[ 1247] | damo_adjust.<module>() {
[ 1247] | _damon_result.<module>() {
26.960 ms [ 1247] | _damo_fmt_str.<module>();
11.091 ms [ 1247] | _damon.<module>();
69.217 ms [ 1247] | } /* _damon_result.<module> */
69.873 ms [ 1247] | } /* damo_adjust.<module> */
70.358 ms [ 1247] | } /* importlib._bootstrap._find_and_load */
[ 1247] | importlib._bootstrap._find_and_load() {
12.133 ms [ 1247] | damo_report.<module>();
12.552 ms [ 1247] | } /* importlib._bootstrap._find_and_load */
[ 1247] | main() {
[ 1247] | _damo_subcmds.add_parser() {
12.240 ms [ 1247] | damo_report.set_argparser();
13.995 ms [ 1247] | } /* _damo_subcmds.add_parser */
[ 1247] | _damo_subcmds.execute() {
[ 1247] | damo_status.main() {
[ 1247] | _damon.update_read_kdamonds() {
[ 1247] | _damon.update_schemes_status() {
[ 1247] | _damon.update_schemes_stats() {
[ 1247] | _damon_sysfs.update_schemes_stats() {
[ 1247] | _damo_fs.write_file() {
416.006 ms [ 1247] | TextIOWrapper.__exit__();
416.061 ms [ 1247] | } /* _damo_fs.write_file */
416.148 ms [ 1247] | } /* _damon_sysfs.update_schemes_stats */
416.155 ms [ 1247] | } /* _damon.update_schemes_stats */
[ 1247] | _damon.update_schemes_tried_regions() {
[ 1247] | _damon_sysfs.update_schemes_tried_regions() {
[ 1247] | _damo_fs.write_file() {
4.159 s [ 1247] | TextIOWrapper.__exit__();
4.159 s [ 1247] | } /* _damo_fs.write_file */
4.159 s [ 1247] | } /* _damon_sysfs.update_schemes_tried_regions */
4.159 s [ 1247] | } /* _damon.update_schemes_tried_regions */
4.576 s [ 1247] | } /* _damon.update_schemes_status */
4.581 s [ 1247] | } /* _damon.update_read_kdamonds */
4.583 s [ 1247] | } /* damo_status.main */
4.583 s [ 1247] | } /* _damo_subcmds.execute */
4.668 s [ 1247] | } /* main */
4.780 s [ 1247] | } /* __main__.<module> */
This is the trace result with a time filter, which discards small functions that take under 10ms.
Thank you for the update and the awesome uftrace output. Yes, even without a scheme, the writing would take time. Maybe we could optimize DAMON sysfs interface for the corner case. I'll take a look soon.
Hi Honggyu, just implemented an option[1] for this case. It shows the detailed kdamond status without scheme stats and tried regions. For example:
$ sudo ./damo start --damos_action stat
$ sudo ./damo status --damon_params
kdamond 0
state: on, pid: 45564
context 0
ops: paddr
target 0
pid: 0
region [4,294,967,296, 136,292,859,903) (122.933 GiB)
intervals: sample 5 ms, aggr 100 ms, update 1 s
nr_regions: [10, 1,000]
scheme 0
action: stat per aggr interval
target access pattern
sz: [0 B, max]
nr_accesses: [0 samples, 3,689,348,814,741,910,528 samples]
age: [0 aggr_intervals, 184,467,440,737,095 aggr_intervals]
quotas
0 ns / 0 ns per max
priority: sz 0 %, nr_accesses 0 %, age 0 %
watermarks
metric none, interval 0 ns
0 %, 0 %, 0 %
[1] https://github.com/awslabs/damo/commit/4521f95ee90c04d24eed7702c6033c29d1077970
Hi Honggyu, just implemented an option[1] for this case. It shows the detailed kdamond status without scheme stats and tried regions.
Hi SeongJae, I've found that the above new option --damon_params makes damo status much faster. Thanks!
Hi Honggyu, unfortunately I lost some of the context of this issue. Are the issues all resolved? Or are you waiting for any answers or implementations from my side?
Sorry for the late response. damo status --damon_params is much faster, so we can close this.
I sometimes want to monitor the output of damo show, but I feel it is quite slow. From my experience, it takes around 10 seconds, but I'm just wondering if it's possible to make it faster.