Support yaml format config input

honggyukim commented 3 months ago

The current damo accepts json format for configuration, but json has some limitation as follows.

cannot add comment
verbose due to inherent open-close pair of braces

This patch introduces configuration with yaml format and its benefit is as follows.

able to add comment, so users can add description or comment out some config logics
shorter than json format because yaml doesn't have closing mark

The simple usage is as follows.

$ ./damo.py fmt_yaml > config.yaml

$ cat config.yaml
kdamonds:
- state: null
  pid: null
  contexts:
  - ops: paddr
    targets:
    - pid: null
      regions:
      - start: 4,294,967,296
        end: 19,025,362,943
    intervals:
      sample_us: 5 ms
      aggr_us: 100 ms
      ops_update_us: 1 s
    nr_regions:
      min: '10'
      max: 1,000
    schemes: []

$ ./damo.py start config.yaml

honggyukim commented 3 months ago

Hi SeongJae, this is just to give you a rough idea so didn't complete the patch yet. I will update it if you don't have objection on this.

honggyukim commented 3 months ago

Example usage is as follows. Please note that the config.yaml has some useful comments.

$ cat config.yaml 
kdamonds:
- state: null
  pid: null
  contexts:
  - ops: paddr
    targets:
    - pid: null
      regions:
      - start: 4,294,967,296  # '0x100000000'
        end: 19,025,362,943   # '0x46dffffff'
    intervals:
      sample_us: 100 ms     # 20 times of default sampling interval
      aggr_us: 2 s          # 20 times of default aggregation interval
      ops_update_us: 20 s   # This can be removed as paddr doesn't use it,
                            # but cannot be removed for now due to it has
                            # to be higher than aggr_us anyhow.
    nr_regions:
      min: '10'
      max: 1,000
    schemes: []

This config.yaml can simply be passed to damo start.

$ sudo ./damo.py start config.yaml

The damo status shows it is correctly configured as follows.

$ sudo ./damo.py status
kdamond 0
    state: on, pid: 25786
    context 0
        ops: paddr
        target 0
            pid: 0
            region [4,294,967,296, 19,025,362,943) (13.719 GiB)
        intervals: sample 100 ms, aggr 2 s, update 20 s
        nr_regions: [10, 1,000]

honggyukim commented 3 months ago

The same config file in json format is longer than yaml format as follows.

{
    "kdamonds": [
        {
            "state": null,
            "pid": null,
            "contexts": [
                {
                    "ops": "paddr",
                    "targets": [
                        {
                            "pid": null,
                            "regions": [
                                {
                                    "start": "4,294,967,296",
                                    "end": "19,025,362,943"
                                }
                            ]
                        }
                    ],
                    "intervals": {
                        "sample_us": "100 ms",
                        "aggr_us": "2 s",
                        "ops_update_us": "20 s"
                    },
                    "nr_regions": {
                        "min": "10",
                        "max": "1,000"
                    },
                    "schemes": []
                }
            ]
        }
    ]
}

sjp38 commented 3 months ago

Thank you for this PR, Honggyu!

To my perspective, this PR is making two changes:

Supporting yaml format DAMON parameter inputs, and
Supporting command line options based DAMON parameters to yaml format DAMON parameters input conversion.

I think the first change will be very useful, as you nicely explained, and would like to merge it as soon as you drop RFC tag.

However, I'm not pretty sure if the second change is really needed, since same thing can be achieved by converting fmt_json output to a yaml file. Also, we might want to support yet another format of DAMON parameters in future. In long term, I think having a new command for general converting of DAMON parameters in multiple formats, and eventually deprecating fmt_json might be a better way. The command may supporting conversion of DAMON parameters in command line options, json, or yaml formats to different formats. E.g.,

$ damo convert_damon_parameters --damos_action stat --convert_output_format json
[...]
$ damo convert_damon_parameters ./damon_parameters.json --convert_output_format yaml
[...]

And I don't mind merging the first change now, and implementing the long term solution later.

May I ask your opinion, Honggyu?

honggyukim commented 3 months ago

Hi SeongJae,

Thanks for the useful feedback.

To my perspective, this PR is making two changes:

Supporting yaml format DAMON parameter inputs, and

Do you mean supporting yaml format file input such as running damo start config.yaml?

Supporting command line options based DAMON parameters to yaml format DAMON parameters input conversion.

I think the first change will be very useful, as you nicely explained, and would like to merge it as soon as you drop RFC tag.

However, I'm not pretty sure if the second change is really needed, since same thing can be achieved by converting fmt_json output to a yaml file.

Do you mean you're not sure about adding fmt_yaml command?

Also, we might want to support yet another format of DAMON parameters in future. In long term, I think having a new command for general converting of DAMON parameters in multiple formats, and eventually deprecating fmt_json might be a better way.

Yeah, I actually thought that the name fmt_json isn't very good. I think having damo config command might be better and providing --format=[json|yaml] option can be useful.

The command may supporting conversion of DAMON parameters in command line options, json, or yaml formats to different formats. E.g.,
$ damo convert_damon_parameters --damos_action stat --convert_output_format json
[...]
$ damo convert_damon_parameters ./damon_parameters.json --convert_output_format yaml
[...]

I feel the command name convert_damon_parameters looks too long and verbose and it even requires another long option --convert_output_format, which isn't good for general users as they can make typo errors very easily.

And I don't mind merging the first change now, and implementing the long term solution later.

That might be also useful as a first step for yaml support.

Thanks!

sjp38 commented 3 months ago

Do you mean supporting yaml format file input such as running damo start config.yaml? [...] Do you mean you're not sure about adding fmt_yaml command?

Right.

I think having damo config command might be better and providing --format=[json|yaml] option can be useful.

I concern if config might be too short and unclear. I think at least gen_config or mkconfig might be better name, but still I'm unsure if it is clearly explaining what it does (e.g., what config?).

Yeah, I actually thought that the name fmt_json isn't very good.

Cannot agree more. I indeed always wanted to change the name.

[...] I feel the command name convert_damon_parameters looks too long and verbose and it even requires another long option --convert_output_format, which isn't good for general users as they can make typo errors very easily.

Again, agree. The names are only for example at the moment. One another idea off the top of my head is format damon_params --format {json,yaml,cmdline}, but I'd like to take more time on this. Naming is always the most challenging and time-consuming real works of programmers :wink: But I'm wondering what you think about the idea of the functionaility.

That might be also useful as a first step for yaml support.

Glad to get your confirm :)

honggyukim commented 3 months ago

Hi SeongJae,

I think having damo config command might be better and providing --format=[json|yaml] option can be useful.

I concern if config might be too short and unclear. I think at least gen_config or mkconfig might be better name, but still I'm unsure if it is clearly explaining what it does (e.g., what config?).

Yeah, damo config might be unclear, but just wanted to make it short. I feel gen_config and mkconfig both sound good, but I chose to use gen_config.py when I created a user tool in hmsdk.

Yeah, I actually thought that the name fmt_json isn't very good.

Cannot agree more. I indeed always wanted to change the name.

[...] I feel the command name convert_damon_parameters looks too long and verbose and it even requires another long option --convert_output_format, which isn't good for general users as they can make typo errors very easily.

Again, agree. The names are only for example at the moment. One another idea off the top of my head is format damon_params --format {json,yaml,cmdline}, but I'd like to take more time on this.

Yeah, let's take some time and think about it more.

Naming is always the most challenging and time-consuming real works of programmers 😉 But I'm wondering what you think about the idea of the functionaility.

Let me think more about the user interface.

And I don't mind merging the first change now, and implementing the long term solution later.

That might be also useful as a first step for yaml support.

Glad to get your confirm :)

I've just updated the patch removing fmt_yaml support as you asked. Please read the commit message. Thanks.

honggyukim commented 3 months ago

I've removed fmt_yaml support, but I would like to keep the code here for later reference. Please ignore it for now.

$ cat src/damo_fmt_yaml.py
# SPDX-License-Identifier: GPL-2.0

"""
Convert args to DAMON yaml input.
"""

from collections import OrderedDict

import yaml

import _damon
import _damon_args

class OrderedDumper(yaml.SafeDumper):
    pass

def dict_representer(dumper, data):
    return dumper.represent_dict(data.items())

OrderedDumper.add_representer(OrderedDict, dict_representer)

def yaml_dump(data):
    return yaml.dump(data, Dumper=OrderedDumper, default_flow_style=False, sort_keys=False)

def main(args):
    _damon.ensure_root_permission()

    kdamonds, err = _damon_args.kdamonds_for(args)
    if err:
        print('invalid arguments (%s)' % err)
        exit(1)

    for k in kdamonds:
        for c in k.contexts:
            for s in c.schemes:
                s.stats = None
                s.tried_regions = None

    if args.schemes_only:
        schemes = []
        for kdamond in kdamonds:
            for ctx in kdamond.contexts:
                schemes += ctx.schemes
        print(yaml_dump([s.to_kvpairs(args.raw) for s in schemes]))
        return
    print(yaml_dump({'kdamonds': [k.to_kvpairs(args.raw) for k in kdamonds]}))

def set_argparser(parser):
    _damon_args.set_argparser(parser, add_record_options=False)
    parser.add_argument('--schemes_only', action='store_true',
            help='print schemes part only')
    parser.add_argument('--raw', action='store_true',
            help='print numbers in machine friendly raw form')

honggyukim commented 3 months ago

I'm also thinking about adding a convenience script json_to_yaml.py somewhere as follows.

$ cat json_to_yaml.py
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0

import sys
import json
import yaml

def json_to_yaml(json_file_path):
    with open(json_file_path, 'r') as json_file:
        json_data = json.load(json_file)
    return yaml.dump(json_data, default_flow_style=False,
                     sort_keys=False, allow_unicode=True)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: ./json_to_yaml.py <json_file_path>")
        sys.exit(1)

    json_file_path = sys.argv[1]
    yaml_str = json_to_yaml(json_file_path)
    print(yaml_str)

sjp38 commented 3 months ago

The updated commit looks good to me. Thank you Honggyu!

sjp38 commented 3 months ago

The current damo accepts json format for configuration, but json has some limitation as follows.

cannot add comment

verbose due to inherent open-close pair of braces

I already merged this PR, but let me clarify my thoughts about the above argument.

Commenting on json can be done with some tricks.

json doesn't support commenting, but you could have similar ones by adding fake key/value pairs in maps. For example,

{
    "kdamonds": [
        {
            "comment": ["this kdamond is for ..."],
            "state": null,
            "pid": null,
            "contexts": [
                {
                    "comment": ["this context monitors physical address space"],
                    "ops": "paddr",
[...]

The trick doesn't work for lists, and the strict syntax of json makes it further tedious, though.

json is not required to be verbose.

To my understanding, the new lines are not really required, so you can collapse the things. E.g.,

{
    "kdamonds": [{
            "state": null, "pid": null,
            "contexts": [{"ops": "paddr",
                    "targets": [{"pid": null,
                            "regions": [{"start": "4,294,967,296", "end": "19,025,362,943"}]}],
[...]

Of course, that requires manual works.

I believe supporting yaml is nice.

As also mentioned above, those are tricks that having their own limitation and require additional works. So I believe supporting yaml is a nice idea.

honggyukim commented 3 months ago

Hi SeongJae, thanks for merging this patch.

I already merged this PR, but let me clarify my thoughts about the above argument.

Commenting on json can be done with some tricks.

json doesn't support commenting, but you could have similar ones by adding fake key/value pairs in maps. For example,

Interesting trick. But I think it requires many users to be clever enough for this.

To my understanding, the new lines are not really required, so you can collapse the things. E.g.,

Yeah, that's true, but I wish we could have a way to make the json in that format automatically. I guess it's not widely used format.

I believe supporting yaml is nice.

As also mentioned above, those are tricks that having their own limitation and require additional works. So I believe supporting yaml is a nice idea.

Thanks. We can think more about the user interface for generating yaml config just like damo fmt_json does.

sj-aws commented 2 months ago

FYI, I'm planning to integrate every damo command for creating some sort of outputs including fmt_json into one (probably show). That may help supporting yaml-format DAMON parameter formatting (https://github.com/awslabs/damo/issues/105#issuecomment-2226357097)

honggyukim commented 2 months ago

Thanks. That would be useful!

sjp38 commented 2 months ago

FYI, the yaml format support is added to a new command, args damon (https://github.com/awslabs/damo/issues/105#issuecomment-2241282173).

honggyukim commented 2 months ago

Thanks for the update. But it looks the output is incorrect.

$ sudo ./damo args damon --format yaml
kdamonds:
- !!python/object/apply:collections.OrderedDict
  - - - state
      - null
    - - pid
      - null
    - - contexts
      - - !!python/object/apply:collections.OrderedDict
          - - - ops
              - paddr
            - - targets
              - - !!python/object/apply:collections.OrderedDict
                  - - - pid
                      - null
                    - - regions
                      - - !!python/object/apply:collections.OrderedDict
                          - - - start
                              - 4,294,967,296
                            - - end
                              - 825,707,462,655
            - - intervals
              - !!python/object/apply:collections.OrderedDict
                - - - sample_us
                    - 5 ms
                  - - aggr_us
                    - 100 ms
                  - - ops_update_us
                    - 1 s
            - - nr_regions
              - !!python/object/apply:collections.OrderedDict
                - - - min
                    - '10'
                  - - max
                    - 1,000
            - - schemes
              - []

I added OrderedDumper to hide OrderedDict in the yaml output before. Please see https://github.com/awslabs/damo/pull/97#issuecomment-2156630128.

sjp38 commented 2 months ago

Thank you for the kind comment, Honggyu. I think that part would better to be removed for simpler output. But I'm not sure if it is incorrect. Can the OrderedDict part can make parsing of the output fail? If not, I think we can call it "too verbose" or "ugly", but "incorrect" is not technically correct?

Anyway I'll try to learn from the code you kindly shared. If you don't mind, please feel free to upload a PR instead of waiting for me, though.

honggyukim commented 2 months ago

The generated yaml file fails to be used for damo start as follows.

$ sudo ./damo args damon --format yaml > test.yaml

$ sudo ./damo start test.yaml
could not turn on damon (cannot create kdamonds from args (target 'test.yaml' is not supported))

Unlike yaml format, json format works fine as expected.

$ sudo ./damo args damon --format json > test.json

$ sudo ./damo start test.json

$ sudo ./damo status
kdamond 0
    state: on, pid: 308881
    context 0
        ops: paddr
        target 0
            pid: 0
            region [4,294,967,296, 551,903,297,535) (510.000 GiB)
        intervals: sample 5 ms, aggr 100 ms, update 1 s
        nr_regions: [10, 1,000]

sjp38 commented 2 months ago

Interesting. Thank you for reporting this. I will take a look later. For tracking, opened an issue: https://github.com/awslabs/damo/issues/107

honggyukim commented 2 months ago

Thanks for creating the issue!

sjp38 commented 2 months ago

The fix for the issue is just pushed. Just for a clarification.

Can the OrderedDict part can make parsing of the output fail? If not, I think we can call it "too verbose" or "ugly", but "incorrect" is not technically correct?

It makes parsing fail if the parsing part uses yaml.safe_load(). It can be parsed with custom loaders, though. So technically speaking, it is not "incorrect". Please refer to the fix commit for more details.

honggyukim commented 2 months ago

Thanks for the explanation. I have tried a bit with custom loader for yaml.load() for the previously generated yaml output, but couldn't successfully load it after some trials.

Anyway, that might be because of my lack of knowledge of custom loader so I won't say that was incorrect and I will say it was too verbose. Thanks for making yaml support complete on generation side.

awslabs / damo

Support yaml format config input #97