elastic / rally

Macrobenchmarking framework for Elasticsearch
Apache License 2.0
35 stars 313 forks source link

DRY/Refactor hard coded config file keys #1646

Open dliappis opened 1 year ago

dliappis commented 1 year ago

The motivation of this issue is to prevent bugs like #1645.

The Config object uses various keys for storing configuration properties. Currently many of those get populated during argument parsing (e.g. https://github.com/elastic/rally/blob/4c7141aac0179c38c08c134c25f7a23e6f07d2d9/esrally/rally.py#L1024-L1029), and the same keys are referenced, again as strings, in other code locations e.g. in metrics.py.

This approach is brittle and error prone. We should DRY things up and instead reference the config options via meaningful variable names in some module (e.g. config.py) such that a typo becomes a real error.

MadhuMPandurangi commented 1 year ago

Hi @dliappis, I'm interested in working on this issue, can you please assign me this. Thank you

dliappis commented 1 year ago

@MadhuMPandurangi thank you for your interest. I've assigned it to you.

sakurai-youhei commented 1 year ago

@dliappis I'm also interested. How about this way? I feel like I may reinvent the wheel somewhere else, though.

# erally/const.py
from collections import UserDict

class Node(UserDict):
    def __init__(self):
        super().__init__(self._iterattrs())

    def _iterattrs(self):
        attributes = vars(type(self))
        for k in attributes:
            if k.startswith('_'):
                continue
            yield k, attributes[k]

class Section(Node):
    def _iterattrs(self):
        for _, v in super()._iterattrs():
            yield v[1], v

class SectionA(Section):
    _section = "section_a"
    snake_case_key = (_section, "snake_case_key")
    dot_notation_key = (_section, "dot.notation.key")

class Root(Node):
    section_a: SectionA = SectionA()

CONFIG_PATH = Root()

With the above, (section, key) will be accessible like this:

from esrally.const import CONFIG_PATH as CFG

CFG.section_a.snake_case_key
CFG.section_a.dot_notation_key
CFG.section_a["dot.notation.key"]

So, the example code will be refactored in the following with some bonuses, such as nice code completion, pprint capability thanks to an inheritance from dict, the same line length between before and after the change, etc. In addition to those, if you prefer, we can capitalize sections and keys (or inject more magics for auto-conversion) to be able to write like *CFG.SYSTEM.LIST_CONFIG_OPTION.

 cfg.add(config.Scope.applicationOverride, *CFG.system.list_config_option, args.configuration) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_max_results, args.limit) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.admin_track, args.track) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_races_benchmark_name, args.benchmark_name) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_from_date, args.from_date) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_races_to_date, args.to_date) 
--- a.py        2023-10-26 19:08:49.359047597 +0900
+++ b.py        2023-10-26 19:09:01.960758334 +0900
@@ -1 +1 @@
- cfg.add(config.Scope.applicationOverride, "system", "list.config.option", args.configuration)
+ cfg.add(config.Scope.applicationOverride, *CFG.system.list_config_option, args.configuration)

There'd be other ways, like using dataclass or some non-standard libraries, but I thought this is probably good enough not to overkill.

dliappis commented 1 year ago

cc @elastic/es-perf

gbanasiak commented 1 year ago

@sakurai-youhei Many thanks for your interest in making Rally better, feel free to assign this issue to yourself and contribute.

I've looked at the proposal based on UserDict. My understanding is the benefit of UserDict is the following usage: CFG.section_a["dot.notation.key"] but is this a requirement at all? I think all we need is to refer to config keys as class attributes, so perhaps there's a room for making it simpler? To be honest it took me a while to figure out what your code is doing.

I like the idea of returning tuples and unpacking them in cfg API calls. I don't like the _section repetition in the section class code.

I was thinking about something as simple as this:

class Section:
    def __init__(self, name):
        self._name = name

    def __getattribute__(self, item):
        # object.__getattribute__ required instead of self.__getattribute__ to prevent an infinite loop
        return object.__getattribute__(self, "_name"), object.__getattribute__(self, item)

class System(Section):
    def __init__(self):
        super().__init__("system")

    # description ...
    list_config_option = "list.config.option"
    # description ...
    list_max_results = "list.max_results"

class Root:
    system: System = System()

CFG = Root()
print(CFG.system.list_config_option)
print(CFG.system.list_max_results)
#print(CFG.system.does_not_exist) <--- error

with a simple path to . removal (breaking change with config file version increment):

class Section:
    def __init__(self, name):
        self._name = name

    def __getattribute__(self, item):
        # object.__getattribute__ required instead of self.__getattribute__ to prevent an infinite loop
        return object.__getattribute__(self, "_name"), item

class System(Section):
    def __init__(self):
        super().__init__("system")

    # description ...
    list_config_option = None
    # description ...
    list_max_results = None

I'm also curious what was your idea with the dataclass?

To move forward I would suggest to start working on a single section (the smallest, the better) to see how it goes.

Edit: Changed super() to object in __getattribute__ and added a comment.

sakurai-youhei commented 1 year ago

@gbanasiak Thank you for your comment.

I have played my idea, your idea, dataclass, etc., more to see what would be the best, and I find enum can be used to express dot-notation straightforwardly like the PR https://github.com/elastic/rally/pull/1795.

I hope you like it, as I have also incorporated the concept of avoiding repetition and giving descriptions.

sakurai-youhei commented 1 year ago

P.S. I have come to the conclusion that the literal check like this PR #1798 should come first before DRY, refactoring, auto-completion, etc.

https://github.com/elastic/rally/actions/runs/6773039463/job/18406946647?pr=1798#step:5:202 image

gbanasiak commented 12 months ago

@sakurai-youhei I like this approach. I'll leave more detailed comments in each PR.