embench / embench-iot

The main Embench repository
https://www.embench.org/
GNU General Public License v3.0
259 stars 105 forks source link

Design rationale for handling of --text (etc) options in benchmark_size.py #123

Closed Roger-Shepherd closed 3 years ago

Roger-Shepherd commented 3 years ago

I'm looking at how to modify benchmark_size.py to handle macho files as well as elf files. In addition to the processing of the files, works needs to be done on the parsing of the parameters.

In the current implementation the handling and logging is inconsistent between parameters. For example, if the script is run with no parameters given, the log says:

Supplied arguments

--builddir : bd --logdir : logs --baselinedir : baseline-data --absolute : False --output-format : None --json-comma : False --text : [] --data : [] --rodata : [] --bss : [] --metric : []

So, for example, --builddir shows the default for the parameter, but --text shows what was actually supplied.

The reason for this is that the handling of parameters is split between build_parser and validate_args with the logging occurring between the two:

Parse arguments using standard technology

parser = build_parser() args = parser.parse_args()

# Establish logging
setup_logging(args.logdir, 'size')
log_args(args)

# Check args are OK (have to have logging and build directory set up first)
validate_args(args)

The handling of many of the parameters is done by the python paring library but for some (e.g. --text) is done in validate_args. Specifically for --text the relevant part of build_parser is

List arguments are empty by default, a user specified value then takes

precedence. If the list is empty after parsing, then we can install a

default value.

parser.add_argument( '--text', type=str, default=[], nargs='+', help='Section name(s) containing code' )

and of validate_args is:

Sort out the list of section names to use

gp['secnames'] = dict()

for argname in ['text', 'rodata', 'data', 'bss']:
    secnames = getattr(args, argname)
    if secnames:
        gp['secnames'][argname] = secnames
    else:
        gp['secnames'][argname] = ['.' + argname]

My understanding of how add_argument works is that

parser.add_argument( '--text', type=str, default=[.text'], nargs='+', help='Section name(s) containing code' )

and

Sort out the list of section names to use

gp['secnames'] = dict()

for argname in ['text', 'rodata', 'data', 'bss']:
    secnames = getattr(args, argname)
    gp['secnames'][argname] = secnames

would cause the same processing of the --text parameter to occur but the logging would show the defaults rather than what was on the commend line.

Is there a reason for this? My guess is that the command line parser is so complicated that it seemed easier to do it afterwards, but perhaps there is another reason. I'd like to know before I start making modifications to handle macho.

On reflection: Having looked more closely at the implications of modifying either structure, the original seems better in that, with the proposal I am following for supporting macho, the only change in build_parser is the addition of a --format parameter. All the knowledge of standard section names can be deferred to validate_args without the need for build_parser to have to understand the chosen file format.

Roger-Shepherd commented 3 years ago

Dealt with by #132