comonicon / Comonicon.jl

Your best CLI generator in JuliaLang
https://comonicon.org
MIT License

Reproducible configuration files/CLI args like ArgParse.jl and jsonargparse #225

Closed jlperla closed 2 years ago

jlperla commented 2 years ago

Comonicon supports an impressive number of features, but it is missing one that is crucial for reproducibility: support for config options set up in files (with overrides on the command line).

The use case for this is pretty simple: lots of CLIs (especially those related to machine learning) have many configuration options, such as hyperparameter settings, various parameters for priors, etc. Often dozens. There is often no natural set of "default" values because sensible defaults may be interrelated, and you may be tweaking those parameters as you go, changing one or two at a time. Furthermore, if you rely on a bunch of defaults, you may have no idea which options were in effect for a given version of your experiments. And even if you knew the exact set of defaults for a particular commit, you can't easily reproduce the results of a CLI call, because you need to go back and reconstruct the set of defaults and parameters.

The way this is done in the Python ecosystem (and which we can emulate crudely in ArgParse.jl) is the following:

  1. A way to pass in a configuration file as a set of defaults to override. Any additional values the user passes on the commandline overrules these defaults.
  2. As in jsonargparse/ArgParse.jl it would be great to have both support for multiple config files and/or support for setting a "default" one. This lets you do the following:
  3. Finally, it is great if there is a relatively simple workflow which can save out the whole CLI arguments to reproduce a call---even if default arguments/etc. were to change.
    • This works well with the previous issue of stacking files because you can save out the config file with your experiment, then call the experiment again just passing in that config file and it reproduces the entire call. If the default values of the CLI changed or if someone wanted to just see what hyperparameter values were set for a particular experiment, then it is all there.
    • There is no simple way to save things out from ArgParse, sadly, but this is all over the jsonargparse world. In part it is easy because it just needs to persist the json.

Hopefully that gives a sense of the workflow. This is not for minimal CLI terminal apps, where I think your current feature set is more than sufficient, but rather for things related to reproducible experiments. Let me give a really simple strawman for how it could work with Comonicon (not suggesting this is a good interface, but it seems to capture the minimum software requirements using a variation on the ArgParse.jl approach).

    • Add a special reserved kwarg, comonicon_argfile_default, to the @main function. This establishes a default file to load before applying any CLI args. Maybe you specify a function like
      @main function sum(x, y; precision::String="float32", fastmath::Bool=false, comonicon_argfile_default="my_path_to_default")
    • Add a special @argfile_save CLI argument. This gives a filename in which to save the final arguments that would be passed directly to the Julia function. The key is that whatever format this exports, it must be compatible with being passed back in as an argfile.
    • Finally, have a reserved CLI argument like @argfile mypath/somefile.txt (or whatever) which the user can pass in to point to these files.

The logic would be as follows:

  1. Take any of the Julia default arguments from the command-line spec.
  2. If comonicon_argfile_default is not empty, push the contents of that file as an @argfile argument onto the stack of parameters, overwriting those before it.
  3. If any other @argfile arguments are passed in, apply them on top, in order. Or just support a single one if that is easier.
  4. Then apply the CLI args that were passed in.
  5. Finally, if the user passed the special @argfile_save, save out the exact CLI arguments as an @argfile-compatible file which would replicate calling that function. With this, something like the following might work:
    
    julia myscript.jl @argfile_save dir1/my_experiment.txt --arg2 5.0 

Then wait, perhaps months, with a code base which may even have changed its Julia default args, and reproduce the experiment, maybe overwriting an arg or two:

julia myscript.jl @argfile dir1/my_experiment.txt --arg2 4.0
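
The precedence order in the steps above can be sketched with Julia's standard-library TOML module. This is only an illustration of the layering idea, not Comonicon's API: `resolve_settings` and its arguments are hypothetical names, and TOML is an assumed file format (the proposal leaves the format open).

```julia
using TOML  # standard library (Julia 1.6+)

# Hypothetical helper, not part of Comonicon: layer settings from lowest to
# highest priority, then optionally save the fully resolved result.
function resolve_settings(defaults::AbstractDict, config_files, cli_overrides::AbstractDict;
                          save_path::AbstractString = "")
    settings = Dict{String,Any}(defaults)    # step 1: defaults from the function signature
    for path in config_files                 # steps 2-3: config files, later files win
        merge!(settings, TOML.parsefile(path))
    end
    merge!(settings, cli_overrides)          # step 4: explicit CLI args win over files
    if !isempty(save_path)                   # step 5: persist the resolved arguments
        open(save_path, "w") do io
            TOML.print(io, settings)
        end
    end
    return settings
end

# First run: defaults plus one CLI override, saved next to the experiment.
saved = joinpath(mktempdir(), "my_experiment.toml")
resolve_settings(Dict("arg1" => 1, "arg2" => 2.0), String[], Dict("arg2" => 5.0); save_path = saved)

# Later run: the defaults have changed, but the saved file still pins arg1.
rerun = resolve_settings(Dict("arg1" => 100, "arg2" => 2.0), [saved], Dict("arg2" => 4.0))
```

Because the saved file contains every resolved value, not just the ones given on the command line, the later run keeps the old `arg1` even though the code's default has moved on; only the explicitly overridden `arg2` differs.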

Roger-luo commented 2 years ago

Thanks for the feature request (I assume this should be an FR rather than a bug report?). I have to admit, I'm a bit confused after reading this proposal: what's the difference between this and SimpleConfig, which is built on the option type support? https://github.com/comonicon/Comonicon.jl/pull/222

IIUC, the only thing you want is to save your CLI inputs? Why not just do something like git config, which mutates a local file, and save it inside your CLI function? How does this relate to CLI syntax? E.g., what's the difference between just using --config-file=<path> and @<path>? Adding a new @ syntax just for this seems like overkill to me.

I think @avik-pal might have something to say about this; I'm quite confused by this design. Tbh I do tweak hyperparameters too, but I don't really use this workflow. I have a private implementation that fuses different ranges of hyperparameters into multiple tasks and distributes them on the cluster in parallel. So I mainly just read in multiple config files that contain the ranges I want to tweak, and I also keep those config files as logs.

> Add in special comonicon_argfile_default reserved kwarg for the @main function. This establishes a default file to load up prior to applying any CLI args. Maybe you specify a function like

The option type feature is much more flexible than this?

> Any other @argfile arguments are passed in, then apply them overtop in order. Or just support a single one if that is easier.

This does not seem to require compiler/codegen support; you can definitely do it in the function that is @cast by Comonicon? The exception is the syntax, if we are talking about automatically letting several config files feed into the same option and overlap, e.g., you can have

mycmd --config=config1.toml --config=config2.toml --config=config3.toml

is equivalent to

d1 = from_toml(MyOptionType, "config1.toml")
merge!(d1, from_toml(MyOptionType, "config2.toml"))
merge!(d1, from_toml(MyOptionType, "config3.toml"))

I haven't added a test for something like this, but I think this should work in Comonicon now, otherwise, it should be considered a bug.
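
One detail to watch when stacking TOML files by hand: a plain `merge!` on parsed dictionaries replaces a nested table wholesale rather than combining it key by key. Below is a minimal sketch of a recursive merge using only the standard-library TOML module. `deep_merge!` is a hypothetical helper, not something Comonicon or Configurations is known to export, and the table and key names are made up for illustration.

```julia
using TOML  # standard library (Julia 1.6+)

# Hypothetical helper: merge `src` into `dest`, descending into nested tables
# so that keys absent from `src` survive in `dest`.
function deep_merge!(dest::AbstractDict, src::AbstractDict)
    for (key, value) in src
        if value isa AbstractDict && get(dest, key, nothing) isa AbstractDict
            deep_merge!(dest[key], value)   # both sides are tables: combine them
        else
            dest[key] = value               # scalar or new key: later value wins
        end
    end
    return dest
end

base = TOML.parse("""
[optimizer]
lr = 0.001
momentum = 0.9
""")

override = TOML.parse("""
[optimizer]
lr = 0.0001
""")

merged = deep_merge!(base, override)
```

With a plain `merge!(base, override)`, the `optimizer` table from the second file would replace the first one entirely and `momentum` would be lost.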

> Then apply the passed in CLI args

See https://github.com/comonicon/Comonicon.jl/pull/223

> Finally, if the user had the special @argfile_save then save out the exact CLI arguments as an @argfile compatible file which would replicate calling that function.

Why not just have a --save flag? I still don't see why @ is necessary. We want to keep the syntax minimal so people don't have to learn a lot.

you can definitely have

@cast function mycmd(;config::MyOptionType, save::Bool=false)
    if save
       to_toml(a_file_path_to_save, config)
    end
end

But I can see that here we will be missing the file path information if the file path is fed in as --config=<path>. Perhaps we should ask people to always use a wrapper type as the type of config files, e.g.

struct ConfigType{T}
   schema::T
   meta::Dict{String, Any}
end

The parser actually saves the information for conversion while parsing (https://github.com/Roger-luo/Configurations.jl/blob/master/src/from_toml.jl#L13); it's just not exposed to users.
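
A minimal sketch of the wrapper idea, assuming the config is loaded into a plain dictionary with the standard-library TOML module rather than into an option type. `load_config` and the `"source_path"` key are hypothetical names chosen for illustration; they are not part of Comonicon or Configurations.

```julia
using TOML  # standard library (Julia 1.6+)

struct ConfigType{T}
    schema::T                 # the parsed configuration itself
    meta::Dict{String,Any}    # side information, such as where it was loaded from
end

# Hypothetical loader: parse the file and remember its absolute path.
function load_config(path::AbstractString)
    schema = TOML.parsefile(path)
    meta = Dict{String,Any}("source_path" => abspath(path))
    return ConfigType(schema, meta)
end

# Usage: write a small config, load it, and recover the path inside the command.
path = joinpath(mktempdir(), "config1.toml")
write(path, "lr = 0.001\n")
cfg = load_config(path)
```

The command body can then read `cfg.meta["source_path"]` to decide where to save a derived config, without the path having to be repeated as a separate option.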

Also, I'm a bit concerned about mutating a local config file for numerical experiments - shouldn't we consider all the different configurations as immutable and save all the configs so we can trace back previous experiments by time/tags etc.?

jlperla commented 2 years ago

@Roger-luo Thanks for the response. Sorry if it didn't come across the right way, but I had said "(not suggesting this is a good interface, but it seems to capture the minimum software requirements using a variation on the ArgParse.jl approach)" and am definitely not suggesting this as an interface. It was just the way we had figured out to replicate the workflow with ArgParse features that fulfilled the use cases.

I think the big issue here is that I didn't understand the role of https://github.com/avik-pal/SimpleConfig.jl from @avik-pal and your Configurations stuff. The example in https://comonicon.org/dev/conventions/#Working-with-option-types-from-Configurations had all sorts of hidden goodness behind it.

Let me make a few responses and then if you feel that there already are sufficient features for this workflow we should close this issue.

> Thanks for the feature request (I assume this should be FR rather than bug report?)

Yes. I didn't see a "feature request" option on https://github.com/comonicon/Comonicon.jl/issues/new/choose, but maybe it is easy for you to add one if this keeps happening.

> Also, I'm a bit concerned about mutating a local config file for numerical experiments - shouldn't we consider all the different configurations as immutable and save all the configs so we can trace back previous experiments by time/tags etc.?

For sure, no mutation. I had just stripped down the example too far and was trying to point out that there needs to be a way to save the config, and that the saved config should be directly includable at some other time. No encouragement of overwriting, of course.

> Why not just have a --save flag? I still don't see why @ is necessary. We want to keep the syntax minimal so people don't have to learn a lot. ... but I can see, here we will be missing a file path information if the file path is fed as --config=<path>, perhaps we should ask people to always use a wrapper type as the type of config files, e.g

Now that I am starting to understand the tight connection between the config files and the option types, I think you may have everything you need right now. If we fold both the save flag and the a_file_path_to_save into the option type itself, as a single path field, then we could just do

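using Comonicon, Configurations  # @cast comes from Comonicon; @option and to_toml from Configurations

@option struct ExperimentConfig
    global_seed::Int64 = 0
    lr::Float64 = 1e-5
    save_config_path::String = ""  # empty string means "do not save"
end

@cast function mycmd(; config::ExperimentConfig)
    # Save the fully resolved config if a path was given
    if config.save_config_path != ""
        to_toml(config.save_config_path, config)
    end
    # code
    @show config.global_seed
end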

Is that right? Then I should be able to do things like the following:

# use almost all defaults, and save results.
command --config.save_config_path=./experiment_1.toml --config.global_seed=1

# Loads up ALL of those from previous experiment, overwrite lr and run
command --config=./experiment_1.toml --config.lr=1e-8  

# Same as above, but also saves the resulting config to a new file
command --config=./experiment_1.toml --config.save_config_path=./experiment_2.toml  --config.lr=1e-8  

# Then, six months later I can run the following, and even if the defaults in `ExperimentConfig` changed, this would reproduce parameters?
command --config=./experiment_2.toml  

If that all works, then I think this workflow is fulfilled completely. Might be worth a doc addition at some point, perhaps, if this is useful for others.

Roger-luo commented 2 years ago

> Is that right? Then I should be able to do things like the following:

Yes, exactly. The only thing that might be missing is passing the actual file path through into the function, and I don't have a good idea for that interface yet.

> If that all works, then I think this workflow is fulfilled completely. Might be worth a doc addition at some point perhaps if this is useful for others.

Yeah, I was hoping to rewrite and polish the docs at some point... some parts of the docs are out of date and were written back in 0.10.0. Any help with doc improvements is definitely welcome.

jlperla commented 2 years ago

Amazing! Sorry I missed this workflow in the docs. I think this covers everything. I may come back to this during a rewrite in a few months on the CLI stuff for a project now that ArgParse can be exorcized completely.

> Yes, exactly. The only thing that might be missing is pass through the actually file path into the function, which I don't have a good idea for the interface yet.

For sure. I also think it is fine for that to require a few manual steps, as long as there is an example to follow.

I am going to close this since the short answer is that you already have this functionality.