edornd / argdantic

Typed command line interfaces with argparse and pydantic

Allow subgroup argument selection via Enum #37

Open janvainer opened 11 months ago

janvainer commented 11 months ago

Is your feature request related to a problem? Please describe.

Hi, first of all, this project has great potential for ML project configuration! Well done <3!!! There is one use case that is quite common in ML: you have two different sub-configurations, you want to switch between them easily, and you then also want to set some of their sub-options. For example, consider an ML training script where you want to be able to select between different optimizers:

from pydantic import BaseModel

class Optimizer(BaseModel):
    lr: float = 0.001
    eps: float = 1e-7

class Adam(Optimizer):
    lr: float = 0.003

class SGD(Optimizer):
    lr: float = 0.002

class Config(BaseModel):
    optimizer: Adam | SGD = Adam()

Now it would be awesome to somehow specify which optimizer to initialize in the config and also be able to set some of its parameters.

Describe the solution you'd like

How the CLI should look:

python train.py --optimizer adam --optimizer.lr 0.01  # specify the optimizer type and then also specify some of its params

There may be an issue with deriving argument names from the Union type. Instead, perhaps enums could be used:

from enum import Enum

class Optimizers(Optimizer, Enum):
    adam = Adam()
    sgd = SGD()
    sgd_custom = SGD(lr=1.0)

class Config(BaseModel):  # this gets parsed by argdantic into a cli later
    optimizer: Optimizers = Optimizers.adam

The CLI would have to check that the enum value itself is a BaseModel and treat it as a nested configuration node, displayed and settable from the terminal.
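For illustration, the detection itself could be as simple as the following (a minimal sketch, not existing argdantic code; the helper name is made up, and it assumes plain Enum members wrapping model instances):

from enum import Enum

from pydantic import BaseModel

def is_model_enum(annotation) -> bool:
    """Check whether an annotation is an Enum whose members wrap BaseModels."""
    return (
        isinstance(annotation, type)
        and issubclass(annotation, Enum)
        and all(isinstance(member.value, BaseModel) for member in annotation)
    )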

Describe alternatives you've considered

Hydra allows this kind of sub-grouping via subfolders, and simple-parsing solves it via its subgroups type. Unfortunately, neither is built on pydantic, so the user has to take care of validation themselves.

WDYT about the feature? It would allow quite complex configurations, for example:

python train.py \
    --optimizer adam | sgd \
    --optimizer.lr 0.1 \
    --encoder lstm | conv \
    --encoder.channels 512

Edit: after a bit of thought, perhaps the Annotated type would be better suited 🤔 It would be something like:

class Config(BaseModel):  # this gets parsed by argdantic into a cli later
    optimizer: Annotated[Optimizer, argdantic.Subgroups(adam=Adam(), sgd=SGD(), sgd_custom=SGD(lr=1.0))] = Adam()

The advantage is that this can be used outside of the CLI world without issues: one would be able to initialize Config without needing to import the enum class, simply as Config(optimizer=SGD(lr=0.1)) instead of Config(optimizer=Optimizers.sgd). Another advantage is that it would be possible to pass an optimizer config that is not pre-defined in the Enum.
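To make the idea concrete, the marker could be as small as the sketch below (Subgroups here is hypothetical and not part of argdantic's API; the models repeat the ones defined above):

from typing import Annotated

from pydantic import BaseModel

class Optimizer(BaseModel):
    lr: float = 0.001
    eps: float = 1e-7

class Adam(Optimizer):
    lr: float = 0.003

class SGD(Optimizer):
    lr: float = 0.002

class Subgroups:
    """Hypothetical marker carrying the named alternatives for a field."""
    def __init__(self, **choices: Optimizer):
        self.choices = choices

class Config(BaseModel):
    optimizer: Annotated[Optimizer, Subgroups(adam=Adam(), sgd=SGD(), sgd_custom=SGD(lr=1.0))] = Adam()

# plain construction works without importing any enum;
# pydantic accepts the subclass instance for the Optimizer field
cfg = Config(optimizer=SGD(lr=0.1))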

edornd commented 11 months ago

Hey @janvainer! Thank you for the kind words! I'm glad this tiny library has been helpful; the objective (mine, at least) was exactly to provide composable configuration for ML purposes. I see your point and it would indeed be helpful, but it will require some thought for the implementation 🤔. The major problem I see here (if I understood correctly) is that the sub-arguments are determined by the choice of the parent, which is not known at parser-creation time.

Off the top of my head, I think it might be feasible to treat a Union of BaseModels as a multi-choice argument, at least for the root argument (e.g., --optimizer=adam), but I don't have a clean solution yet for the sub-arguments (--optimizer.lr=...).
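For the validation half at least, pydantic's own discriminated unions can already dispatch on a tag field; a rough sketch of that piece (not argdantic code, and the CLI layer mapping --optimizer=sgd onto the tag is still the open question):

from typing import Literal, Union

from pydantic import BaseModel, Field

class Adam(BaseModel):
    kind: Literal["adam"] = "adam"
    lr: float = 0.003
    eps: float = 1e-7

class SGD(BaseModel):
    kind: Literal["sgd"] = "sgd"
    lr: float = 0.002

class Config(BaseModel):
    # the "kind" tag selects which union member to validate against
    optimizer: Union[Adam, SGD] = Field(Adam(), discriminator="kind")

# --optimizer=sgd --optimizer.lr=0.1 would roughly translate to:
print(Config(optimizer={"kind": "sgd", "lr": 0.1}))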

I'll give it a try soon though! In the meantime, I was using something like this for this very purpose:

from enum import Enum

from argdantic import ArgParser

# imagine these are torch optimizers; in practice
# there would be a shared base class like this one
class Optimizer:
    def __init__(self, name: str, lr: float):
        self.name = name
        self.lr = lr

class SGD(Optimizer):
    def __init__(self, lr: float):
        super().__init__("SGD", lr)

class Adam(Optimizer):
    def __init__(self, lr: float):
        super().__init__("Adam", lr)

# define an Enum where the values are the classes
# or a partial(Class, fixed arguments)
class Optimizers(Enum):
    sgd = SGD
    adam = Adam

cli = ArgParser()

# make the user select the class, then use other arguments
# to define the input parameters
@cli.command()
def main(
    optimizer: Optimizers = Optimizers.sgd,
    lr: float = 0.01,
    epochs: int = 10,
    batch_size: int = 32,
):
    print(optimizer.value(lr))  # instantiate the selected optimizer class
    print(lr)
    print(epochs)
    print(batch_size)

if __name__ == "__main__":
    cli()
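With this workaround the invocation stays flat, along the lines of the following (a guess at the rendered flags; the exact form may differ between argdantic versions):

python train.py --optimizer adam --lr 0.003 --epochs 20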

I agree that this is a workaround, but it at least allows this sort of mechanism to an extent. I'll see what can be done!

janvainer commented 11 months ago

Thank you for your response and the code! Yes, it is a bit cumbersome. Please keep me in the loop! ;) I am curious how this unfolds.