Closed dlwh closed 1 year ago
Hi @dlwh, I'm not a Pyrallis dev, I work on Simple-Parsing (which Pyrallis is largely based on). What kind of feature do you need? I'd be happy to discuss this in a new issue over on the Simple-Parsing repo if things don't end up working for you over here.
Hi @lebrice , Thanks for reaching out!
I'll post the things I'd like added in pyrallis (since they would apply here) and we can discuss over in simple-parsing if it's of interest. I didn't realize simple-parsing was so feature-complete? I had thought it didn't do yaml loading, which is a key feature I need, but it seems to.
The biggest thing I need atm is support for dynamically selected subclasses: e.g. --model gpt --model.attn_pdrop 0.1
, where not every model necessarily has attn_pdrop
. I have something rigged up that sort of works in pyrallis atm, except that it doesn't work with the argparser. This is something that yahp supported, but it's dead now, and the registry couldn't be dynamic in the way I would like. There's a lot of fallout from that that will complicate the implementation a lot (especially if you want help generation to be responsive to the subtypes)
I'd also like to add support for yamls including other somehow. You can get pyyaml to do it, but the stuff needs to be threaded through.
There's other less pressing stuff that's more just bugs or limitations:
List[MyConfig]
)bool(x)
, so it accepts anything truthy. This makes it impossible to use Union[bool, str]
Thanks @dlwh, these are some really good points! Pyrallis is still alive and we actively use it in many projects, but it is true we haven't added new features in a while, partially due to lack of time and partially because we really want to keep pyrallis a simple library that's easy to get into (where many great alternatives such as @lebrice's Simple-Parsing exist for more complex features).
Actually, both dynamic subclasses and yamls that include other yamls are things we discussed and couldn't come up with a solution that we liked well enough to add it to pyrallis. So I'm very interested in seeing what you came up with for the subclasses.
For the YAMLs we are thinking of using "config_path" as a reserved word in all levels of the yaml, so that if config_path exists in a yaml file we will try to load from it first and populate accordingly, but again couldn't decide if that's a good way to solve that.
As to the rest of the points, we would be happy to accept pull requests for most of them if you are interested.
Ok cool! I saw stale issues and (more worryingly) a lack of closed issues and feared the worst...
(@lebrice , Simple-Parsing looks cool, and if I weren't now pretty invested in pyrallis and pyrallis was indeed dead, I'd be interested in jumping ship, but devil you know...)
Currently my syntax for subtypes is
model:
gpt2:
# gpt2 stuff
(I'm happy to bikeshed that syntax as much as you want. I considered type: gpt2
as well, which is also good)
and in python you write
@config_registry()
class ModelConfig(abc.ABC):
pass
@ModelConfig.register_subclass("gpt2")
@dataclass(frozen=True)
class Gpt2Config(ModelConfig):
attn_pdrop: float = 0.0
There's actually a rudimentary plugin discovery thing I added too, but that's a detail.
The big challenge is getting argparse to accept --model.gpt2.attn_pdrop, or whatever. To do it really right (where you can tell it the type and then get a help for that type, requires quite a bit of doing with argparse...
My preference would be to use the standard YAML extension machinery, which PyYAML actually supports: https://stackoverflow.com/a/9577670/1736826 There's even a library https://pypi.org/project/pyyaml-include/
It's also possible to do "inheritance" I think, but it's a bit of effort... WDYT?
One thing I couldn't figure out about subtypes is that it kind of kills your ability to get static analysis of your dataclass so you don't really they code completion / type errors. What are your thoughts about that?
Eh, i mean, a pattern I typically use with configs is the builder pattern, they're kind of made for one another.
So my actual example looks more like:
class Model(abc.ABC):
pass
@config_registry()
class ModelConfig(abc.ABC):
@abc.abstractmethod
def build() -> Model:
pass
@ModelConfig.register_subclass("gpt2")
@dataclass(frozen=True)
class Gpt2Config(ModelConfig):
attn_pdrop: float = 0.0
def build() -> Gpt2Model:
pass
So you get static type checking via the built type. Sure you don't get access to unshared properties via a type checker, but that's the usual story with inheritance. Yahp employs this liberally, or did before it died.
Is the goal of this subtype registration to be able to choose between different subclasses to parse from the command line? If so, this is probably what you're looking for @dlwh :
https://github.com/lebrice/SimpleParsing/blob/master/examples/subgroups/subgroups_example.py
This "subgroups" feature takes a different approach, but I believe you get the same end result. This also works with serialization/deserialization. In my opinion, it also feels a bit more explicit than the builder pattern.
Also, w.r.t. the builder pattern you showed @dlwh, since you have to import those classes anyway for them to be registered, you might as well do something like:
{cls.__qualname__.lower().removesuffix("config"): cls for cls in ModelConfig.subclasses()}
But then, since all the config classes are in the current scope, you might as well have them in a dict explicitly IMO.
@dlwh I understand the need and idea behind this pattern, but I've been discussing this with @yairf11 and @idow09 and I think that adding this form of subtype support isn't something we want to add to pyrallis, kind of breaking the simplicity we were aiming for. Could be that @lebrice's solutions are more suitable for these needs
Ooh, I didn't notice that. That might be enough for my needs. I do wish that the --help
would be tuned based on the model type, but that's a nice to have.
You actually can get away without the imports stuff by using a plugin mechanism. I left that out because I could probably retrofit it on top of something that supported registration. In particular, in my implementation I have:
A decorator to register a config class with a registry that we can use to find the config class for a given
config. This is used for abstract classes/interfaces where we want to select a concrete implementation based on
the config. Subclasses can be added with the `register_subclass` method.
We have a rudimentary package discovery system.
If discover_packages is not None, then we will attempt to identify subclasses by importing
packages from discover_packages. They should still be registered with register_subclass.
If you use discover_packages, you should register a class with the same name as the package.
For example, if you make a new LmConfig, which has discover_packages="levanter.models" and your
model is defined in my_transformer.py, then you should register your config with
`LmConfig.register_subclass("my_transformer", MyTransformerConfig)`
Usage:
@config_registry
@dataclasses.dataclass
class ModelConfig:
pass
@dataclasses.dataclass
class GPTConfig(ModelConfig):
pass
ModelConfig.register_subclass("gpt", GPTConfig)
the syntax for a yaml file would be:
model:
gpt:
<config for gpt>
As a special case we also allow just using a string if you want defaults:
model: gpt
@eladrich fair enough! I'll see if I can switch over to simple-parsing I guess
Hey,
I really like Pyrallis, and I've been using it happily since it came out. I did run into a place where I needed a new feature (subclass registries), that spiraled into a bunch of other issues. Some of it's kinda big.
Are you open to patches/new features at this point or are you doing other things? (No shame, I run out of time/energy/context to work on things too). I might fork the project if you're done with it...