Dead? - Githubissues

dlwh commented 1 year ago

Hey,

I really like Pyrallis, and I've been using it happily since it came out. I did run into a place where I needed a new feature (subclass registries), that spiraled into a bunch of other issues. Some of it's kinda big.

Are you open to patches/new features at this point or are you doing other things? (No shame, I run out of time/energy/context to work on things too). I might fork the project if you're done with it...

lebrice commented 1 year ago

Hi @dlwh, I'm not a Pyrallis dev, I work on Simple-Parsing (which Pyrallis is largely based on). What kind of feature do you need? I'd be happy to discuss this in a new issue over on the Simple-Parsing repo if things don't end up working for you over here.

dlwh commented 1 year ago

Hi @lebrice , Thanks for reaching out!

I'll post the things I'd like added in pyrallis (since they would apply here) and we can discuss over in simple-parsing if it's of interest. I didn't realize simple-parsing was so feature-complete? I had thought it didn't do yaml loading, which is a key feature I need, but it seems to.

The biggest thing I need atm is support for dynamically selected subclasses: e.g. --model gpt --model.attn_pdrop 0.1, where not every model necessarily has attn_pdrop. I have something rigged up that sort of works in pyrallis atm, except that it doesn't work with the argparser. This is something that yahp supported, but it's dead now, and the registry couldn't be dynamic in the way I would like. There's a lot of fallout from that that will complicate the implementation a lot (especially if you want help generation to be responsive to the subtypes)

I'd also like to add support for yamls including other somehow. You can get pyyaml to do it, but the stuff needs to be threaded through.

There's other less pressing stuff that's more just bugs or limitations:

if a dataclass has a custom decode/encode registered, sometimes pyrallis still uses the default dataclass encoding
the argparser uses sys.exit, which is vaguely annoying
Lists of dataclasses aren't supported (i.e. List[MyConfig])
pyrallis's bool decoder is bool(x), so it accepts anything truthy. This makes it impossible to use Union[bool, str]
I needed to add support for pulling config files from fsspec, unclear if that's a broadly useful feature

eladrich commented 1 year ago

Thanks @dlwh, these are some really good points! Pyrallis is still alive and we actively use it in many projects, but it is true we haven't added new features in a while, partially due to lack of time and partially because we really want to keep pyrallis a simple library that's easy to get into (where many great alternatives such as @lebrice's Simple-Parsing exist for more complex features).

Actually, both dynamic subclasses and yamls that include other yamls are things we discussed and couldn't come up with a solution that we liked well enough to add it to pyrallis. So I'm very interested in seeing what you came up with for the subclasses.

For the YAMLs we are thinking of using "config_path" as a reserved word in all levels of the yaml, so that if config_path exists in a yaml file we will try to load from it first and populate accordingly, but again couldn't decide if that's a good way to solve that.

As to the rest of the points, we would be happy to accept pull requests for most of them if you are interested.

dlwh commented 1 year ago

Ok cool! I saw stale issues and (more worryingly) a lack of closed issues and feared the worst...

(@lebrice , Simple-Parsing looks cool, and if I weren't now pretty invested in pyrallis and pyrallis was indeed dead, I'd be interested in jumping ship, but devil you know...)

Subtypes

Currently my syntax for subtypes is

model:
   gpt2:
      # gpt2 stuff

(I'm happy to bikeshed that syntax as much as you want. I considered type: gpt2 as well, which is also good)

and in python you write

@config_registry()
class ModelConfig(abc.ABC):
   pass

@ModelConfig.register_subclass("gpt2")
@dataclass(frozen=True)
class Gpt2Config(ModelConfig):
   attn_pdrop: float = 0.0

There's actually a rudimentary plugin discovery thing I added too, but that's a detail.

The big challenge is getting argparse to accept --model.gpt2.attn_pdrop, or whatever. To do it really right (where you can tell it the type and then get a help for that type, requires quite a bit of doing with argparse...

Inclusion

My preference would be to use the standard YAML extension machinery, which PyYAML actually supports: https://stackoverflow.com/a/9577670/1736826 There's even a library https://pypi.org/project/pyyaml-include/

It's also possible to do "inheritance" I think, but it's a bit of effort... WDYT?

eladrich commented 1 year ago

One thing I couldn't figure out about subtypes is that it kind of kills your ability to get static analysis of your dataclass so you don't really they code completion / type errors. What are your thoughts about that?

dlwh commented 1 year ago

Eh, i mean, a pattern I typically use with configs is the builder pattern, they're kind of made for one another.

So my actual example looks more like:


class Model(abc.ABC):
  pass

@config_registry()
class ModelConfig(abc.ABC):
   @abc.abstractmethod
   def build() -> Model:
      pass

@ModelConfig.register_subclass("gpt2")
@dataclass(frozen=True)
class Gpt2Config(ModelConfig):
   attn_pdrop: float = 0.0 

   def build() -> Gpt2Model:
      pass

So you get static type checking via the built type. Sure you don't get access to unshared properties via a type checker, but that's the usual story with inheritance. Yahp employs this liberally, or did before it died.

lebrice commented 1 year ago

Is the goal of this subtype registration to be able to choose between different subclasses to parse from the command line? If so, this is probably what you're looking for @dlwh :

https://github.com/lebrice/SimpleParsing/blob/master/examples/subgroups/subgroups_example.py

This "subgroups" feature takes a different approach, but I believe you get the same end result. This also works with serialization/deserialization. In my opinion, it also feels a bit more explicit than the builder pattern.

lebrice commented 1 year ago

Also, w.r.t. the builder pattern you showed @dlwh, since you have to import those classes anyway for them to be registered, you might as well do something like:

{cls.__qualname__.lower().removesuffix("config"): cls for cls in ModelConfig.subclasses()}

But then, since all the config classes are in the current scope, you might as well have them in a dict explicitly IMO.

eladrich commented 1 year ago

@dlwh I understand the need and idea behind this pattern, but I've been discussing this with @yairf11 and @idow09 and I think that adding this form of subtype support isn't something we want to add to pyrallis, kind of breaking the simplicity we were aiming for. Could be that @lebrice's solutions are more suitable for these needs

dlwh commented 1 year ago

Ooh, I didn't notice that. That might be enough for my needs. I do wish that the --help would be tuned based on the model type, but that's a nice to have.

You actually can get away without the imports stuff by using a plugin mechanism. I left that out because I could probably retrofit it on top of something that supported registration. In particular, in my implementation I have:

    A decorator to register a config class with a registry that we can use to find the config class for a given
    config. This is used for abstract classes/interfaces where we want to select a concrete implementation based on
    the config. Subclasses can be added with the `register_subclass` method.

    We have a rudimentary package discovery system.
    If discover_packages is not None, then we will attempt to identify subclasses by importing
    packages from discover_packages. They should still be registered with register_subclass.
    If you use discover_packages, you should register a class with the same name as the package.
    For example, if you make a new LmConfig, which has discover_packages="levanter.models" and your
    model is defined in my_transformer.py, then you should register your config with
    `LmConfig.register_subclass("my_transformer", MyTransformerConfig)`

    Usage:

    @config_registry
    @dataclasses.dataclass
    class ModelConfig:
        pass

    @dataclasses.dataclass
    class GPTConfig(ModelConfig):
        pass

    ModelConfig.register_subclass("gpt", GPTConfig)

    the syntax for a yaml file would be:

    model:
      gpt:
         <config for gpt>

    As a special case we also allow just using a string if you want defaults:

    model: gpt

dlwh commented 1 year ago

@eladrich fair enough! I'll see if I can switch over to simple-parsing I guess

eladrich / pyrallis

Dead? #28

Subtypes

Inclusion