Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Move away from configs, to just passing args to the class #246

Closed SeanNaren closed 2 years ago

SeanNaren commented 2 years ago

🚀 Feature

Currently, the process of creating a module looks like this:

...
dm = TextClassificationDataModule(
    cfg=TextClassificationDataConfig(
        batch_size=1,
        dataset_name="emotion",
        max_length=512,
    ),
    tokenizer=tokenizer,
)

This is fine for the "general" cases; however, for dataset-specific modules it becomes cumbersome:

...
dm = WMT16TranslationDataModule( # I have to choose the WMT16 class
    cfg=TranslationDataConfig(
        dataset_name="wmt16", # I have to pass the name of the dataset as well? why is this not the default?
        ...
    ),
    tokenizer=tokenizer,
)

One solution would be to introduce a per-dataset config class like the one below; however, this adds even more lines.

...
@dataclass
class WMT16TranslationDataConfig(TranslationDataConfig):
    dataset_name: str = "wmt16"  # dataset-specific default baked in
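To make the idea concrete, here is a minimal runnable sketch of that approach. The config classes below are hypothetical stand-ins for the library's real ones; only the inheritance pattern is the point:

```python
from dataclasses import dataclass

@dataclass
class TranslationDataConfig:  # stand-in for the real base config
    dataset_name: str = ""
    batch_size: int = 32
    max_length: int = 128

@dataclass
class WMT16TranslationDataConfig(TranslationDataConfig):
    dataset_name: str = "wmt16"  # override the base default

cfg = WMT16TranslationDataConfig()
print(cfg.dataset_name)  # wmt16
```

It works, but every dataset now needs its own three-line config class on top of its module, which is exactly the boilerplate this issue wants to avoid.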

I suggest we remove configs entirely and rely on the modules directly:

dm = TextClassificationDataModule(
    batch_size=1,
    dataset_name="emotion",
    max_length=512,
    tokenizer=tokenizer,
)
dm = WMT16TranslationDataModule(tokenizer=tokenizer) # dataset_name="wmt16" is default!
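A rough sketch of how the dataset-specific module could bake in its default without a config class (the class bodies are hypothetical, not the library's actual implementation):

```python
# Hypothetical sketch: modules take plain constructor arguments,
# and subclasses set dataset-specific defaults themselves.
class TranslationDataModule:
    def __init__(self, tokenizer=None, dataset_name="", batch_size=32, max_length=128):
        self.tokenizer = tokenizer
        self.dataset_name = dataset_name
        self.batch_size = batch_size
        self.max_length = max_length

class WMT16TranslationDataModule(TranslationDataModule):
    def __init__(self, tokenizer=None, **kwargs):
        # dataset_name defaults to "wmt16" unless the caller overrides it
        kwargs.setdefault("dataset_name", "wmt16")
        super().__init__(tokenizer=tokenizer, **kwargs)

dm = WMT16TranslationDataModule(tokenizer=None)
print(dm.dataset_name)  # wmt16
```

The caller never has to repeat `dataset_name="wmt16"`, and the general-purpose module keeps its full argument list.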

From my understanding this is still supported by Hydra; the cfg objects would just be converted into plain parameters.