Closed dongheehand closed 2 years ago
Another error occurred when training with Hydra. I fixed my code to convert the DictConfig to a DataConfig when creating the DataModule.
HYDRA_FULL_ERROR=1 python train.py task=nlp/language_modeling dataset=nlp/language_modeling/wikitext trainer.devices=1 training.batch_size=8
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning-1.6.3-py3.7.egg/pytorch_lightning/trainer/trainer.py", line 1215, in _run
self.strategy.setup(self)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning-1.6.3-py3.7.egg/pytorch_lightning/strategies/ddp.py", line 155, in setup
super().setup(trainer)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning-1.6.3-py3.7.egg/pytorch_lightning/strategies/strategy.py", line 139, in setup
self.setup_optimizers(trainer)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning-1.6.3-py3.7.egg/pytorch_lightning/strategies/strategy.py", line 129, in setup_optimizers
self.lightning_module
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning-1.6.3-py3.7.egg/pytorch_lightning/core/optimizer.py", line 180, in _init_optimizers_and_lr_schedulers
optim_conf = model.trainer._call_lightning_module_hook("configure_optimizers", pl_module=model)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning-1.6.3-py3.7.egg/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/data/private/lightning-transformers_3/lightning-transformers/lightning_transformers/core/model.py", line 105, in configure_optimizers
scheduler = self.instantiator.scheduler(self.scheduler_cfg, self.optimizer)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1178, in __getattr__
type(self).__name__, name))
AttributeError: 'LanguageModelingTransformer' object has no attribute 'optimizer'
TaskTransformer has no attribute 'optimizer', but the attribute 'optimizer' is used. Please refer to line 105 of lightning_transformers/core/model.py
(https://github.com/PyTorchLightning/lightning-transformers/blob/master/lightning_transformers/core/model.py#L105):
scheduler = self.instantiator.scheduler(self.scheduler_cfg, self.optimizer)
should be
scheduler = self.instantiator.scheduler(self.scheduler_cfg, optimizer)
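The failure mode can be reproduced without PyTorch at all: `torch.nn.Module` overrides `__getattr__` to look up only registered parameters, buffers, and submodules, and otherwise raises, which is why reading the never-assigned `self.optimizer` produced the traceback above. A minimal torch-free sketch (the class bodies here are assumptions, not the library's real code):

```python
class ModuleLike:
    """Toy stand-in for torch.nn.Module's attribute lookup."""

    def __getattr__(self, name):
        # Mirrors the tail of torch.nn.Module.__getattr__: fires only when
        # normal lookup fails, and ends by raising AttributeError.
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")


class LanguageModelingTransformer(ModuleLike):
    def configure_optimizers(self):
        optimizer = object()  # stands in for the instantiated optimizer
        try:
            self.optimizer  # buggy lookup: never assigned, so it raises
        except AttributeError as err:
            print(err)
        return optimizer  # the fix passes this local variable instead


LanguageModelingTransformer().configure_optimizers()
```

Because the optimizer is only ever a local variable inside `configure_optimizers`, passing the local is the correct fix; assigning it to `self` would also work but is unnecessary.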
Hi @dongheehand, thank you for the super informative debugging! I have a PR open for the optimizer issue, thank you.
Out of curiosity, would you be interested in dropping the Hydra portion to just use the classes directly? We're slowly moving towards this with #223 #246 and would be curious if the class-based approach could be useful for you. For the language modeling task this can be done via a script:
import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.language_modeling import (
    LanguageModelingDataConfig,
    LanguageModelingDataModule,
    LanguageModelingTransformer,
)

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="gpt2")
    model = LanguageModelingTransformer(pretrained_model_name_or_path="gpt2")
    dm = LanguageModelingDataModule(
        cfg=LanguageModelingDataConfig(
            batch_size=1,
            dataset_name="wikitext",
            dataset_config_name="wikitext-2-raw-v1",
        ),
        tokenizer=tokenizer,
    )
    trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)
    trainer.fit(model, dm)
Eventually we'll get rid of the configs as well, making it even simpler.
Thank you! If I have an opinion about that, I will comment on https://github.com/PyTorchLightning/lightning-transformers/issues/223 .
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Run the following command
Error msg
Expected behavior
The training begins
The cause of the error
The cause of the error is that the cfg argument of the DataModule classes (such as TransformerDataModule) is not a DataConfig instance (such as TransformerDataConfig) but a DictConfig instance when training with Hydra. A DictConfig instance does not contain the hyper-parameters that are not specified in the config file. The error occurs when the DataModule is created by the instantiate function (please refer to lightning_transformers/core/instantiator.py). I think https://github.com/PyTorchLightning/lightning-transformers/issues/236 is the same issue as mine!
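To make the mismatch concrete, here is a minimal sketch. The field names are assumptions, and a plain dict stands in for the DictConfig, which likewise contains only the keys present in the YAML file:

```python
from dataclasses import dataclass


@dataclass
class TransformerDataConfig:
    # hypothetical subset of fields; the real class defines more
    batch_size: int = 32
    num_workers: int = 0


# A DictConfig built from a config file holds only the keys that file sets;
# a plain dict models that behaviour here.
hydra_style_cfg = {"batch_size": 8}
dataclass_cfg = TransformerDataConfig(batch_size=8)

assert dataclass_cfg.num_workers == 0    # dataclass default fills the gap
assert "num_workers" not in hydra_style_cfg  # DictConfig-style: key is simply absent
```

So any code that reads an unspecified hyper-parameter works against the dataclass but fails against the Hydra-built config.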
How to fix the error
I think there are several possible ways to fix this bug. As a workaround, I converted the DictConfig to a DataConfig when creating the DataModule.
I changed the code from
to
If there are better solutions, please write a comment in this issue!
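As a rough sketch of such a conversion (the helper name and dataclass fields are assumptions; with a real DictConfig you would first obtain a plain mapping, e.g. via OmegaConf.to_container(cfg, resolve=True)):

```python
from dataclasses import dataclass, fields


@dataclass
class LanguageModelingDataConfig:
    # hypothetical subset of fields, for illustration only
    batch_size: int = 8
    dataset_name: str = "wikitext"
    dataset_config_name: str = "wikitext-2-raw-v1"


def to_data_config(raw_cfg, config_cls):
    """Build a DataConfig from a plain mapping of Hydra overrides.

    Unknown keys are dropped and missing ones fall back to the dataclass
    defaults, so later attribute access never raises for unspecified options.
    """
    known = {f.name for f in fields(config_cls)}
    return config_cls(**{k: v for k, v in raw_cfg.items() if k in known})


cfg = to_data_config({"batch_size": 4, "dataset_name": "wikitext"}, LanguageModelingDataConfig)
print(cfg.dataset_config_name)  # default supplied by the dataclass
```

The DataModule can then be constructed with this cfg object, and every hyper-parameter it reads is guaranteed to exist.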