Quick Summary
This PR mainly improves the training pipeline.
- `DatasetIterator` class for supporting uniform batching in a distributed mode (the sketch after this list illustrates the idea)
- `LightningDataModule` subclass (see `TransformerDataModule`)
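For illustration only, here is a minimal sketch of what uniform batching in a distributed mode means; this is not the actual `DatasetIterator` implementation, and the class and argument names below are assumptions:

```python
import math
from typing import Iterator, List, Sequence

class ShardedBatchIterator:
    """Illustrative only: pads the dataset so every rank gets an equally
    sized shard, which keeps distributed training workers in lockstep."""

    def __init__(self, samples: Sequence, batch_size: int, rank: int, world_size: int):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.rank = rank
        self.world_size = world_size

    def __iter__(self) -> Iterator[List]:
        # Pad to a multiple of world_size so every rank sees the same number of samples.
        total = math.ceil(len(self.samples) / self.world_size) * self.world_size
        padded = self.samples + self.samples[: total - len(self.samples)]
        shard = padded[self.rank :: self.world_size]
        for start in range(0, len(shard), self.batch_size):
            yield shard[start : start + self.batch_size]

# Rank 1 of 2 workers over ten samples -> [[1, 3], [5, 7], [9]]
print(list(ShardedBatchIterator(range(10), batch_size=2, rank=1, world_size=2)))
```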
Training Examples
Training Script
To run training with a script, you need to specify the following files:
- Dataset files (`--train_dataset_prefix` is required)
- A config file containing the model and config class names, the config init parameters, plus the tokenizer class and its init parameters (a hypothetical example follows this list)
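For illustration, a config file of this kind might look roughly like the dict below; the key names and values are assumptions for the sketch, not the actual schema used by the script:

```python
# Hypothetical config contents -- key names are assumptions, not the real schema.
config = {
    "model_class": "RobertaForMaskedLM",
    "config_class": "RobertaConfig",
    "config_params": {"vocab_size": 52000, "num_hidden_layers": 6},
    "tokenizer_class": "RobertaTokenizerFast",
    "tokenizer_params": {"max_len": 512},
}
```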
Training with Code
You can run the training pipeline in code. Follow these steps (a hedged sketch is shown after the list):
- Create the module with your model (e.g. `RobertaForMaskedLM`)
- Create the trainer (`TransformerTrainer`) with the datamodule and module from step 4
- Call the `train` method with args from the pytorch-lightning trainer
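A minimal sketch of these steps, assuming constructor and method signatures for `TransformerDataModule` and `TransformerTrainer`; the real arguments, the hypothetical `TransformerLMModule` wrapper, and the import paths (omitted here) may differ:

```python
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaTokenizerFast

# Build the tokenizer and the model (e.g. RobertaForMaskedLM).
tokenizer = RobertaTokenizerFast.from_pretrained("./tokenizer")
model = RobertaForMaskedLM(RobertaConfig(vocab_size=tokenizer.vocab_size))

# Build the datamodule and the LightningModule wrapper (argument names are guesses).
datamodule = TransformerDataModule(tokenizer=tokenizer, train_dataset_prefix="data/train")
module = TransformerLMModule(model=model)

# Build the trainer with the datamodule and module, then call `train`,
# forwarding keyword args accepted by the pytorch-lightning Trainer.
trainer = TransformerTrainer(module, datamodule)
trainer.train(max_epochs=3)
```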
Further Improvements
Dataset Iterators
We should consider adding new dataset iterators for handling large datasets.
Consider looking at fairseq iterators and infinibatch iterators.
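As a rough illustration of the direction (not fairseq's or infinibatch's actual APIs), a lazy, chunked iterator could stream a large dataset from disk instead of holding it all in memory:

```python
from pathlib import Path
from typing import Iterator, List

def iter_chunked_lines(path: str, chunk_size: int = 10_000) -> Iterator[List[str]]:
    """Lazily yield fixed-size chunks of lines so the full dataset never
    has to fit in memory (in the spirit of fairseq/infinibatch iterators)."""
    chunk: List[str] = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            chunk.append(line.rstrip("\n"))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk
```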
Save Checkpoint Callback
Our current checkpoint callback is not designed to support TensorFlow I/O, nor does it support pytorch-lightning debugging. We should consider improving it. The callback also doesn't generalize the monitored metric or the comparison operator (e.g. accuracy should be maximized whereas loss should be minimized).
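One way to generalize the monitored metric and its comparison operator is the monitor/mode pattern used by pytorch-lightning's `ModelCheckpoint`; the class below is only a sketch of the idea, not the current callback:

```python
import operator

class MetricCheckpointPolicy:
    """Decides whether a new checkpoint should be saved for an arbitrary
    monitored metric, using `mode` to pick the comparison operator."""

    def __init__(self, monitor: str = "val_loss", mode: str = "min"):
        self.monitor = monitor
        self.is_better = operator.lt if mode == "min" else operator.gt
        self.best = float("inf") if mode == "min" else float("-inf")

    def should_save(self, metrics: dict) -> bool:
        value = metrics[self.monitor]
        if self.is_better(value, self.best):
            self.best = value
            return True
        return False

# Accuracy is maximized, loss is minimized.
policy = MetricCheckpointPolicy(monitor="val_accuracy", mode="max")
print(policy.should_save({"val_accuracy": 0.81}))  # True, new best value
```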
Optimization
We don't provide any configuration regarding optimization. Letting users choose an optimizer and scheduler, along with their hyperparameters, would be a great feature.
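For example, the optimizer and scheduler could be chosen from configuration; a minimal sketch assuming a plain dict config and standard torch classes (the config keys are assumptions, not an existing schema):

```python
import torch

OPTIMIZERS = {"adamw": torch.optim.AdamW, "sgd": torch.optim.SGD}

def build_optimization(params, cfg: dict):
    """Build an optimizer (and optionally a scheduler) from a config dict."""
    optimizer = OPTIMIZERS[cfg["optimizer"]](params, **cfg.get("optimizer_params", {}))
    scheduler = None
    if cfg.get("scheduler") == "step":
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, **cfg.get("scheduler_params", {}))
    return optimizer, scheduler

# Example usage with a toy model.
optimizer, scheduler = build_optimization(
    torch.nn.Linear(4, 2).parameters(),
    {"optimizer": "adamw", "optimizer_params": {"lr": 3e-4},
     "scheduler": "step", "scheduler_params": {"step_size": 10}},
)
```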
Modular Trainer
The current `TransformerTrainer` takes already-built components (a module and a datamodule). This is flexible to some extent but can still be improved: building the other components is not unified, and every new setting imposes copy-pasting `make_` functions to set up and parse arguments and then build the functional components. With this said, I would propose a unified component builder with dependency injection to provide building flexibility and type substitution all across the system.
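A rough sketch of what such a unified builder with dependency injection could look like; all names here are hypothetical and not part of the current code:

```python
from typing import Any, Callable, Dict

class ComponentBuilder:
    """Tiny registry-based builder: components are created by registered
    factories, so any type can be substituted without copy-pasting make_ functions."""

    def __init__(self):
        self._factories: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, factory: Callable[..., Any]) -> None:
        self._factories[name] = factory

    def build(self, name: str, **kwargs: Any) -> Any:
        return self._factories[name](**kwargs)

builder = ComponentBuilder()
builder.register("datamodule", lambda **kw: {"component": "datamodule", **kw})
# Swapping the registered factory substitutes the type everywhere it is built.
datamodule = builder.build("datamodule", batch_size=32)
```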