CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning
Apache License 2.0

Yoyodyne 🪀

Yoyodyne provides neural models for small-vocabulary sequence-to-sequence generation with and without feature conditioning.

These models are implemented using PyTorch and Lightning.

While we provide classic LSTM and transformer models, some of the provided models are particularly well-suited for problems where the source-target alignments are roughly monotonic (e.g., transducer and hard_attention_lstm) and/or where source and target vocabularies have substantial overlap (e.g., pointer_generator_lstm).

Philosophy

Yoyodyne is inspired by FairSeq (Ott et al. 2019) but differs on several key points of design:

Authors

Yoyodyne was created by Adam Wiemerslage, Kyle Gorman, Travis Bartley, and other contributors like yourself.

Installation

Local installation

Yoyodyne currently supports Python 3.9 and 3.10. #60 is a known blocker to Python > 3.10 support.

First install dependencies:

pip install -r requirements.txt

Then install:

pip install .

It can then be imported like a regular Python module:

import yoyodyne

Google Colab

Yoyodyne is compatible with Google Colab GPU runtimes. This notebook provides a worked example. Colab also provides access to TPU runtimes, but this is not yet compatible with Yoyodyne to our knowledge.

Usage

Training

Training is performed by the yoyodyne-train script. One must specify the following required arguments:

The user can also specify various optional training and architectural arguments. See below or run yoyodyne-train --help for more information.

Validation

Validation is run at intervals requested by the user; see the --val_check_interval and --check_val_every_n_epoch flags. Additional evaluation metrics can also be requested with --eval_metric. For example,

yoyodyne-train --eval_metric ser ...

will additionally compute symbol error rate (SER) each time validation is performed. Additional metrics can be added to evaluators.py.
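As a rough illustration of what a symbol error rate measures, the sketch below computes the Levenshtein edit distance between a predicted and a gold symbol sequence and divides it by the gold length. This normalization is one common definition and is an assumption here; the function names are hypothetical, and the implementation in evaluators.py may differ in its details.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    # prev[j] is the distance between the hyp prefix processed so far
    # and ref[:j]; for the empty prefix this is just j insertions.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # delete h from the hypothesis
                    curr[j - 1] + 1,  # insert r into the hypothesis
                    prev[j - 1] + (h != r),  # substitute, or match for free
                )
            )
        prev = curr
    return prev[-1]


def symbol_error_rate(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Edit distance normalized by gold length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(hyp, ref) / len(ref)
```

Under this definition a perfect prediction scores 0, and a prediction can score above 1 if it is much longer than the gold sequence.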

Prediction

Prediction is performed by the yoyodyne-predict script. One must specify the following required arguments:

The --predict file can either be a TSV file or an ordinary TXT file with one source string per line; in the latter case, specify --target_col 0. Run yoyodyne-predict --help for more information.

Data format

The default data format is a two-column TSV file in which the first column is the source string and the second the target string.

source   target

To enable the use of a feature column, one specifies a (non-zero) argument to --features_col. For instance, in the SIGMORPHON 2017 shared task, the first column is the source (a lemma), the second is the target (the inflection), and the third contains semicolon-delimited feature strings:

source   target    feat1;feat2;...

this format is specified by --features_col 3.

Alternatively, for the SIGMORPHON 2016 shared task data:

source   feat1,feat2,...    target

this format is specified by --features_col 2 --features_sep , --target_col 3.

In order to ensure that targets are ignored during prediction, one can specify --target_col 0.
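The column conventions above can be sketched as a small row parser. This is only an illustration of the 1-indexed column scheme, in which 0 means "no such column"; the `Row` type, the `parse_row` function, and its keyword names are hypothetical and are not part of Yoyodyne's API. The `;` default separator is inferred from the SIGMORPHON 2017 example.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    source: str
    target: Optional[str]
    features: List[str]


def parse_row(
    line: str,
    source_col: int = 1,
    target_col: int = 2,
    features_col: int = 0,
    features_sep: str = ";",
) -> Row:
    """Splits one TSV line using 1-indexed columns; 0 disables a column."""
    cells = line.rstrip("\n").split("\t")
    source = cells[source_col - 1]
    # A target column of 0 means targets are absent or should be ignored.
    target = cells[target_col - 1] if target_col else None
    # A features column of 0 means the data has no features.
    features = (
        cells[features_col - 1].split(features_sep) if features_col else []
    )
    return Row(source, target, features)


# SIGMORPHON 2017 layout: source, target, semicolon-delimited features.
row_2017 = parse_row("walk\twalked\tV;PST", features_col=3)
# SIGMORPHON 2016 layout: source, comma-delimited features, target.
row_2016 = parse_row(
    "walk\tpos=V,tense=PST\twalked",
    features_col=2,
    features_sep=",",
    target_col=3,
)
```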

Reserved symbols

Yoyodyne reserves symbols of the form <...> for internal use. Feature-conditioned models also use [...] to avoid clashes between feature symbols and source and target symbols. Therefore, users should not provide any symbols of the form <...> or [...].
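Before training, one might scan the data for symbols that collide with these reserved forms. The following is a minimal sketch of such a check; the `find_reserved` helper and the example symbols are hypothetical, and Yoyodyne itself is not assumed to expose a function like this.

```python
import re
from typing import Iterable, List

# Matches a whole symbol wrapped in angle brackets or square brackets.
RESERVED = re.compile(r"<.*>|\[.*\]")


def find_reserved(symbols: Iterable[str]) -> List[str]:
    """Returns the symbols that have the reserved <...> or [...] form."""
    return [symbol for symbol in symbols if RESERVED.fullmatch(symbol)]


# A bare "<" or "]" is an ordinary symbol; only bracketed wholes are flagged.
clashes = find_reserved(["w", "<foo>", "a", "[bar]", "<", "]"])
```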

Model checkpointing

Checkpointing is handled by Lightning. The path for model information, including checkpoints, is specified by a combination of --model_dir and --experiment, such that we build the path model_dir/experiment/version_n, where each run of an experiment with the same model_dir and experiment is namespaced with a new version number. A version stores everything needed to reload the model, including the hyperparameters (model_dir/experiment_name/version_n/hparams.yaml) and the checkpoints directory (model_dir/experiment_name/version_n/checkpoints).
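The directory layout described above can be sketched as follows. This is an illustration only: the `version_dir` helper is hypothetical, and the assumption that numbering starts at version_0 and continues from the highest existing version reflects typical Lightning logger behavior rather than anything stated here.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable


def version_dir(
    model_dir: str, experiment: str, existing: Iterable[str]
) -> PurePosixPath:
    """Picks the next version directory under model_dir/experiment.

    `existing` holds the names already present in that directory.
    """
    taken = []
    for name in existing:
        match = re.fullmatch(r"version_(\d+)", name)
        if match:
            taken.append(int(match.group(1)))
    # Assumption: the first run is version_0; later runs increment the max.
    n = max(taken) + 1 if taken else 0
    return PurePosixPath(model_dir) / experiment / f"version_{n}"


run = version_dir("models", "my_experiment", ["version_0", "version_1"])
hparams_path = run / "hparams.yaml"  # hyperparameters for this version
checkpoints_path = run / "checkpoints"  # checkpoint files for this version
```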

By default, each run initializes a new model from scratch, unless the --train_from argument is specified. To continue training from a specific checkpoint, pass the full path to that checkpoint as the --train_from argument. This creates a new version, but starts training from the provided model checkpoint.

By default, one checkpoint is saved. To save more than one checkpoint, use the --num_checkpoints flag. To save a checkpoint every epoch, set --num_checkpoints -1. By default, the checkpoints saved are those which maximize validation accuracy. To instead select checkpoints which minimize validation loss, set --checkpoint_metric loss.

Models

The user specifies the overall architecture for the model using the --arch flag. The value of this flag specifies the decoder's architecture and whether or not an attention mechanism is present. This flag also specifies a default architecture for the encoder(s), but it is possible to override this with additional flags. Supported values for --arch are:

The user can override the default encoder architectures. One can override the source encoder using the --source_encoder flag:

When using features, the user can also specify a non-default features encoder using the --features_encoder flag (linear, lstm, transformer).

For all models, the user may also wish to specify:

By default, LSTM encoders are bidirectional. One can disable this with the --no_bidirectional flag.

Training options

A non-exhaustive list includes:

Additional training options are discussed below.

Early stopping

To enable early stopping, use the --patience and --patience_metric flags. Early stopping occurs after --patience epochs with no improvement (when validation loss stops decreasing if --patience_metric loss, or when validation accuracy stops increasing if --patience_metric accuracy). Early stopping is not enabled by default.
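The patience rule can be illustrated with a small sketch over a list of per-epoch validation scores. The `stopping_epoch` function is hypothetical and only mirrors the description above; the actual stopping logic is handled by the training framework and may differ, for example in how ties or minimum improvements are treated.

```python
from typing import Optional, Sequence


def stopping_epoch(
    values: Sequence[float], patience: int, metric: str = "loss"
) -> Optional[int]:
    """Returns the 0-indexed epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs fail to improve on
    the best value seen so far (lower for loss, higher for accuracy).
    """
    if metric not in ("loss", "accuracy"):
        raise ValueError(f"unknown metric: {metric}")
    best = None
    stale = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(values):
        if best is None:
            improved = True
        elif metric == "loss":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```

For instance, with validation losses of 1.0, 0.8, 0.9, 0.85, 0.7 and a patience of 2, this sketch stops at the fourth epoch (index 3) and never sees the later improvement, which is why a larger patience is often preferable for noisy validation curves.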

Schedulers

By default, Yoyodyne uses a constant learning rate during training, but best practice is to gradually decrease the learning rate as the model approaches convergence using a scheduler. Three (non-null) schedulers are supported and are selected with --scheduler:

Simulating large batches

At times one may wish to train with a larger batch size than will fit "in core". For example, suppose one wishes to fit with a batch size of 4,096, but this gives an out-of-memory exception. Then, with minimal overhead, one could simulate an effective batch size of 4,096 by using batches of size 1,024, accumulating gradients from 4 batches per update:

yoyodyne-train --batch_size 1024 --accumulate_grad_batches 4 ...
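To see why accumulation simulates a larger batch, consider the toy sketch below. It assumes the loss is a mean over examples and that all small batches have equal size; under those assumptions, averaging the gradients of the small batches reproduces the gradient of the full batch. The one-parameter model and the function names are hypothetical and stand in for a real network.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (input x, gold output y)


def mse_grad(w: float, batch: List[Example]) -> float:
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: List[Example], num_chunks: int) -> float:
    """Averages gradients over equal-sized chunks before a single update."""
    if len(batch) % num_chunks:
        raise ValueError("batch size must be divisible by num_chunks")
    size = len(batch) // num_chunks
    total = 0.0
    for i in range(num_chunks):
        chunk = batch[i * size : (i + 1) * size]
        # Each chunk contributes 1 / num_chunks of the final gradient.
        total += mse_grad(w, chunk) / num_chunks
    return total


data = [(float(x), 3.0 * x + 1.0) for x in range(16)]
full = mse_grad(0.5, data)  # one batch of 16
simulated = accumulated_grad(0.5, data, num_chunks=4)  # 4 batches of 4
```

The two gradients agree up to floating-point rounding, so only peak memory use changes, not the update itself.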

Automatic tuning

yoyodyne-train --auto_lr_find uses a heuristic (Smith 2017) to propose an initial learning rate. Batch auto-scaling is not supported.

Hyperparameter tuning

No neural model should be deployed without proper hyperparameter tuning. However, the default options give reasonable initial settings for an attentive biLSTM. For transformer-based architectures, experiment with multiple encoder and decoder layers, much larger batches, and the warmup-plus-inverse square root decay scheduler.
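For readers unfamiliar with warmup followed by inverse square root decay, the sketch below shows one common formulation: the learning rate rises linearly to its peak over the warmup steps, then decays in proportion to the inverse square root of the step count. The function name and parameters are hypothetical, and the exact formula used by Yoyodyne's scheduler is not assumed to match this one.

```python
import math


def warmup_inv_sqrt(step: int, base_lr: float, warmup_steps: int) -> float:
    """Learning rate at `step` under linear warmup, then 1/sqrt decay."""
    if warmup_steps < 1:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # After warmup, decay so that the rate equals base_lr at warmup_steps.
    return base_lr * math.sqrt(warmup_steps / step)


# Sample the schedule at a few steps with a peak rate of 1e-3.
schedule = [warmup_inv_sqrt(s, 1e-3, 100) for s in (0, 50, 100, 400)]
```

In this formulation the rate at step 400 is half the peak rate, since the square root of 100/400 is 0.5.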

Weights & Biases tuning

wandb_sweeps shows how to use Weights & Biases to run hyperparameter sweeps.

Accelerators

Hardware accelerators can be used during training or prediction. In addition to CPU (the default) and GPU (--accelerator gpu), other accelerators may also be supported but not all have been tested yet.

Precision

By default, training uses 32-bit precision. However, the --precision flag allows the user to perform training with half precision (16) or with the bfloat16 half precision format if supported by the accelerator. This may reduce the size of the model and batches in memory, allowing one to use larger batches. Note that only default precision is expected to work with CPU training.

Examples

The examples directory contains interesting examples, including:

For developers

Developers, developers, developers! - Steve Ballmer

This section contains instructions for the Yoyodyne maintainers.

Releasing

  1. Create a new branch. E.g., if you want to call this branch "release": git checkout -b release
  2. Sync your fork's branch to the upstream master branch. E.g., if the upstream remote is called "upstream": git pull upstream master
  3. Increment the version field in pyproject.toml.
  4. Stage your changes: git add pyproject.toml.
  5. Commit your changes: git commit -m "your commit message here"
  6. Push your changes. E.g., if your branch is called "release": git push origin release
  7. Submit a PR for your release and wait for it to be merged into master.
  8. Tag the master branch's last commit. The tag should begin with v; e.g., if the new version is 3.1.4, the tag should be v3.1.4. This can be done:
    • on GitHub itself: click the "Releases" or "Create a new release" link on the right-hand side of the Yoyodyne GitHub page and follow the dialogues.
    • from the command-line using git tag.
  9. Build the new release: python -m build
  10. Upload the result to PyPI: twine upload dist/*

References

Ott, M., Edunov, S., Baevski, A., Fan, A., Gross, S., Ng, N., Grangier, D., and Auli, M. 2019. fairseq: a fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48-53.

Smith, L. N. 2017. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision, pages 464-472.