INTRUS

Supplementary code for the NeurIPS submission "Sequence Modeling with Unconstrained Generation Order" (arxiv). This code trains and applies a machine translation model that can generate sequences in arbitrary order.

What do I need to run it?

A machine with some CPU cores (preferably 4+) and at least one GPU
- Optimal performance is reached when running on 8 GPUs
Some popular Linux x64 distribution
- Tested on Ubuntu 16.04; should work fine on any popular 64-bit Linux and even macOS;
- Windows and 32-bit systems may require heavy wizardry to run;
- When in doubt, use Docker, preferably GPU-enabled (i.e. nvidia-docker)

How do I run it?

Setup environment
- Clone or download this repo, then cd to its root directory.
- Get a Python distribution. Anaconda works fine.
- Install the packages from requirements.txt
Prepare data
- Grab the WMT English-Russian dataset from http://statmt.org/ (or another language pair of your choosing)
- Tokenize it with mosestokenizer or any other reasonable tokenizer. It is also recommended that you lowercase the data.
- Learn and apply BPE with subword-nmt
- You can find example preprocessing pipelines here.
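The lowercasing and BPE steps above can be sketched in Python as follows. This is only an illustration, not the preprocessing pipeline used for the paper: it assumes the text is already tokenized, that the subword-nmt command-line tool is installed, and that 32000 merge operations is an acceptable setting (a placeholder value, not taken from this repo). The helper names and all file names are hypothetical.

```python
import subprocess
from pathlib import Path


def lowercase_file(src: Path, dst: Path) -> None:
    """Copy src to dst line by line, lowercasing the text."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line.lower())


def bpe_commands(train: Path, codes: Path, out: Path, num_merges: int = 32000):
    """Build the subword-nmt command lines for learning and applying BPE.

    num_merges=32000 is an assumed default, not a value from this repo.
    """
    learn = ["subword-nmt", "learn-bpe", "-s", str(num_merges),
             "-i", str(train), "-o", str(codes)]
    apply = ["subword-nmt", "apply-bpe", "-c", str(codes),
             "-i", str(train), "-o", str(out)]
    return learn, apply


def run_bpe(train: Path, codes: Path, out: Path, num_merges: int = 32000) -> None:
    """Learn BPE codes on `train`, then segment `train` with those codes."""
    for cmd in bpe_commands(train, codes, out, num_merges):
        # check=True raises CalledProcessError if subword-nmt fails.
        subprocess.run(cmd, check=True)
```

For example, `lowercase_file(Path("train.tok.en"), Path("train.lc.en"))` followed by `run_bpe(Path("train.lc.en"), Path("bpe.codes"), Path("train.bpe.en"))` would produce a BPE-segmented training file. Validation and test files should be segmented with the same codes file rather than with newly learned codes.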
Run jupyter notebook
- All the training notebooks are in the ./notebooks/ folder
- Before you run the first cell, optionally set %env CUDA_VISIBLE_DEVICES=### to the devices that you plan to use.
- Follow the code as it loads the data, trains the model and reports training progress.
- NOTE: The BLEU metric measured in the notebook is not the one used for evaluation. See sacrebleu.
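As a rough sketch of what scoring with sacrebleu might look like: sacrebleu expects plain, detokenized text, so BPE markers have to be merged back into words first. The snippet below assumes the default subword-nmt separator ("@@"); the helper names are hypothetical, and this is not necessarily the evaluation procedure used for the paper. Moses detokenization, which would normally follow, is not shown.

```python
import re

# Matches the "@@ " continuation markers that subword-nmt inserts by default,
# including a dangling "@@" at the end of a line.
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def strip_bpe(line: str) -> str:
    """Merge BPE subword units back into whole words."""
    return _BPE_MARKER.sub("", line.strip())


def corpus_bleu_or_none(hypotheses, references):
    """Return the sacrebleu corpus BLEU score, or None if sacrebleu is missing.

    Both arguments are lists of plain-text sentences of equal length;
    a single reference per hypothesis is assumed.
    """
    try:
        import sacrebleu
    except ImportError:
        return None
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

For instance, `strip_bpe("the new@@ est mod@@ el")` gives back `"the newest model"`, which can then be detokenized and passed to `corpus_bleu_or_none` together with the reference translations.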
