TIXFeniks / neurips2019_intrus

MIT License

INTRUS

Supplementary code for the NeurIPS submission "Sequence Modeling with Unconstrained Generation Order" (arXiv). This code trains and applies a machine translation model that can generate sequences in arbitrary order.


What do I need to run it?

How do I run it?

  1. Set up the environment

    • Clone or download this repo, then cd into its root directory.
    • Get a Python distribution. Anaconda works fine.
    • Install the packages listed in requirements.txt.
  2. Prepare data

    • Grab the WMT English-Russian dataset from http://statmt.org/ (or a dataset for another language pair of your choosing).
    • Tokenize it with mosestokenizer or any other reasonable tokenizer. Lowercasing the data is also recommended.
    • Learn and apply BPE with subword-nmt.
    • You can find example preprocessing pipelines here.
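
The lowercasing and BPE steps above could be scripted roughly as follows. This is a minimal sketch and not the authors' pipeline: it assumes the input file is already tokenized, that the subword-nmt command-line tool is installed and on your PATH, and that learning and applying BPE per language file is acceptable. The function names, file names, and the merge count mentioned below are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def lowercase_file(src: Path, dst: Path) -> None:
    """Write a lowercased copy of src to dst, one line at a time."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line.lower())


def learn_bpe_cmd(num_merges: int) -> List[str]:
    """Command that reads text on stdin and writes BPE codes to stdout."""
    return ["subword-nmt", "learn-bpe", "-s", str(num_merges)]


def apply_bpe_cmd(codes: Path) -> List[str]:
    """Command that segments stdin with the learned codes, writing to stdout."""
    return ["subword-nmt", "apply-bpe", "-c", str(codes)]


def run_filter(cmd: List[str], src: Path, dst: Path) -> None:
    """Run cmd as a stdin-to-stdout filter from src to dst."""
    with src.open("rb") as fin, dst.open("wb") as fout:
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)


def preprocess(tokenized: Path, out_dir: Path, num_merges: int) -> Path:
    """Lowercase a tokenized file, learn BPE on it, and apply it.

    Output file names are illustrative only. Returns the BPE-segmented file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    lowered = out_dir / (tokenized.name + ".low")
    codes = out_dir / (tokenized.name + ".codes")
    segmented = out_dir / (tokenized.name + ".bpe")

    lowercase_file(tokenized, lowered)
    run_filter(learn_bpe_cmd(num_merges), lowered, codes)
    run_filter(apply_bpe_cmd(codes), lowered, segmented)
    return segmented
```

A call such as preprocess(Path("train.tok.en"), Path("data"), 32000) would then produce data/train.tok.en.bpe; the merge count of 32000 is only a placeholder. Codes should be learned on training data only and then applied unchanged to the dev and test sets.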
  3. Run jupyter notebook

    • All the training notebooks are in the ./notebooks/ folder
    • Before you run the first cell, optionally set %env CUDA_VISIBLE_DEVICES=### to the devices you plan to use.
    • Follow the code as it loads the data, trains the model, and reports training progress.
    • NOTE: The BLEU metric measured in the notebook is not the one used for evaluation. See sacrebleu.
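
As an illustration of the note above, one way to score translations with sacrebleu outside the notebook is sketched below. It is an assumption, not the authors' evaluation script: it assumes hypotheses and references are lists of BPE-segmented strings in subword-nmt's "@@ " format, and the helper names undo_bpe and sacrebleu_score are hypothetical. The only sacrebleu call used is corpus_bleu.

```python
import re
from typing import List

# Matches the subword-nmt continuation marker, mid-line or at end of line.
_BPE_JOINER = re.compile(r"(@@ )|(@@ ?$)")


def undo_bpe(line: str) -> str:
    """Merge subword-nmt style 'pie@@ ces' back into whole words."""
    return _BPE_JOINER.sub("", line)


def sacrebleu_score(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level BLEU from sacrebleu after undoing BPE on both sides."""
    import sacrebleu  # imported lazily so undo_bpe works without it

    hyps = [undo_bpe(h) for h in hypotheses]
    refs = [undo_bpe(r) for r in references]
    # sacrebleu expects a list of reference streams, hence the extra list.
    return sacrebleu.corpus_bleu(hyps, [refs]).score
```

sacrebleu applies its own tokenization and is intended for detokenized text, so Moses-tokenized or lowercased output should be post-processed accordingly before scores are compared with published numbers.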